Showing 1–7 of 7 results for author: Rousseeuw, P J

  1. arXiv:2302.03931 [pdf, other]

    stat.ML cs.LG stat.ME

    Fast Linear Model Trees by PILOT

    Authors: Jakob Raymaekers, Peter J. Rousseeuw, Tim Verdonck, Ruicong Yao

    Abstract: Linear model trees are regression trees that incorporate linear models in the leaf nodes. This preserves the intuitive interpretation of decision trees and at the same time enables them to better capture linear relationships, which is hard for standard decision trees. But most existing methods for fitting linear model trees are time-consuming and therefore not scalable to large data sets. In addit…

    Submitted 8 February, 2023; originally announced February 2023.

    Journal ref: Machine Learning, 2024

  2. Silhouettes and quasi residual plots for neural nets and tree-based classifiers

    Authors: Jakob Raymaekers, Peter J. Rousseeuw

    Abstract: Classification by neural nets and by tree-based methods are powerful tools of machine learning. There exist interesting visualizations of the inner workings of these and other classifiers. Here we pursue a different goal, which is to visualize the cases being classified, either in training data or in test data. An important aspect is whether a case has been classified to its given class (label) or…

    Submitted 26 February, 2022; v1 submitted 16 June, 2021; originally announced June 2021.

    Journal ref: Journal of Computational and Graphical Statistics 2022, Volume 31, 1332-1343

  3. arXiv:2008.05171 [pdf, other]

    cs.LG cs.AI stat.ML

    Fast and Eager k-Medoids Clustering: O(k) Runtime Improvement of the PAM, CLARA, and CLARANS Algorithms

    Authors: Erich Schubert, Peter J. Rousseeuw

    Abstract: Clustering non-Euclidean data is difficult, and one of the most used algorithms besides hierarchical clustering is the popular algorithm Partitioning Around Medoids (PAM), also simply referred to as k-medoids clustering. In Euclidean geometry the mean, as used in k-means, is a good estimator for the cluster center, but this does not exist for arbitrary dissimilarities. PAM uses the medoid instead, t…

    Submitted 1 June, 2021; v1 submitted 12 August, 2020; originally announced August 2020.

    Journal ref: Information Systems 2021, 101804

  4. arXiv:2008.02046 [pdf, other]

    stat.ML cs.LG stat.CO

    Outlier detection in non-elliptical data by kernel MRCD

    Authors: Joachim Schreurs, Iwein Vranckx, Mia Hubert, Johan A. K. Suykens, Peter J. Rousseeuw

    Abstract: The minimum regularized covariance determinant method (MRCD) is a robust estimator for multivariate location and scatter, which detects outliers by fitting a robust covariance matrix to the data. Its regularization ensures that the covariance matrix is well-conditioned in any dimension. The MRCD assumes that the non-outlying observations are roughly elliptically distributed, but many datasets are…

    Submitted 29 March, 2021; v1 submitted 5 August, 2020; originally announced August 2020.

    Journal ref: Statistics and Computing, 2021, Volume 31, article 66

  5. arXiv:2007.14495 [pdf, other]

    stat.ML cs.LG stat.CO stat.ME

    Class maps for visualizing classification results

    Authors: Jakob Raymaekers, Peter J. Rousseeuw, Mia Hubert

    Abstract: Classification is a major tool of statistics and machine learning. A classification method first processes a training set of objects with given classes (labels), with the goal of afterward assigning new objects to one of these classes. When running the resulting prediction method on the training data or on test data, it can happen that an object is predicted to lie in a class that differs from its…

    Submitted 19 May, 2021; v1 submitted 28 July, 2020; originally announced July 2020.

    Comments: Appeared online, Technometrics

    Journal ref: Technometrics 2022, Vol. 64, pages 151-165

  6. Transforming variables to central normality

    Authors: Jakob Raymaekers, Peter J. Rousseeuw

    Abstract: Many real data sets contain numerical features (variables) whose distribution is far from normal (Gaussian). Instead, their distribution is often skewed. In order to handle such data it is customary to preprocess the variables to make them more normal. The Box-Cox and Yeo-Johnson transformations are well-known tools for this. However, the standard maximum likelihood estimator of their transformati…

    Submitted 21 November, 2020; v1 submitted 16 May, 2020; originally announced May 2020.

    Journal ref: Machine Learning, 2021

  7. Faster k-Medoids Clustering: Improving the PAM, CLARA, and CLARANS Algorithms

    Authors: Erich Schubert, Peter J. Rousseeuw

    Abstract: Clustering non-Euclidean data is difficult, and one of the most used algorithms besides hierarchical clustering is the popular algorithm Partitioning Around Medoids (PAM), also simply referred to as k-medoids. In Euclidean geometry the mean, as used in k-means, is a good estimator for the cluster center, but this does not hold for arbitrary dissimilarities. PAM uses the medoid instead, the object wi…

    Submitted 29 October, 2019; v1 submitted 12 October, 2018; originally announced October 2018.

    Journal ref: Similarity Search and Applications, SISAP 2019
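Item 6 names the Box-Cox and Yeo-Johnson transformations without stating them. As a reminder, here is a minimal Python sketch of both for a fixed parameter λ. The helper names `box_cox` and `yeo_johnson` are illustrative and do not come from the paper. The sketch only evaluates the standard transformations; it does not implement the paper's approach to estimating λ, which the truncated abstract does not describe.

```python
import math


def box_cox(x: float, lam: float) -> float:
    """Box-Cox transform of a single value; only defined for x > 0."""
    if x <= 0:
        raise ValueError("Box-Cox requires strictly positive x")
    if lam == 0:
        return math.log(x)
    return (x ** lam - 1.0) / lam


def yeo_johnson(x: float, lam: float) -> float:
    """Yeo-Johnson transform of a single value; defined for any real x."""
    if x >= 0:
        # Non-negative branch: the Box-Cox transform of x + 1 with parameter lam.
        if lam == 0:
            return math.log1p(x)
        return ((x + 1.0) ** lam - 1.0) / lam
    # Negative branch: Box-Cox of 1 - x with parameter 2 - lam, then negated.
    if lam == 2:
        return -math.log1p(-x)
    return -((1.0 - x) ** (2.0 - lam) - 1.0) / (2.0 - lam)


if __name__ == "__main__":
    data = [-3.0, -0.5, 0.0, 2.0, 10.0]
    for lam in (0.0, 0.5, 1.0, 2.0):
        print(lam, [round(yeo_johnson(v, lam), 3) for v in data])
```

With λ = 1 both branches of Yeo-Johnson reduce to the identity, and replacing x by −x together with λ by 2 − λ flips the sign of the output. Both properties make quick sanity checks.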
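The k-medoids entries (items 3 and 7) contrast the mean used by k-means with the medoid used by PAM, which only needs a dissimilarity between objects. The Python sketch below assumes the usual definition of the medoid (the member with the smallest total dissimilarity to all members) and runs a deliberately naive swap loop on the PAM objective. It is not the accelerated algorithm from these papers: it starts from the first k objects instead of PAM's BUILD phase and recomputes the full objective for every candidate swap. All function names are hypothetical.

```python
from typing import Callable, List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def medoid(points: Sequence[T], dissim: Callable[[T, T], float]) -> T:
    """The member with the smallest total dissimilarity to all members."""
    return min(points, key=lambda p: sum(dissim(p, q) for q in points))


def total_deviation(points: Sequence[T], medoid_idx: Sequence[int],
                    dissim: Callable[[T, T], float]) -> float:
    """PAM objective: each point's dissimilarity to its nearest medoid, summed."""
    return sum(min(dissim(p, points[m]) for m in medoid_idx) for p in points)


def naive_pam(points: Sequence[T], k: int, dissim: Callable[[T, T], float],
              max_iter: int = 100) -> Tuple[List[int], float]:
    """Swap-based k-medoids: repeatedly apply the single best improving swap."""
    n = len(points)
    med = list(range(k))  # simplification: first k objects as initial medoids
    current = total_deviation(points, med, dissim)
    for _ in range(max_iter):
        best_cost, best_med = current, None
        for i in range(k):          # medoid slot to replace
            for h in range(n):      # candidate non-medoid object
                if h in med:
                    continue
                cand = med[:i] + [h] + med[i + 1:]
                c = total_deviation(points, cand, dissim)
                if c < best_cost:
                    best_cost, best_med = c, cand
        if best_med is None:        # no swap lowers the objective: stop
            break
        current, med = best_cost, best_med
    return sorted(med), current


if __name__ == "__main__":
    absdiff = lambda a, b: abs(a - b)
    values = [0, 1, 2, 3, 100]
    print(medoid(values, absdiff), sum(values) / len(values))  # medoid 2, mean 21.2
    print(naive_pam([0, 1, 2, 10, 11, 12], 2, absdiff))        # ([1, 4], 4)
```

On the values 0, 1, 2, 3, 100 the medoid is 2. The mean is 21.2, which is not a data value and is pulled toward the outlier. Because `dissim` is an arbitrary function, the same code also runs on strings or other non-vector objects.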