subscribe to arXiv mailings

Regression Trees for Fast and Adaptive Prediction Intervals

Authors: Luben M. C. Cabezas, Mateus P. Otto, Rafael Izbicki, Rafael B. Stern

Abstract: Predictive models make mistakes. Hence, there is a need to quantify the uncertainty associated with their predictions. Conformal inference has emerged as a powerful tool to create statistically valid prediction regions around point predictions, but its naive application to regression problems yields non-adaptive regions. New conformal scores, often relying upon quantile regressors or conditional d… ▽ More Predictive models make mistakes. Hence, there is a need to quantify the uncertainty associated with their predictions. Conformal inference has emerged as a powerful tool to create statistically valid prediction regions around point predictions, but its naive application to regression problems yields non-adaptive regions. New conformal scores, often relying upon quantile regressors or conditional density estimators, aim to address this limitation. Although they are useful for creating prediction bands, these scores are detached from the original goal of quantifying the uncertainty around an arbitrary predictive model. This paper presents a new, model-agnostic family of methods to calibrate prediction intervals for regression problems with local coverage guarantees. Our approach is based on pursuing the coarsest partition of the feature space that approximates conditional coverage. We create this partition by training regression trees and Random Forests on conformity scores. Our proposal is versatile, as it applies to various conformity scores and prediction settings and demonstrates superior scalability and performance compared to established baselines in simulated and real-world datasets. We provide a Python package clover that implements our methods using the standard scikit-learn interface. △ Less

Submitted 13 February, 2024; v1 submitted 11 February, 2024; originally announced February 2024.

arXiv:2301.09671 [pdf, other]

Flexible conditional density estimation for time series

Authors: Gustavo Grivol, Rafael Izbicki, Alex A. Okuno, Rafael B. Stern

Abstract: This paper introduces FlexCodeTS, a new conditional density estimator for time series. FlexCodeTS is a flexible nonparametric conditional density estimator, which can be based on an arbitrary regression method. It is shown that FlexCodeTS inherits the rate of convergence of the chosen regression method. Hence, FlexCodeTS can adapt its convergence by employing the regression method that best fits t… ▽ More This paper introduces FlexCodeTS, a new conditional density estimator for time series. FlexCodeTS is a flexible nonparametric conditional density estimator, which can be based on an arbitrary regression method. It is shown that FlexCodeTS inherits the rate of convergence of the chosen regression method. Hence, FlexCodeTS can adapt its convergence by employing the regression method that best fits the structure of data. From an empirical perspective, FlexCodeTS is compared to NNKCDE and GARCH in both simulated and real data. FlexCodeTS is shown to generally obtain the best performance among the selected methods according to either the CDE loss or the pinball loss. △ Less

Submitted 23 January, 2023; originally announced January 2023.

Comments: 19 pages, 7 figures

MSC Class: 00-01; 99-00

arXiv:2112.01372 [pdf, other]

Hierarchical clustering: visualization, feature importance and model selection

Authors: Luben M. C. Cabezas, Rafael Izbicki, Rafael B. Stern

Abstract: We propose methods for the analysis of hierarchical clustering that fully use the multi-resolution structure provided by a dendrogram. Specifically, we propose a loss for choosing between clustering methods, a feature importance score and a graphical tool for visualizing the segmentation of features in a dendrogram. Current approaches to these tasks lead to loss of information since they require t… ▽ More We propose methods for the analysis of hierarchical clustering that fully use the multi-resolution structure provided by a dendrogram. Specifically, we propose a loss for choosing between clustering methods, a feature importance score and a graphical tool for visualizing the segmentation of features in a dendrogram. Current approaches to these tasks lead to loss of information since they require the user to generate a single partition of the instances by cutting the dendrogram at a specified level. Our proposed methods, instead, use the full structure of the dendrogram. The key insight behind the proposed methods is to view a dendrogram as a phylogeny. This analogy permits the assignment of a feature value to each internal node of a tree through an evolutionary model. Real and simulated datasets provide evidence that our proposed framework has desirable outcomes and gives more insights than state-of-art approaches. We provide an R package that implements our methods. △ Less

Submitted 27 January, 2023; v1 submitted 30 November, 2021; originally announced December 2021.

Comments: 29 pages, 9 figures, 3 tables

ACM Class: I.5.3

arXiv:2007.12778 [pdf, other]

CD-split and HPD-split: efficient conformal regions in high dimensions

Authors: Rafael Izbicki, Gilson Shimizu, Rafael B. Stern

Abstract: Conformal methods create prediction bands that control average coverage assuming solely i.i.d. data. Although the literature has mostly focused on prediction intervals, more general regions can often better represent uncertainty. For instance, a bimodal target is better represented by the union of two intervals. Such prediction regions are obtained by CD-split , which combines the split method and… ▽ More Conformal methods create prediction bands that control average coverage assuming solely i.i.d. data. Although the literature has mostly focused on prediction intervals, more general regions can often better represent uncertainty. For instance, a bimodal target is better represented by the union of two intervals. Such prediction regions are obtained by CD-split , which combines the split method and a data-driven partition of the feature space which scales to high dimensions. CD-split however contains many tuning parameters, and their role is not clear. In this paper, we provide new insights on CD-split by exploring its theoretical properties. In particular, we show that CD-split converges asymptotically to the oracle highest predictive density set and satisfies local and asymptotic conditional validity. We also present simulations that show how to tune CD-split. Finally, we introduce HPD-split, a variation of CD-split that requires less tuning, and show that it shares the same theoretical guarantees as CD-split. In a wide variety of our simulations, CD-split and HPD-split have better conditional coverage and yield smaller prediction regions than other methods. △ Less

Submitted 4 October, 2021; v1 submitted 24 July, 2020; originally announced July 2020.

Comments: 34 pages, 15 figures

MSC Class: 62G15

arXiv:1910.05575 [pdf, other]

Flexible distribution-free conditional predictive bands using density estimators

Authors: Rafael Izbicki, Gilson T. Shimizu, Rafael B. Stern

Abstract: Conformal methods create prediction bands that control average coverage under no assumptions besides i.i.d. data. Besides average coverage, one might also desire to control conditional coverage, that is, coverage for every new testing point. However, without strong assumptions, conditional coverage is unachievable. Given this limitation, the literature has focused on methods with asymptotical cond… ▽ More Conformal methods create prediction bands that control average coverage under no assumptions besides i.i.d. data. Besides average coverage, one might also desire to control conditional coverage, that is, coverage for every new testing point. However, without strong assumptions, conditional coverage is unachievable. Given this limitation, the literature has focused on methods with asymptotical conditional coverage. In order to obtain this property, these methods require strong conditions on the dependence between the target variable and the features. We introduce two conformal methods based on conditional density estimators that do not depend on this type of assumption to obtain asymptotic conditional coverage: Dist-split and CD-split. While Dist-split asymptotically obtains optimal intervals, which are easier to interpret than general regions, CD-split obtains optimal size regions, which are smaller than intervals. CD-split also obtains local coverage by creating a data-driven partition of the feature space that scales to high-dimensional settings and by generating prediction bands locally on the partition elements. In a wide variety of simulated scenarios, our methods have a better control of conditional coverage and have smaller length than previously proposed methods. △ Less

Submitted 9 December, 2019; v1 submitted 12 October, 2019; originally announced October 2019.

arXiv:1908.00105 [pdf, other]

Conditional independence testing: a predictive perspective

Authors: Marco Henrique de Almeida Inácio, Rafael Izbicki, Rafael Bassi Stern

Abstract: Conditional independence testing is a key problem required by many machine learning and statistics tools. In particular, it is one way of evaluating the usefulness of some features on a supervised prediction problem. We propose a novel conditional independence test in a predictive setting, and show that it achieves better power than competing approaches in several settings. Our approach consists i… ▽ More Conditional independence testing is a key problem required by many machine learning and statistics tools. In particular, it is one way of evaluating the usefulness of some features on a supervised prediction problem. We propose a novel conditional independence test in a predictive setting, and show that it achieves better power than competing approaches in several settings. Our approach consists in deriving a p-value using a permutation test where the predictive power using the unpermuted dataset is compared with the predictive power of using dataset where the feature(s) of interest are permuted. We conclude that the method achives sensible results on simulated and real datasets. △ Less

Submitted 31 July, 2019; originally announced August 2019.

arXiv:1807.03929 [pdf, other]

Quantification under prior probability shift: the ratio estimator and its extensions

Authors: Afonso Fernandes Vaz, Rafael Izbicki, Rafael Bassi Stern

Abstract: The quantification problem consists of determining the prevalence of a given label in a target population. However, one often has access to the labels in a sample from the training population but not in the target population. A common assumption in this situation is that of prior probability shift, that is, once the labels are known, the distribution of the features is the same in the training and… ▽ More The quantification problem consists of determining the prevalence of a given label in a target population. However, one often has access to the labels in a sample from the training population but not in the target population. A common assumption in this situation is that of prior probability shift, that is, once the labels are known, the distribution of the features is the same in the training and target populations. In this paper, we derive a new lower bound for the risk of the quantification problem under the prior shift assumption. Complementing this lower bound, we present a new approximately minimax class of estimators, ratio estimators, which generalize several previous proposals in the literature. Using a weaker version of the prior shift assumption, which can be tested, we show that ratio estimators can be used to build confidence intervals for the quantification problem. We also extend the ratio estimator so that it can: (i) incorporate labels from the target population, when they are available and (ii) estimate how the prevalence of positive labels varies according to a function of certain covariates. △ Less

Submitted 5 April, 2019; v1 submitted 10 July, 2018; originally announced July 2018.

Comments: 33 pages, 15 figures

MSC Class: 62F12; 62G05; 62G08

arXiv:1405.3292 [pdf, other]

doi 10.1002/sam.11206

Learning with many experts: model selection and sparsity

Authors: Rafael Izbicki, Rafael Bassi Stern

Abstract: Experts classifying data are often imprecise. Recently, several models have been proposed to train classifiers using the noisy labels generated by these experts. How to choose between these models? In such situations, the true labels are unavailable. Thus, one cannot perform model selection using the standard versions of methods such as empirical risk minimization and cross validation. In order to… ▽ More Experts classifying data are often imprecise. Recently, several models have been proposed to train classifiers using the noisy labels generated by these experts. How to choose between these models? In such situations, the true labels are unavailable. Thus, one cannot perform model selection using the standard versions of methods such as empirical risk minimization and cross validation. In order to allow model selection, we present a surrogate loss and provide theoretical guarantees that assure its consistency. Next, we discuss how this loss can be used to tune a penalization which introduces sparsity in the parameters of a traditional class of models. Sparsity provides more parsimonious models and can avoid overfitting. Nevertheless, it has seldom been discussed in the context of noisy labels due to the difficulty in model selection and, therefore, in choosing tuning parameters. We apply these techniques to several sets of simulated and real data. △ Less

Submitted 13 May, 2014; originally announced May 2014.

Comments: This is the pre-peer reviewed version

Journal ref: Izbicki, R., Stern, R. B. "Learning with many experts: Model selection and sparsity." Statistical Analysis and Data Mining 6.6 (2013): 565-577

Showing 1–8 of 8 results for author: Stern, R B