-
Policy Trees for Prediction: Interpretable and Adaptive Model Selection for Machine Learning
Authors:
Dimitris Bertsimas,
Matthew Peroni
Abstract:
As a multitude of capable machine learning (ML) models become widely available in forms such as open-source software and public APIs, central questions remain regarding their use in real-world applications, especially in high-stakes decision-making. Is there always one best model that should be used? When are the models likely to be error-prone? Should a black-box or interpretable model be used? I…
▽ More
As a multitude of capable machine learning (ML) models become widely available in forms such as open-source software and public APIs, central questions remain regarding their use in real-world applications, especially in high-stakes decision-making. Is there always one best model that should be used? When are the models likely to be error-prone? Should a black-box or interpretable model be used? In this work, we develop a prescriptive methodology to address these key questions, introducing a tree-based approach, Optimal Predictive-Policy Trees (OP2T), that yields interpretable policies for adaptively selecting a predictive model or ensemble, along with a parameterized option to reject making a prediction. We base our methods on learning globally optimized prescriptive trees. Our approach enables interpretable and adaptive model selection and rejection while only assuming access to model outputs. By learning policies over different feature spaces, including the model outputs, our approach works with both structured and unstructured datasets. We evaluate our approach on real-world datasets, including regression and classification tasks with both structured and unstructured data. We demonstrate that our approach provides both strong performance against baseline methods while yielding insights that help answer critical questions about which models to use, and when.
△ Less
Submitted 30 May, 2024;
originally announced May 2024.
-
Catastrophe Insurance: An Adaptive Robust Optimization Approach
Authors:
Dimitris Bertsimas,
Cynthia Zeng
Abstract:
The escalating frequency and severity of natural disasters, exacerbated by climate change, underscore the critical role of insurance in facilitating recovery and promoting investments in risk reduction. This work introduces a novel Adaptive Robust Optimization (ARO) framework tailored for the calculation of catastrophe insurance premiums, with a case study applied to the United States National Flo…
▽ More
The escalating frequency and severity of natural disasters, exacerbated by climate change, underscore the critical role of insurance in facilitating recovery and promoting investments in risk reduction. This work introduces a novel Adaptive Robust Optimization (ARO) framework tailored for the calculation of catastrophe insurance premiums, with a case study applied to the United States National Flood Insurance Program (NFIP). To the best of our knowledge, it is the first time an ARO approach has been applied to for disaster insurance pricing. Our methodology is designed to protect against both historical and emerging risks, the latter predicted by machine learning models, thus directly incorporating amplified risks induced by climate change. Using the US flood insurance data as a case study, optimization models demonstrate effectiveness in covering losses and produce surpluses, with a smooth balance transition through parameter fine-tuning. Among tested optimization models, results show ARO models with conservative parameter values achieving low number of insolvent states with the least insurance premium charged. Overall, optimization frameworks offer versatility and generalizability, making it adaptable to a variety of natural disaster scenarios, such as wildfires, droughts, etc. This work not only advances the field of insurance premium modeling but also serves as a vital tool for policymakers and stakeholders in building resilience to the growing risks of natural catastrophes.
△ Less
Submitted 11 May, 2024;
originally announced May 2024.
-
M3H: Multimodal Multitask Machine Learning for Healthcare
Authors:
Dimitris Bertsimas,
Yu Ma
Abstract:
Developing an integrated many-to-many framework leveraging multimodal data for multiple tasks is crucial to unifying healthcare applications ranging from diagnoses to operations. In resource-constrained hospital environments, a scalable and unified machine learning framework that improves previous forecast performances could improve hospital operations and save costs. We introduce M3H, an explaina…
▽ More
Developing an integrated many-to-many framework leveraging multimodal data for multiple tasks is crucial to unifying healthcare applications ranging from diagnoses to operations. In resource-constrained hospital environments, a scalable and unified machine learning framework that improves previous forecast performances could improve hospital operations and save costs. We introduce M3H, an explainable Multimodal Multitask Machine Learning for Healthcare framework that consolidates learning from tabular, time-series, language, and vision data for supervised binary/multiclass classification, regression, and unsupervised clustering. It features a novel attention mechanism balancing self-exploitation (learning source-task), and cross-exploration (learning cross-tasks), and offers explainability through a proposed TIM score, shedding light on the dynamics of task learning interdependencies. M3H encompasses an unprecedented range of medical tasks and machine learning problem classes and consistently outperforms traditional single-task models by on average 11.6% across 40 disease diagnoses from 16 medical departments, three hospital operation forecasts, and one patient phenotyping task. The modular design of the framework ensures its generalizability in data processing, task definition, and rapid model prototyping, making it production ready for both clinical and operational healthcare settings, especially those in constrained environments.
△ Less
Submitted 8 June, 2024; v1 submitted 29 April, 2024;
originally announced April 2024.
-
Towards Stable Machine Learning Model Retraining via Slowly Varying Sequences
Authors:
Dimitris Bertsimas,
Vassilis Digalakis Jr,
Yu Ma,
Phevos Paschalidis
Abstract:
We consider the task of retraining machine learning (ML) models when new batches of data become available. Existing methods focus largely on greedy approaches to find the best-performing model for each batch, without considering the stability of the model's structure across retraining iterations. In this study, we propose a methodology for finding sequences of ML models that are stable across retr…
▽ More
We consider the task of retraining machine learning (ML) models when new batches of data become available. Existing methods focus largely on greedy approaches to find the best-performing model for each batch, without considering the stability of the model's structure across retraining iterations. In this study, we propose a methodology for finding sequences of ML models that are stable across retraining iterations. We develop a mixed-integer optimization formulation that is guaranteed to recover Pareto optimal models (in terms of the predictive power-stability trade-off) and an efficient polynomial-time algorithm that performs well in practice. We focus on retaining consistent analytical insights - which is important to model interpretability, ease of implementation, and fostering trust with users - by using custom-defined distance metrics that can be directly incorporated into the optimization problem. Our method shows stronger stability than greedily trained models with a small, controllable sacrifice in predictive power, as evidenced through a real-world case study in a major hospital system in Connecticut.
△ Less
Submitted 22 May, 2024; v1 submitted 28 March, 2024;
originally announced March 2024.
-
Adaptive Optimization for Prediction with Missing Data
Authors:
Dimitris Bertsimas,
Arthur Delarue,
Jean Pauphilet
Abstract:
When training predictive models on data with missing entries, the most widely used and versatile approach is a pipeline technique where we first impute missing entries and then compute predictions. In this paper, we view prediction with missing data as a two-stage adaptive optimization problem and propose a new class of models, adaptive linear regression models, where the regression coefficients a…
▽ More
When training predictive models on data with missing entries, the most widely used and versatile approach is a pipeline technique where we first impute missing entries and then compute predictions. In this paper, we view prediction with missing data as a two-stage adaptive optimization problem and propose a new class of models, adaptive linear regression models, where the regression coefficients adapt to the set of observed features. We show that some adaptive linear regression models are equivalent to learning an imputation rule and a downstream linear regression model simultaneously instead of sequentially. We leverage this joint-impute-then-regress interpretation to generalize our framework to non-linear models. In settings where data is strongly not missing at random, our methods achieve a 2-10% improvement in out-of-sample accuracy.
△ Less
Submitted 2 February, 2024;
originally announced February 2024.
-
Universal Neurons in GPT2 Language Models
Authors:
Wes Gurnee,
Theo Horsley,
Zifan Carl Guo,
Tara Rezaei Kheirkhah,
Qinyi Sun,
Will Hathaway,
Neel Nanda,
Dimitris Bertsimas
Abstract:
A basic question within the emerging field of mechanistic interpretability is the degree to which neural networks learn the same underlying mechanisms. In other words, are neural mechanisms universal across different models? In this work, we study the universality of individual neurons across GPT2 models trained from different initial random seeds, motivated by the hypothesis that universal neuron…
▽ More
A basic question within the emerging field of mechanistic interpretability is the degree to which neural networks learn the same underlying mechanisms. In other words, are neural mechanisms universal across different models? In this work, we study the universality of individual neurons across GPT2 models trained from different initial random seeds, motivated by the hypothesis that universal neurons are likely to be interpretable. In particular, we compute pairwise correlations of neuron activations over 100 million tokens for every neuron pair across five different seeds and find that 1-5\% of neurons are universal, that is, pairs of neurons which consistently activate on the same inputs. We then study these universal neurons in detail, finding that they usually have clear interpretations and taxonomize them into a small number of neuron families. We conclude by studying patterns in neuron weights to establish several universal functional roles of neurons in simple circuits: deactivating attention heads, changing the entropy of the next token distribution, and predicting the next token to (not) be within a particular set.
△ Less
Submitted 22 January, 2024;
originally announced January 2024.
-
Robust Regression over Averaged Uncertainty
Authors:
Dimitris Bertsimas,
Yu Ma
Abstract:
We propose a new formulation of robust regression by integrating all realizations of the uncertainty set and taking an averaged approach to obtain the optimal solution for the ordinary least-squared regression problem. We show that this formulation surprisingly recovers ridge regression and establishes the missing link between robust optimization and the mean squared error approaches for existing…
▽ More
We propose a new formulation of robust regression by integrating all realizations of the uncertainty set and taking an averaged approach to obtain the optimal solution for the ordinary least-squared regression problem. We show that this formulation surprisingly recovers ridge regression and establishes the missing link between robust optimization and the mean squared error approaches for existing regression problems. We first prove the equivalence for four uncertainty sets: ellipsoidal, box, diamond, and budget, and provide closed-form formulations of the penalty term as a function of the sample size, feature size, as well as perturbation protection strength. We then show in synthetic datasets with different levels of perturbations, a consistent improvement of the averaged formulation over the existing worst-case formulation in out-of-sample performance. Importantly, as the perturbation level increases, the improvement increases, confirming our method's advantage in high-noise environments. We report similar improvements in the out-of-sample datasets in real-world regression problems obtained from UCI datasets.
△ Less
Submitted 12 November, 2023;
originally announced November 2023.
-
Global Optimization: A Machine Learning Approach
Authors:
Dimitris Bertsimas,
Georgios Margaritis
Abstract:
Many approaches for addressing Global Optimization problems typically rely on relaxations of nonlinear constraints over specific mathematical primitives. This is restricting in applications with constraints that are black-box, implicit or consist of more general primitives. Trying to address such limitations, Bertsimas and Ozturk (2023) proposed OCTHaGOn as a way of solving black-box global optimi…
▽ More
Many approaches for addressing Global Optimization problems typically rely on relaxations of nonlinear constraints over specific mathematical primitives. This is restricting in applications with constraints that are black-box, implicit or consist of more general primitives. Trying to address such limitations, Bertsimas and Ozturk (2023) proposed OCTHaGOn as a way of solving black-box global optimization problems by approximating the nonlinear constraints using hyperplane-based Decision-Trees and then using those trees to construct a unified mixed integer optimization (MIO) approximation of the original problem. We provide extensions to this approach, by (i) approximating the original problem using other MIO-representable ML models besides Decision Trees, such as Gradient Boosted Trees, Multi Layer Perceptrons and Suport Vector Machines, (ii) proposing adaptive sampling procedures for more accurate machine learning-based constraint approximations, (iii) utilizing robust optimization to account for the uncertainty of the sample-dependent training of the ML models, and (iv) leveraging a family of relaxations to address the infeasibilities of the final MIO approximation. We then test the enhanced framework in 81 Global Optimization instances. We show improvements in solution feasibility and optimality in the majority of instances. We also compare against BARON, showing improved optimality gaps or solution times in 11 instances.
△ Less
Submitted 3 November, 2023;
originally announced November 2023.
-
The R.O.A.D. to precision medicine
Authors:
Dimitris Bertsimas,
Angelos G. Koulouras,
Georgios Antonios Margonis
Abstract:
We propose a prognostic stratum matching framework that addresses the deficiencies of Randomized trial data subgroup analysis and transforms ObservAtional Data to be used as if they were randomized, thus paving the road for precision medicine. Our approach counters the effects of unobserved confounding in observational data by correcting the estimated probabilities of the outcome under a treatment…
▽ More
We propose a prognostic stratum matching framework that addresses the deficiencies of Randomized trial data subgroup analysis and transforms ObservAtional Data to be used as if they were randomized, thus paving the road for precision medicine. Our approach counters the effects of unobserved confounding in observational data by correcting the estimated probabilities of the outcome under a treatment through a novel two-step process. These probabilities are then used to train Optimal Policy Trees (OPTs), which are decision trees that optimally assign treatments to subgroups of patients based on their characteristics. This facilitates the creation of clinically intuitive treatment recommendations. We applied our framework to observational data of patients with gastrointestinal stromal tumors (GIST) and validated the OPTs in an external cohort using the sensitivity and specificity metrics. We show that these recommendations outperformed those of experts in GIST. We further applied the same framework to randomized clinical trial (RCT) data of patients with extremity sarcomas. Remarkably, despite the initial trial results suggesting that all patients should receive treatment, our framework, after addressing imbalances in patient distribution due to the trial's small sample size, identified through the OPTs a subset of patients with unique characteristics who may not require treatment. Again, we successfully validated our recommendations in an external cohort.
△ Less
Submitted 2 November, 2023;
originally announced November 2023.
-
A Machine Learning Approach to Two-Stage Adaptive Robust Optimization
Authors:
Dimitris Bertsimas,
Cheol Woo Kim
Abstract:
We propose an approach based on machine learning to solve two-stage linear adaptive robust optimization (ARO) problems with binary here-and-now variables and polyhedral uncertainty sets. We encode the optimal here-and-now decisions, the worst-case scenarios associated with the optimal here-and-now decisions, and the optimal wait-and-see decisions into what we denote as the strategy. We solve multi…
▽ More
We propose an approach based on machine learning to solve two-stage linear adaptive robust optimization (ARO) problems with binary here-and-now variables and polyhedral uncertainty sets. We encode the optimal here-and-now decisions, the worst-case scenarios associated with the optimal here-and-now decisions, and the optimal wait-and-see decisions into what we denote as the strategy. We solve multiple similar ARO instances in advance using the column and constraint generation algorithm and extract the optimal strategies to generate a training set. We train a machine learning model that predicts high-quality strategies for the here-and-now decisions, the worst-case scenarios associated with the optimal here-and-now decisions, and the wait-and-see decisions. We also introduce an algorithm to reduce the number of different target classes the machine learning algorithm needs to be trained on. We apply the proposed approach to the facility location, the multi-item inventory control and the unit commitment problems. Our approach solves ARO problems drastically faster than the state-of-the-art algorithms with high accuracy.
△ Less
Submitted 7 December, 2023; v1 submitted 23 July, 2023;
originally announced July 2023.
-
Optimal Control of Multiclass Fluid Queueing Networks: A Machine Learning Approach
Authors:
Dimitris Bertsimas,
Cheol Woo Kim
Abstract:
We propose a machine learning approach to the optimal control of multiclass fluid queueing networks (MFQNETs) that provides explicit and insightful control policies. We prove that a threshold type optimal policy exists for MFQNET control problems, where the threshold curves are hyperplanes passing through the origin. We use Optimal Classification Trees with hyperplane splits (OCT-H) to learn an op…
▽ More
We propose a machine learning approach to the optimal control of multiclass fluid queueing networks (MFQNETs) that provides explicit and insightful control policies. We prove that a threshold type optimal policy exists for MFQNET control problems, where the threshold curves are hyperplanes passing through the origin. We use Optimal Classification Trees with hyperplane splits (OCT-H) to learn an optimal control policy for MFQNETs. We use numerical solutions of MFQNET control problems as a training set and apply OCT-H to learn explicit control policies. We report experimental results with up to 33 servers and 99 classes that demonstrate that the learned policies achieve 100\% accuracy on the test set. While the offline training of OCT-H can take days in large networks, the online application takes milliseconds.
△ Less
Submitted 23 July, 2023;
originally announced July 2023.
-
Compressed Sensing: A Discrete Optimization Approach
Authors:
Dimitris Bertsimas,
Nicholas A. G. Johnson
Abstract:
We study the Compressed Sensing (CS) problem, which is the problem of finding the most sparse vector that satisfies a set of linear measurements up to some numerical tolerance. We introduce an $\ell_2$ regularized formulation of CS which we reformulate as a mixed integer second order cone program. We derive a second order cone relaxation of this problem and show that under mild conditions on the r…
▽ More
We study the Compressed Sensing (CS) problem, which is the problem of finding the most sparse vector that satisfies a set of linear measurements up to some numerical tolerance. We introduce an $\ell_2$ regularized formulation of CS which we reformulate as a mixed integer second order cone program. We derive a second order cone relaxation of this problem and show that under mild conditions on the regularization parameter, the resulting relaxation is equivalent to the well studied basis pursuit denoising problem. We present a semidefinite relaxation that strengthens the second order cone relaxation and develop a custom branch-and-bound algorithm that leverages our second order cone relaxation to solve small-scale instances of CS to certifiable optimality. When compared against solutions produced by three state of the art benchmark methods on synthetic data, our numerical results show that our approach produces solutions that are on average $6.22\%$ more sparse. When compared only against the experiment-wise best performing benchmark method on synthetic data, our approach produces solutions that are on average $3.10\%$ more sparse. On real world ECG data, for a given $\ell_2$ reconstruction error our approach produces solutions that are on average $9.95\%$ more sparse than benchmark methods ($3.88\%$ more sparse if only compared against the best performing benchmark), while for a given sparsity level our approach produces solutions that have on average $10.77\%$ lower reconstruction error than benchmark methods ($1.42\%$ lower error if only compared against the best performing benchmark). When used as a component of a multi-label classification algorithm, our approach achieves greater classification accuracy than benchmark compressed sensing methods. This improved accuracy comes at the cost of an increase in computation time by several orders of magnitude.
△ Less
Submitted 11 July, 2024; v1 submitted 4 June, 2023;
originally announced June 2023.
-
Improving Stability in Decision Tree Models
Authors:
Dimitris Bertsimas,
Vassilis Digalakis Jr
Abstract:
Owing to their inherently interpretable structure, decision trees are commonly used in applications where interpretability is essential. Recent work has focused on improving various aspects of decision trees, including their predictive power and robustness; however, their instability, albeit well-documented, has been addressed to a lesser extent. In this paper, we take a step towards the stabiliza…
▽ More
Owing to their inherently interpretable structure, decision trees are commonly used in applications where interpretability is essential. Recent work has focused on improving various aspects of decision trees, including their predictive power and robustness; however, their instability, albeit well-documented, has been addressed to a lesser extent. In this paper, we take a step towards the stabilization of decision tree models through the lens of real-world health care applications due to the relevance of stability and interpretability in this space. We introduce a new distance metric for decision trees and use it to determine a tree's level of stability. We propose a novel methodology to train stable decision trees and investigate the existence of trade-offs that are inherent to decision tree models - including between stability, predictive power, and interpretability. We demonstrate the value of the proposed methodology through an extensive quantitative and qualitative analysis of six case studies from real-world health care applications, and we show that, on average, with a small 4.6% decrease in predictive power, we gain a significant 38% improvement in the model's stability.
△ Less
Submitted 26 May, 2023;
originally announced May 2023.
-
Patient Outcome Predictions Improve Operations at a Large Hospital Network
Authors:
Liangyuan Na,
Kimberly Villalobos Carballo,
Jean Pauphilet,
Ali Haddad-Sisakht,
Daniel Kombert,
Melissa Boisjoli-Langlois,
Andrew Castiglione,
Maram Khalifa,
Pooja Hebbal,
Barry Stein,
Dimitris Bertsimas
Abstract:
Problem definition: Access to accurate predictions of patients' outcomes can enhance medical staff's decision-making, which ultimately benefits all stakeholders in the hospitals. A large hospital network in the US has been collaborating with academics and consultants to predict short-term and long-term outcomes for all inpatients across their seven hospitals. Methodology/results: We develop machin…
▽ More
Problem definition: Access to accurate predictions of patients' outcomes can enhance medical staff's decision-making, which ultimately benefits all stakeholders in the hospitals. A large hospital network in the US has been collaborating with academics and consultants to predict short-term and long-term outcomes for all inpatients across their seven hospitals. Methodology/results: We develop machine learning models that predict the probabilities of next 24-hr/48-hr discharge and intensive care unit transfers, end-of-stay mortality and discharge dispositions. All models achieve high out-of-sample AUC (75.7%-92.5%) and are well calibrated. In addition, combining 48-hr discharge predictions with doctors' predictions simultaneously enables more patient discharges (10%-28.7%) and fewer 7-day/30-day readmissions ($p$-value $<0.001$). We implement an automated pipeline that extracts data and updates predictions every morning, as well as user-friendly software and a color-coded alert system to communicate these patient-level predictions (alongside explanations) to clinical teams. Managerial implications: Since we have been gradually deploying the tool, and training medical staff, over 200 doctors, nurses, and case managers across seven hospitals use it in their daily patient review process. We observe a significant reduction in the average length of stay (0.67 days per patient) following its adoption and anticipate substantial financial benefits (between \$55 and \$72 million annually) for the healthcare system.
△ Less
Submitted 24 May, 2023;
originally announced May 2023.
-
Optimal Low-Rank Matrix Completion: Semidefinite Relaxations and Eigenvector Disjunctions
Authors:
Dimitris Bertsimas,
Ryan Cory-Wright,
Sean Lo,
Jean Pauphilet
Abstract:
Low-rank matrix completion consists of computing a matrix of minimal complexity that recovers a given set of observations as accurately as possible. Unfortunately, existing methods for matrix completion are heuristics that, while highly scalable and often identifying high-quality solutions, do not possess any optimality guarantees. We reexamine matrix completion with an optimality-oriented eye. We…
▽ More
Low-rank matrix completion consists of computing a matrix of minimal complexity that recovers a given set of observations as accurately as possible. Unfortunately, existing methods for matrix completion are heuristics that, while highly scalable and often identifying high-quality solutions, do not possess any optimality guarantees. We reexamine matrix completion with an optimality-oriented eye. We reformulate these low-rank problems as convex problems over the non-convex set of projection matrices and implement a disjunctive branch-and-bound scheme that solves them to certifiable optimality. Further, we derive a novel and often tight class of convex relaxations by decomposing a low-rank matrix as a sum of rank-one matrices and incentivizing that two-by-two minors in each rank-one matrix have determinant zero. In numerical experiments, our new convex relaxations decrease the optimality gap by two orders of magnitude compared to existing attempts, and our disjunctive branch-and-bound scheme solves nxn rank-r matrix completion problems to certifiable optimality in hours for n<=150 and r<=5.
△ Less
Submitted 26 January, 2024; v1 submitted 20 May, 2023;
originally announced May 2023.
-
Finding Neurons in a Haystack: Case Studies with Sparse Probing
Authors:
Wes Gurnee,
Neel Nanda,
Matthew Pauly,
Katherine Harvey,
Dmitrii Troitskii,
Dimitris Bertsimas
Abstract:
Despite rapid adoption and deployment of large language models (LLMs), the internal computations of these models remain opaque and poorly understood. In this work, we seek to understand how high-level human-interpretable features are represented within the internal neuron activations of LLMs. We train $k$-sparse linear classifiers (probes) on these internal activations to predict the presence of f…
▽ More
Despite rapid adoption and deployment of large language models (LLMs), the internal computations of these models remain opaque and poorly understood. In this work, we seek to understand how high-level human-interpretable features are represented within the internal neuron activations of LLMs. We train $k$-sparse linear classifiers (probes) on these internal activations to predict the presence of features in the input; by varying the value of $k$ we study the sparsity of learned representations and how this varies with model scale. With $k=1$, we localize individual neurons which are highly relevant for a particular feature, and perform a number of case studies to illustrate general properties of LLMs. In particular, we show that early layers make use of sparse combinations of neurons to represent many features in superposition, that middle layers have seemingly dedicated neurons to represent higher-level contextual features, and that increasing scale causes representational sparsity to increase on average, but there are multiple types of scaling dynamics. In all, we probe for over 100 unique features comprising 10 different categories in 7 different models spanning 70 million to 6.9 billion parameters.
△ Less
Submitted 2 June, 2023; v1 submitted 2 May, 2023;
originally announced May 2023.
-
Ensemble Modeling for Time Series Forecasting: an Adaptive Robust Optimization Approach
Authors:
Dimitris Bertsimas,
Leonard Boussioux
Abstract:
Accurate time series forecasting is critical for a wide range of problems with temporal data. Ensemble modeling is a well-established technique for leveraging multiple predictive models to increase accuracy and robustness, as the performance of a single predictor can be highly variable due to shifts in the underlying data distribution. This paper proposes a new methodology for building robust ense…
▽ More
Accurate time series forecasting is critical for a wide range of problems with temporal data. Ensemble modeling is a well-established technique for leveraging multiple predictive models to increase accuracy and robustness, as the performance of a single predictor can be highly variable due to shifts in the underlying data distribution. This paper proposes a new methodology for building robust ensembles of time series forecasting models. Our approach utilizes Adaptive Robust Optimization (ARO) to construct a linear regression ensemble in which the models' weights can adapt over time. We demonstrate the effectiveness of our method through a series of synthetic experiments and real-world applications, including air pollution management, energy consumption forecasting, and tropical cyclone intensity forecasting. Our results show that our adaptive ensembles outperform the best ensemble member in hindsight by 16-26% in root mean square error and 14-28% in conditional value at risk and improve over competitive ensemble techniques.
△ Less
Submitted 9 April, 2023;
originally announced April 2023.
-
Reducing Air Pollution through Machine Learning
Authors:
Dimitris Bertsimas,
Leonard Boussioux,
Cynthia Zeng
Abstract:
This paper presents a data-driven approach to mitigate the effects of air pollution from industrial plants on nearby cities by linking operational decisions with weather conditions. Our method combines predictive and prescriptive machine learning models to forecast short-term wind speed and direction and recommend operational decisions to reduce or pause the industrial plant's production. We exhib…
▽ More
This paper presents a data-driven approach to mitigate the effects of air pollution from industrial plants on nearby cities by linking operational decisions with weather conditions. Our method combines predictive and prescriptive machine learning models to forecast short-term wind speed and direction and recommend operational decisions to reduce or pause the industrial plant's production. We exhibit several trade-offs between reducing environmental impact and maintaining production activities. The predictive component of our framework employs various machine learning models, such as gradient-boosted tree-based models and ensemble methods, for time series forecasting. The prescriptive component utilizes interpretable optimal policy trees to propose multiple trade-offs, such as reducing dangerous emissions by 33-47% and unnecessary costs by 40-63%. Our deployed models significantly reduced forecasting errors, with a range of 38-52% for less than 12-hour lead time and 14-46% for 12 to 48-hour lead time compared to official weather forecasts. We have successfully implemented the predictive component at the OCP Safi site, which is Morocco's largest chemical industrial plant, and are currently in the process of deploying the prescriptive component. Our framework enables sustainable industrial development by eliminating the pollution-industrial activity trade-off through data-driven weather-based operational decisions, significantly enhancing factory optimization and sustainability. This modernizes factory planning and resource allocation while maintaining environmental compliance. The predictive component has boosted production efficiency, leading to cost savings and reduced environmental impact by minimizing air pollution.
△ Less
Submitted 21 March, 2023;
originally announced March 2023.
-
Multistage Stochastic Optimization via Kernels
Authors:
Dimitris Bertsimas,
Kimberly Villalobos Carballo
Abstract:
We develop a non-parametric, data-driven, tractable approach for solving multistage stochastic optimization problems in which decisions do not affect the uncertainty. The proposed framework represents the decision variables as elements of a reproducing kernel Hilbert space and performs functional stochastic gradient descent to minimize the empirical regularized loss. By incorporating sparsificatio…
▽ More
We develop a non-parametric, data-driven, tractable approach for solving multistage stochastic optimization problems in which decisions do not affect the uncertainty. The proposed framework represents the decision variables as elements of a reproducing kernel Hilbert space and performs functional stochastic gradient descent to minimize the empirical regularized loss. By incorporating sparsification techniques based on function subspace projections we are able to overcome the computational complexity that standard kernel methods introduce as the data size increases. We prove that the proposed approach is asymptotically optimal for multistage stochastic optimization with side information. Across various computational experiments on stochastic inventory management problems, {our method performs well in multidimensional settings} and remains tractable when the data size is large. Lastly, by computing lower bounds for the optimal loss of the inventory control problem, we show that the proposed method produces decision rules with near-optimal average performance.
△ Less
Submitted 11 March, 2023;
originally announced March 2023.
-
Global Flood Prediction: a Multimodal Machine Learning Approach
Authors:
Cynthia Zeng,
Dimitris Bertsimas
Abstract:
Flooding is one of the most destructive and costly natural disasters, and climate changes would further increase risks globally. This work presents a novel multimodal machine learning approach for multi-year global flood risk prediction, combining geographical information and historical natural disaster dataset. Our multimodal framework employs state-of-the-art processing techniques to extract emb…
▽ More
Flooding is one of the most destructive and costly natural disasters, and climate changes would further increase risks globally. This work presents a novel multimodal machine learning approach for multi-year global flood risk prediction, combining geographical information and historical natural disaster dataset. Our multimodal framework employs state-of-the-art processing techniques to extract embeddings from each data modality, including text-based geographical data and tabular-based time-series data. Experiments demonstrate that a multimodal approach, that is combining text and statistical data, outperforms a single-modality approach. Our most advanced architecture, employing embeddings extracted using transfer learning upon DistilBert model, achieves 75\%-77\% ROCAUC score in predicting the next 1-5 year flooding event in historically flooded locations. This work demonstrates the potentials of using machine learning for long-term planning in natural disaster management.
△ Less
Submitted 29 January, 2023;
originally announced January 2023.
-
Distributionally Robust Causal Inference with Observational Data
Authors:
Dimitris Bertsimas,
Kosuke Imai,
Michael Lingzhi Li
Abstract:
We consider the estimation of average treatment effects in observational studies and propose a new framework of robust causal inference with unobserved confounders. Our approach is based on distributionally robust optimization and proceeds in two steps. We first specify the maximal degree to which the distribution of unobserved potential outcomes may deviate from that of observed outcomes. We then…
▽ More
We consider the estimation of average treatment effects in observational studies and propose a new framework of robust causal inference with unobserved confounders. Our approach is based on distributionally robust optimization and proceeds in two steps. We first specify the maximal degree to which the distribution of unobserved potential outcomes may deviate from that of observed outcomes. We then derive sharp bounds on the average treatment effects under this assumption. Our framework encompasses the popular marginal sensitivity model as a special case, and we demonstrate how the proposed methodology can address a primary challenge of the marginal sensitivity model that it produces uninformative results when unobserved confounders substantially affect treatment and outcome. Specifically, we develop an alternative sensitivity model, called the distributional sensitivity model, under the assumption that heterogeneity of treatment effect due to unobserved variables is relatively small. Unlike the marginal sensitivity model, the distributional sensitivity model allows for potential lack of overlap and often produces informative bounds even when unobserved variables substantially affect both treatment and outcome. Finally, we show how to extend the distributional sensitivity model to difference-in-differences designs and settings with instrumental variables. Through simulation and empirical studies, we demonstrate the applicability of the proposed methodology.
△ Less
Submitted 2 February, 2023; v1 submitted 15 October, 2022;
originally announced October 2022.
-
TabText: A Flexible and Contextual Approach to Tabular Data Representation
Authors:
Kimberly Villalobos Carballo,
Liangyuan Na,
Yu Ma,
Léonard Boussioux,
Cynthia Zeng,
Luis R. Soenksen,
Dimitris Bertsimas
Abstract:
Tabular data is essential for applying machine learning tasks across various industries. However, traditional data processing methods do not fully utilize all the information available in the tables, ignoring important contextual information such as column header descriptions. In addition, pre-processing data into a tabular format can remain a labor-intensive bottleneck in model development. This…
▽ More
Tabular data is essential for applying machine learning tasks across various industries. However, traditional data processing methods do not fully utilize all the information available in the tables, ignoring important contextual information such as column header descriptions. In addition, pre-processing data into a tabular format can remain a labor-intensive bottleneck in model development. This work introduces TabText, a processing and feature extraction framework that extracts contextual information from tabular data structures. TabText addresses processing difficulties by converting the content into language and utilizing pre-trained large language models (LLMs). We evaluate our framework on nine healthcare prediction tasks ranging from patient discharge, ICU admission, and mortality. We show that 1) applying our TabText framework enables the generation of high-performing and simple machine learning baseline models with minimal data pre-processing, and 2) augmenting pre-processed tabular data with TabText representations improves the average and worst-case AUC performance of standard machine learning models by as much as 6%.
△ Less
Submitted 21 July, 2023; v1 submitted 21 June, 2022;
originally announced June 2022.
-
Learning Sparse Nonlinear Dynamics via Mixed-Integer Optimization
Authors:
Dimitris Bertsimas,
Wes Gurnee
Abstract:
Discovering governing equations of complex dynamical systems directly from data is a central problem in scientific machine learning. In recent years, the sparse identification of nonlinear dynamics (SINDy) framework, powered by heuristic sparse regression methods, has become a dominant tool for learning parsimonious models. We propose an exact formulation of the SINDy problem using mixed-integer o…
▽ More
Discovering governing equations of complex dynamical systems directly from data is a central problem in scientific machine learning. In recent years, the sparse identification of nonlinear dynamics (SINDy) framework, powered by heuristic sparse regression methods, has become a dominant tool for learning parsimonious models. We propose an exact formulation of the SINDy problem using mixed-integer optimization (MIO) to solve the sparsity constrained regression problem to provable optimality in seconds. On a large number of canonical ordinary and partial differential equations, we illustrate the dramatic improvement of our approach in accurate model discovery while being more sample efficient, robust to noise, and flexible in accommodating physical constraints.
△ Less
Submitted 31 May, 2022;
originally announced June 2022.
-
Integrated multimodal artificial intelligence framework for healthcare applications
Authors:
Luis R. Soenksen,
Yu Ma,
Cynthia Zeng,
Leonard D. J. Boussioux,
Kimberly Villalobos Carballo,
Liangyuan Na,
Holly M. Wiberg,
Michael L. Li,
Ignacio Fuentes,
Dimitris Bertsimas
Abstract:
Artificial intelligence (AI) systems hold great promise to improve healthcare over the next decades. Specifically, AI systems leveraging multiple data sources and input modalities are poised to become a viable method to deliver more accurate results and deployable pipelines across a wide range of applications. In this work, we propose and evaluate a unified Holistic AI in Medicine (HAIM) framework…
▽ More
Artificial intelligence (AI) systems hold great promise to improve healthcare over the next decades. Specifically, AI systems leveraging multiple data sources and input modalities are poised to become a viable method to deliver more accurate results and deployable pipelines across a wide range of applications. In this work, we propose and evaluate a unified Holistic AI in Medicine (HAIM) framework to facilitate the generation and testing of AI systems that leverage multimodal inputs. Our approach uses generalizable data pre-processing and machine learning modeling stages that can be readily adapted for research and deployment in healthcare environments. We evaluate our HAIM framework by training and characterizing 14,324 independent models based on HAIM-MIMIC-MM, a multimodal clinical database (N=34,537 samples) containing 7,279 unique hospitalizations and 6,485 patients, spanning all possible input combinations of 4 data modalities (i.e., tabular, time-series, text, and images), 11 unique data sources and 12 predictive tasks. We show that this framework can consistently and robustly produce models that outperform similar single-source approaches across various healthcare demonstrations (by 6-33%), including 10 distinct chest pathology diagnoses, along with length-of-stay and 48-hour mortality predictions. We also quantify the contribution of each modality and data source using Shapley values, which demonstrates the heterogeneity in data modality importance and the necessity of multimodal inputs across different healthcare-relevant tasks. The generalizable properties and flexibility of our Holistic AI in Medicine (HAIM) framework could offer a promising pathway for future multimodal predictive systems in clinical and operational healthcare settings.
△ Less
Submitted 26 September, 2022; v1 submitted 25 February, 2022;
originally announced February 2022.
-
Robust Upper Bounds for Adversarial Training
Authors:
Dimitris Bertsimas,
Xavier Boix,
Kimberly Villalobos Carballo,
Dick den Hertog
Abstract:
Many state-of-the-art adversarial training methods for deep learning leverage upper bounds of the adversarial loss to provide security guarantees against adversarial attacks. Yet, these methods rely on convex relaxations to propagate lower and upper bounds for intermediate layers, which affect the tightness of the bound at the output layer. We introduce a new approach to adversarial training by mi…
▽ More
Many state-of-the-art adversarial training methods for deep learning leverage upper bounds of the adversarial loss to provide security guarantees against adversarial attacks. Yet, these methods rely on convex relaxations to propagate lower and upper bounds for intermediate layers, which affect the tightness of the bound at the output layer. We introduce a new approach to adversarial training by minimizing an upper bound of the adversarial loss that is based on a holistic expansion of the network instead of separate bounds for each layer. This bound is facilitated by state-of-the-art tools from Robust Optimization; it has closed-form and can be effectively trained using backpropagation. We derive two new methods with the proposed approach. The first method (Approximated Robust Upper Bound or aRUB) uses the first order approximation of the network as well as basic tools from Linear Robust Optimization to obtain an empirical upper bound of the adversarial loss that can be easily implemented. The second method (Robust Upper Bound or RUB), computes a provable upper bound of the adversarial loss. Across a variety of tabular and vision data sets we demonstrate the effectiveness of our approach -- RUB is substantially more robust than state-of-the-art methods for larger perturbations, while aRUB matches the performance of state-of-the-art methods for small perturbations.
△ Less
Submitted 5 April, 2023; v1 submitted 16 December, 2021;
originally announced December 2021.
-
Mixed-Integer Optimization with Constraint Learning
Authors:
Donato Maragno,
Holly Wiberg,
Dimitris Bertsimas,
S. Ilker Birbil,
Dick den Hertog,
Adejuyigbe Fajemisin
Abstract:
We establish a broad methodological foundation for mixed-integer optimization with learned constraints. We propose an end-to-end pipeline for data-driven decision making in which constraints and objectives are directly learned from data using machine learning, and the trained models are embedded in an optimization formulation. We exploit the mixed-integer optimization-representability of many mach…
▽ More
We establish a broad methodological foundation for mixed-integer optimization with learned constraints. We propose an end-to-end pipeline for data-driven decision making in which constraints and objectives are directly learned from data using machine learning, and the trained models are embedded in an optimization formulation. We exploit the mixed-integer optimization-representability of many machine learning methods, including linear models, decision trees, ensembles, and multi-layer perceptrons, which allows us to capture various underlying relationships between decisions, contextual variables, and outcomes. We also introduce two approaches for handling the inherent uncertainty of learning from data. First, we characterize a decision trust region using the convex hull of the observations, to ensure credible recommendations and avoid extrapolation. We efficiently incorporate this representation using column generation and propose a more flexible formulation to deal with low-density regions and high-dimensional datasets. Then, we propose an ensemble learning approach that enforces constraint satisfaction over multiple bootstrapped estimators or multiple algorithms. In combination with domain-driven components, the embedded models and trust region define a mixed-integer optimization problem for prescription generation. We implement this framework as a Python package (OptiCL) for practitioners. We demonstrate the method in both World Food Programme planning and chemotherapy optimization. The case studies illustrate the framework's ability to generate high-quality prescriptions as well as the value added by the trust region, the use of ensembles to control model robustness, the consideration of multiple machine learning methods, and the inclusion of multiple learned constraints.
△ Less
Submitted 26 October, 2023; v1 submitted 4 November, 2021;
originally announced November 2021.
-
Holistic Deep Learning
Authors:
Dimitris Bertsimas,
Kimberly Villalobos Carballo,
Léonard Boussioux,
Michael Lingzhi Li,
Alex Paskov,
Ivan Paskov
Abstract:
This paper presents a novel holistic deep learning framework that simultaneously addresses the challenges of vulnerability to input perturbations, overparametrization, and performance instability from different train-validation splits. The proposed framework holistically improves accuracy, robustness, sparsity, and stability over standard deep learning models, as demonstrated by extensive experime…
▽ More
This paper presents a novel holistic deep learning framework that simultaneously addresses the challenges of vulnerability to input perturbations, overparametrization, and performance instability from different train-validation splits. The proposed framework holistically improves accuracy, robustness, sparsity, and stability over standard deep learning models, as demonstrated by extensive experiments on both tabular and image data sets. The results are further validated by ablation experiments and SHAP value analysis, which reveal the interactions and trade-offs between the different evaluation metrics. To support practitioners applying our framework, we provide a prescriptive approach that offers recommendations for selecting an appropriate training loss function based on their specific objectives. All the code to reproduce the results can be found at https://github.com/kimvc7/HDL.
△ Less
Submitted 20 March, 2023; v1 submitted 29 October, 2021;
originally announced October 2021.
-
Sparse Plus Low Rank Matrix Decomposition: A Discrete Optimization Approach
Authors:
Dimitris Bertsimas,
Ryan Cory-Wright,
Nicholas A. G. Johnson
Abstract:
We study the Sparse Plus Low-Rank decomposition problem (SLR), which is the problem of decomposing a corrupted data matrix into a sparse matrix of perturbations plus a low-rank matrix containing the ground truth. SLR is a fundamental problem in Operations Research and Machine Learning which arises in various applications, including data compression, latent semantic indexing, collaborative filterin…
▽ More
We study the Sparse Plus Low-Rank decomposition problem (SLR), which is the problem of decomposing a corrupted data matrix into a sparse matrix of perturbations plus a low-rank matrix containing the ground truth. SLR is a fundamental problem in Operations Research and Machine Learning which arises in various applications, including data compression, latent semantic indexing, collaborative filtering, and medical imaging. We introduce a novel formulation for SLR that directly models its underlying discreteness. For this formulation, we develop an alternating minimization heuristic that computes high-quality solutions and a novel semidefinite relaxation that provides meaningful bounds for the solutions returned by our heuristic. We also develop a custom branch-and-bound algorithm that leverages our heuristic and convex relaxations to solve small instances of SLR to certifiable (near) optimality. Given an input $n$-by-$n$ matrix, our heuristic scales to solve instances where $n=10000$ in minutes, our relaxation scales to instances where $n=200$ in hours, and our branch-and-bound algorithm scales to instances where $n=25$ in minutes. Our numerical results demonstrate that our approach outperforms existing state-of-the-art approaches in terms of rank, sparsity, and mean-square error while maintaining a comparable runtime.
△ Less
Submitted 1 October, 2023; v1 submitted 26 September, 2021;
originally announced September 2021.
-
The Price of Diversity
Authors:
Hari Bandi,
Dimitris Bertsimas
Abstract:
Systemic bias with respect to gender, race and ethnicity, often unconscious, is prevalent in datasets involving choices among individuals. Consequently, society has found it challenging to alleviate bias and achieve diversity in a way that maintains meritocracy in such settings. We propose (a) a novel optimization approach based on optimally flipping outcome labels and training classification mode…
▽ More
Systemic bias with respect to gender, race and ethnicity, often unconscious, is prevalent in datasets involving choices among individuals. Consequently, society has found it challenging to alleviate bias and achieve diversity in a way that maintains meritocracy in such settings. We propose (a) a novel optimization approach based on optimally flipping outcome labels and training classification models simultaneously to discover changes to be made in the selection process so as to achieve diversity without significantly affecting meritocracy, and (b) a novel implementation tool employing optimal classification trees to provide insights on which attributes of individuals lead to flipping of their labels, and to help make changes in the current selection processes in a manner understandable by human decision makers. We present case studies on three real-world datasets consisting of parole, admissions to the bar and lending decisions, and demonstrate that the price of diversity is low and sometimes negative, that is we can modify our selection processes in a way that enhances diversity without affecting meritocracy significantly, and sometimes improving it.
△ Less
Submitted 2 July, 2021;
originally announced July 2021.
-
Algorithmic Insurance
Authors:
Dimitris Bertsimas,
Agni Orfanoudaki
Abstract:
As machine learning algorithms start to get integrated into the decision-making process of companies and organizations, insurance products are being developed to protect their owners from liability risk. Algorithmic liability differs from human liability since it is based on a single model compared to multiple heterogeneous decision-makers and its performance is known a priori for a given set of d…
▽ More
As machine learning algorithms start to get integrated into the decision-making process of companies and organizations, insurance products are being developed to protect their owners from liability risk. Algorithmic liability differs from human liability since it is based on a single model compared to multiple heterogeneous decision-makers and its performance is known a priori for a given set of data. Traditional actuarial tools for human liability do not take these properties into consideration, primarily focusing on the distribution of historical claims. We propose, for the first time, a quantitative framework to estimate the risk exposure of insurance contracts for machine-driven liability, introducing the concept of algorithmic insurance. Specifically, we present an optimization formulation to estimate the risk exposure of a binary classification model given a pre-defined range of premiums. We adjust the formulation to account for uncertainty in the resulting losses using robust optimization. Our approach outlines how properties of the model, such as accuracy, interpretability, and generalizability, can influence the insurance contract evaluation. To showcase a practical implementation of the proposed framework, we present a case study of medical malpractice in the context of breast cancer detection. Our analysis focuses on measuring the effect of the model parameters on the expected financial loss and identifying the aspects of algorithmic performance that predominantly affect the risk of the contract.
△ Less
Submitted 14 December, 2022; v1 submitted 1 June, 2021;
originally announced June 2021.
-
A new perspective on low-rank optimization
Authors:
Dimitris Bertsimas,
Ryan Cory-Wright,
Jean Pauphilet
Abstract:
A key question in many low-rank problems throughout optimization, machine learning, and statistics is to characterize the convex hulls of simple low-rank sets and judiciously apply these convex hulls to obtain strong yet computationally tractable convex relaxations. We invoke the matrix perspective function - the matrix analog of the perspective function - and characterize explicitly the convex hu…
▽ More
A key question in many low-rank problems throughout optimization, machine learning, and statistics is to characterize the convex hulls of simple low-rank sets and judiciously apply these convex hulls to obtain strong yet computationally tractable convex relaxations. We invoke the matrix perspective function - the matrix analog of the perspective function - and characterize explicitly the convex hull of epigraphs of simple matrix convex functions under low-rank constraints. Further, we combine the matrix perspective function with orthogonal projection matrices-the matrix analog of binary variables which capture the row-space of a matrix-to develop a matrix perspective reformulation technique that reliably obtains strong relaxations for a variety of low-rank problems, including reduced rank regression, non-negative matrix factorization, and factor analysis. Moreover, we establish that these relaxations can be modeled via semidefinite constraints and thus optimized over tractably. The proposed approach parallels and generalizes the perspective reformulation technique in mixed-integer optimization and leads to new relaxations for a broad class of problems.
△ Less
Submitted 2 March, 2022; v1 submitted 12 May, 2021;
originally announced May 2021.
-
Simple Imputation Rules for Prediction with Missing Data: Contrasting Theoretical Guarantees with Empirical Performance
Authors:
Dimitris Bertsimas,
Arthur Delarue,
Jean Pauphilet
Abstract:
Missing data is a common issue in real-world datasets. This paper studies the performance of impute-then-regress pipelines by contrasting theoretical and empirical evidence. We establish the asymptotic consistency of such pipelines for a broad family of imputation methods. While common sense suggests that a `good' imputation method produces datasets that are plausible, we show, on the contrary, th…
▽ More
Missing data is a common issue in real-world datasets. This paper studies the performance of impute-then-regress pipelines by contrasting theoretical and empirical evidence. We establish the asymptotic consistency of such pipelines for a broad family of imputation methods. While common sense suggests that a `good' imputation method produces datasets that are plausible, we show, on the contrary, that, as far as prediction is concerned, crude can be good. Among others, we find that mode-impute is asymptotically sub-optimal, while mean-impute is asymptotically optimal. We then exhaustively assess the validity of these theoretical conclusions on a large corpus of synthetic, semi-real, and real datasets. While the empirical evidence we collect mostly supports our theoretical findings, it also highlights gaps between theory and practice and opportunities for future research, regarding the relevance of the MAR assumption, the complex interdependency between the imputation and regression tasks, and the need for realistic synthetic data generation models.
△ Less
Submitted 2 February, 2024; v1 submitted 7 April, 2021;
originally announced April 2021.
-
Slowly Varying Regression under Sparsity
Authors:
Dimitris Bertsimas,
Vassilis Digalakis Jr,
Michael Linghzi Li,
Omar Skali Lami
Abstract:
We present the framework of slowly varying regression under sparsity, allowing sparse regression models to exhibit slow and sparse variations. The problem of parameter estimation is formulated as a mixed-integer optimization problem. We demonstrate that it can be precisely reformulated as a binary convex optimization problem through a novel relaxation technique. This relaxation involves a new equa…
▽ More
We present the framework of slowly varying regression under sparsity, allowing sparse regression models to exhibit slow and sparse variations. The problem of parameter estimation is formulated as a mixed-integer optimization problem. We demonstrate that it can be precisely reformulated as a binary convex optimization problem through a novel relaxation technique. This relaxation involves a new equality on Moore-Penrose inverses, convexifying the non-convex objective function while matching the original objective on all feasible binary points. This enables us to efficiently solve the problem to provable optimality using a cutting plane-type algorithm. We develop a highly optimized implementation of this algorithm, substantially improving upon the asymptotic computational complexity of a straightforward implementation. Additionally, we propose a fast heuristic method that guarantees a feasible solution and, as empirically illustrated, produces high-quality warm-start solutions for the binary optimization problem. To tune the framework's hyperparameters, we suggest a practical procedure relying on binary search that, under certain assumptions, is guaranteed to recover the true model parameters. On both synthetic and real-world datasets, we demonstrate that the resulting algorithm outperforms competing formulations in comparable times across various metrics, including estimation accuracy, predictive power, and computational time. The algorithm is highly scalable, allowing us to train models with thousands of parameters. Our implementation is available open-source at https://github.com/vvdigalakis/SSVRegression.git.
△ Less
Submitted 11 November, 2023; v1 submitted 21 February, 2021;
originally announced February 2021.
-
Optimal Survival Trees
Authors:
Dimitris Bertsimas,
Jack Dunn,
Emma Gibson,
Agni Orfanoudaki
Abstract:
Tree-based models are increasingly popular due to their ability to identify complex relationships that are beyond the scope of parametric models. Survival tree methods adapt these models to allow for the analysis of censored outcomes, which often appear in medical data. We present a new Optimal Survival Trees algorithm that leverages mixed-integer optimization (MIO) and local search techniques to…
▽ More
Tree-based models are increasingly popular due to their ability to identify complex relationships that are beyond the scope of parametric models. Survival tree methods adapt these models to allow for the analysis of censored outcomes, which often appear in medical data. We present a new Optimal Survival Trees algorithm that leverages mixed-integer optimization (MIO) and local search techniques to generate globally optimized survival tree models. We demonstrate that the OST algorithm improves on the accuracy of existing survival tree methods, particularly in large datasets.
△ Less
Submitted 8 December, 2020;
originally announced December 2020.
-
Hurricane Forecasting: A Novel Multimodal Machine Learning Framework
Authors:
Léonard Boussioux,
Cynthia Zeng,
Théo Guénais,
Dimitris Bertsimas
Abstract:
This paper describes a novel machine learning (ML) framework for tropical cyclone intensity and track forecasting, combining multiple ML techniques and utilizing diverse data sources. Our multimodal framework, called Hurricast, efficiently combines spatial-temporal data with statistical data by extracting features with deep-learning encoder-decoder architectures and predicting with gradient-booste…
▽ More
This paper describes a novel machine learning (ML) framework for tropical cyclone intensity and track forecasting, combining multiple ML techniques and utilizing diverse data sources. Our multimodal framework, called Hurricast, efficiently combines spatial-temporal data with statistical data by extracting features with deep-learning encoder-decoder architectures and predicting with gradient-boosted trees. We evaluate our models in the North Atlantic and Eastern Pacific basins on 2016-2019 for 24-hour lead time track and intensity forecasts and show they achieve comparable mean absolute error and skill to current operational forecast models while computing in seconds. Furthermore, the inclusion of Hurricast into an operational forecast consensus model could improve over the National Hurricane Center's official forecast, thus highlighting the complementary properties with existing approaches. In summary, our work demonstrates that utilizing machine learning techniques to combine different data sources can lead to new opportunities in tropical cyclone forecasting.
△ Less
Submitted 24 September, 2022; v1 submitted 11 November, 2020;
originally announced November 2020.
-
Mixed-Projection Conic Optimization: A New Paradigm for Modeling Rank Constraints
Authors:
Dimitris Bertsimas,
Ryan Cory-Wright,
Jean Pauphilet
Abstract:
We propose a framework for modeling and solving low-rank optimization problems to certifiable optimality. We introduce symmetric projection matrices that satisfy $Y^2=Y$, the matrix analog of binary variables that satisfy $z^2=z$, to model rank constraints. By leveraging regularization and strong duality, we prove that this modeling paradigm yields tractable convex optimization problems over the n…
▽ More
We propose a framework for modeling and solving low-rank optimization problems to certifiable optimality. We introduce symmetric projection matrices that satisfy $Y^2=Y$, the matrix analog of binary variables that satisfy $z^2=z$, to model rank constraints. By leveraging regularization and strong duality, we prove that this modeling paradigm yields tractable convex optimization problems over the non-convex set of orthogonal projection matrices. Furthermore, we design outer-approximation algorithms to solve low-rank problems to certifiable optimality, compute lower bounds via their semidefinite relaxations, and provide near-optimal solutions through rounding and local search techniques. We implement these numerical ingredients and, for the first time, solve low-rank optimization problems to certifiable optimality. Using currently available spatial branch-and-bound codes, not tailored to projection matrices, we can scale our exact (resp. near-exact) algorithms to matrices with up to 30 (resp. 600) rows/columns. Our algorithms also supply certifiably near-optimal solutions for larger problem sizes and outperform existing heuristics, by deriving an alternative to the popular nuclear norm relaxation which generalizes the perspective relaxation from vectors to matrices. All in all, our framework, which we name Mixed-Projection Conic Optimization, solves low-rank problems to certifiable optimality in a tractable and unified fashion.
△ Less
Submitted 2 April, 2021; v1 submitted 22 September, 2020;
originally announced September 2020.
-
Frequency Estimation in Data Streams: Learning the Optimal Hashing Scheme
Authors:
Dimitris Bertsimas,
Vassilis Digalakis Jr
Abstract:
We present a novel approach for the problem of frequency estimation in data streams that is based on optimization and machine learning. Contrary to state-of-the-art streaming frequency estimation algorithms, which heavily rely on random hashing to maintain the frequency distribution of the data steam using limited storage, the proposed approach exploits an observed stream prefix to near-optimally…
▽ More
We present a novel approach for the problem of frequency estimation in data streams that is based on optimization and machine learning. Contrary to state-of-the-art streaming frequency estimation algorithms, which heavily rely on random hashing to maintain the frequency distribution of the data steam using limited storage, the proposed approach exploits an observed stream prefix to near-optimally hash elements and compress the target frequency distribution. We develop an exact mixed-integer linear optimization formulation, which enables us to compute optimal or near-optimal hashing schemes for elements seen in the observed stream prefix; then, we use machine learning to hash unseen elements. Further, we develop an efficient block coordinate descent algorithm, which, as we empirically show, produces high quality solutions, and, in a special case, we are able to solve the proposed formulation exactly in linear time using dynamic programming. We empirically evaluate the proposed approach both on synthetic datasets and on real-world search query data. We show that the proposed approach outperforms existing approaches by one to two orders of magnitude in terms of its average (per element) estimation error and by 45-90% in terms of its expected magnitude of estimation error.
△ Less
Submitted 2 June, 2021; v1 submitted 17 July, 2020;
originally announced July 2020.
-
The Backbone Method for Ultra-High Dimensional Sparse Machine Learning
Authors:
Dimitris Bertsimas,
Vassilis Digalakis Jr
Abstract:
We present the backbone method, a generic framework that enables sparse and interpretable supervised machine learning methods to scale to ultra-high dimensional problems. We solve sparse regression problems with $10^7$ features in minutes and $10^8$ features in hours, as well as decision tree problems with $10^5$ features in minutes.The proposed method operates in two phases: we first determine th…
▽ More
We present the backbone method, a generic framework that enables sparse and interpretable supervised machine learning methods to scale to ultra-high dimensional problems. We solve sparse regression problems with $10^7$ features in minutes and $10^8$ features in hours, as well as decision tree problems with $10^5$ features in minutes.The proposed method operates in two phases: we first determine the backbone set, consisting of potentially relevant features, by solving a number of tractable subproblems; then, we solve a reduced problem, considering only the backbone features. For the sparse regression problem, our theoretical analysis shows that, under certain assumptions and with high probability, the backbone set consists of the truly relevant features. Numerical experiments on both synthetic and real-world datasets demonstrate that our method outperforms or competes with state-of-the-art methods in ultra-high dimensional problems, and competes with optimal solutions in problems where exact methods scale, both in terms of recovering the truly relevant features and in its out-of-sample predictive performance.
△ Less
Submitted 14 October, 2021; v1 submitted 11 June, 2020;
originally announced June 2020.
-
Solving Large-Scale Sparse PCA to Certifiable (Near) Optimality
Authors:
Dimitris Bertsimas,
Ryan Cory-Wright,
Jean Pauphilet
Abstract:
Sparse principal component analysis (PCA) is a popular dimensionality reduction technique for obtaining principal components which are linear combinations of a small subset of the original features. Existing approaches cannot supply certifiably optimal principal components with more than $p=100s$ of variables. By reformulating sparse PCA as a convex mixed-integer semidefinite optimization problem,…
▽ More
Sparse principal component analysis (PCA) is a popular dimensionality reduction technique for obtaining principal components which are linear combinations of a small subset of the original features. Existing approaches cannot supply certifiably optimal principal components with more than $p=100s$ of variables. By reformulating sparse PCA as a convex mixed-integer semidefinite optimization problem, we design a cutting-plane method which solves the problem to certifiable optimality at the scale of selecting k=5 covariates from p=300 variables, and provides small bound gaps at a larger scale. We also propose a convex relaxation and greedy rounding scheme that provides bound gaps of $1-2\%$ in practice within minutes for $p=100$s or hours for $p=1,000$s and is therefore a viable alternative to the exact method at scale. Using real-world financial and medical datasets, we illustrate our approach's ability to derive interpretable principal components tractably at scale.
△ Less
Submitted 25 August, 2021; v1 submitted 11 May, 2020;
originally announced May 2020.
-
Learning Mixed-Integer Convex Optimization Strategies for Robot Planning and Control
Authors:
A. Cauligi,
P. Culbertson,
B. Stellato,
D. Bertsimas,
M. Schwager,
M. Pavone
Abstract:
Mixed-integer convex programming (MICP) has seen significant algorithmic and hardware improvements with several orders of magnitude solve time speedups compared to 25 years ago. Despite these advances, MICP has been rarely applied to real-world robotic control because the solution times are still too slow for online applications. In this work, we present the CoCo (Combinatorial Offline, Convex Onl…
▽ More
Mixed-integer convex programming (MICP) has seen significant algorithmic and hardware improvements with several orders of magnitude solve time speedups compared to 25 years ago. Despite these advances, MICP has been rarely applied to real-world robotic control because the solution times are still too slow for online applications. In this work, we present the CoCo (Combinatorial Offline, Convex Online) framework to solve MICPs arising in robotics at very high speed. CoCo encodes the combinatorial part of the optimal solution into a strategy. Using data collected from offline problem solutions, we train a multiclass classifier to predict the optimal strategy given problem-specific parameters such as states or obstacles. Compared to previous approaches, we use task-specific strategies and prune redundant ones to significantly reduce the number of classes the predictor has to select from, thereby greatly improving scalability. Given the predicted strategy, the control task becomes a small convex optimization problem that we can solve in milliseconds. Numerical experiments on a cart-pole system with walls, a free-flying space robot, and task-oriented grasps show that our method provides not only 1 to 2 orders of magnitude speedups compared to state-of-the-art solvers but also performance close to the globally optimal MICP solution.
△ Less
Submitted 11 April, 2022; v1 submitted 7 April, 2020;
originally announced April 2020.
-
Fast Exact Matrix Completion: A Unified Optimization Framework for Matrix Completion
Authors:
Dimitris Bertsimas,
Michael Lingzhi Li
Abstract:
We formulate the problem of matrix completion with and without side information as a non-convex optimization problem. We design fastImpute based on non-convex gradient descent and show it converges to a global minimum that is guaranteed to recover closely the underlying matrix while it scales to matrices of sizes beyond $10^5 \times 10^5$. We report experiments on both synthetic and real-world dat…
▽ More
We formulate the problem of matrix completion with and without side information as a non-convex optimization problem. We design fastImpute based on non-convex gradient descent and show it converges to a global minimum that is guaranteed to recover closely the underlying matrix while it scales to matrices of sizes beyond $10^5 \times 10^5$. We report experiments on both synthetic and real-world datasets that show fastImpute is competitive in both the accuracy of the matrix recovered and the time needed across all cases. Furthermore, when a high number of entries are missing, fastImpute is over $75\%$ lower in MAPE and $15$ times faster than current state-of-the-art matrix completion methods in both the case with side information and without.
△ Less
Submitted 31 December, 2020; v1 submitted 20 October, 2019;
originally announced October 2019.
-
Personalized Treatment for Coronary Artery Disease Patients: A Machine Learning Approach
Authors:
Dimitris Bertsimas,
Agni Orfanoudaki,
Rory B. Weiner
Abstract:
Current clinical practice guidelines for managing Coronary Artery Disease (CAD) account for general cardiovascular risk factors. However, they do not present a framework that considers personalized patient-specific characteristics. Using the electronic health records of 21,460 patients, we created data-driven models for personalized CAD management that significantly improve health outcomes relativ…
▽ More
Current clinical practice guidelines for managing Coronary Artery Disease (CAD) account for general cardiovascular risk factors. However, they do not present a framework that considers personalized patient-specific characteristics. Using the electronic health records of 21,460 patients, we created data-driven models for personalized CAD management that significantly improve health outcomes relative to the standard of care. We develop binary classifiers to detect whether a patient will experience an adverse event due to CAD within a 10-year time frame. Combining the patients' medical history and clinical examination results, we achieve 81.5% AUC. For each treatment, we also create a series of regression models that are based on different supervised machine learning algorithms. We are able to estimate with average R squared = 0.801 the time from diagnosis to a potential adverse event (TAE) and gain accurate approximations of the counterfactual treatment effects. Leveraging combinations of these models, we present ML4CAD, a novel personalized prescriptive algorithm. Considering the recommendations of multiple predictive models at once, ML4CAD identifies for every patient the therapy with the best expected outcome using a voting mechanism. We evaluate its performance by measuring the prescription effectiveness and robustness under alternative ground truths. We show that our methodology improves the expected TAE upon the current baseline by 24.11%, increasing it from 4.56 to 5.66 years. The algorithm performs particularly well for the male (24.3% improvement) and Hispanic (58.41% improvement) subpopulations. Finally, we create an interactive interface, providing physicians with an intuitive, accurate, readily implementable, and effective tool.
△ Less
Submitted 18 October, 2019;
originally announced October 2019.
-
On Polyhedral and Second-Order Cone Decompositions of Semidefinite Optimization Problems
Authors:
Dimitris Bertsimas,
Ryan Cory-Wright
Abstract:
We study a cutting-plane method for semidefinite optimization problems (SDOs), and supply a proof of the method's convergence, under a boundedness assumption. By relating the method's rate of convergence to an initial outer approximation's diameter, we argue that the method performs well when initialized with a second-order-cone approximation, instead of a linear approximation. We invoke the metho…
▽ More
We study a cutting-plane method for semidefinite optimization problems (SDOs), and supply a proof of the method's convergence, under a boundedness assumption. By relating the method's rate of convergence to an initial outer approximation's diameter, we argue that the method performs well when initialized with a second-order-cone approximation, instead of a linear approximation. We invoke the method to provide bound gaps of 0.5-6.5% for sparse PCA problems with $1000$s of covariates, and solve nuclear norm problems over 500x500 matrices.
△ Less
Submitted 2 December, 2019; v1 submitted 7 October, 2019;
originally announced October 2019.
-
Dynamic optimization with side information
Authors:
Dimitris Bertsimas,
Christopher McCord,
Bradley Sturt
Abstract:
We develop a tractable and flexible approach for incorporating side information into dynamic optimization under uncertainty. The proposed framework uses predictive machine learning methods (such as $k$-nearest neighbors, kernel regression, and random forests) to weight the relative importance of various data-driven uncertainty sets in a robust optimization formulation. Through a novel measure conc…
▽ More
We develop a tractable and flexible approach for incorporating side information into dynamic optimization under uncertainty. The proposed framework uses predictive machine learning methods (such as $k$-nearest neighbors, kernel regression, and random forests) to weight the relative importance of various data-driven uncertainty sets in a robust optimization formulation. Through a novel measure concentration result for a class of machine learning methods, we prove that the proposed approach is asymptotically optimal for multi-period stochastic programming with side information. We also describe a general-purpose approximation for these optimization problems, based on overlapping linear decision rules, which is computationally tractable and produces high-quality solutions for dynamic problems with many stages. Across a variety of examples in inventory management, finance, and shipment planning, our method achieves improvements of up to 15\% over alternatives and requires less than one minute of computation time on problems with twelve stages.
△ Less
Submitted 21 July, 2020; v1 submitted 16 July, 2019;
originally announced July 2019.
-
Optimal Explanations of Linear Models
Authors:
Dimitris Bertsimas,
Arthur Delarue,
Patrick Jaillet,
Sebastien Martin
Abstract:
When predictive models are used to support complex and important decisions, the ability to explain a model's reasoning can increase trust, expose hidden biases, and reduce vulnerability to adversarial attacks. However, attempts at interpreting models are often ad hoc and application-specific, and the concept of interpretability itself is not well-defined. We propose a general optimization framewor…
▽ More
When predictive models are used to support complex and important decisions, the ability to explain a model's reasoning can increase trust, expose hidden biases, and reduce vulnerability to adversarial attacks. However, attempts at interpreting models are often ad hoc and application-specific, and the concept of interpretability itself is not well-defined. We propose a general optimization framework to create explanations for linear models. Our methodology decomposes a linear model into a sequence of models of increasing complexity using coordinate updates on the coefficients. Computing this decomposition optimally is a difficult optimization problem for which we propose exact algorithms and scalable heuristics. By solving this problem, we can derive a parametrized family of interpretability metrics for linear models that generalizes typical proxies, and study the tradeoff between interpretability and predictive accuracy.
△ Less
Submitted 8 July, 2019;
originally announced July 2019.
-
The Price of Interpretability
Authors:
Dimitris Bertsimas,
Arthur Delarue,
Patrick Jaillet,
Sebastien Martin
Abstract:
When quantitative models are used to support decision-making on complex and important topics, understanding a model's ``reasoning'' can increase trust in its predictions, expose hidden biases, or reduce vulnerability to adversarial attacks. However, the concept of interpretability remains loosely defined and application-specific. In this paper, we introduce a mathematical framework in which machin…
▽ More
When quantitative models are used to support decision-making on complex and important topics, understanding a model's ``reasoning'' can increase trust in its predictions, expose hidden biases, or reduce vulnerability to adversarial attacks. However, the concept of interpretability remains loosely defined and application-specific. In this paper, we introduce a mathematical framework in which machine learning models are constructed in a sequence of interpretable steps. We show that for a variety of models, a natural choice of interpretable steps recovers standard interpretability proxies (e.g., sparsity in linear models). We then generalize these proxies to yield a parametrized family of consistent measures of model interpretability. This formal definition allows us to quantify the ``price'' of interpretability, i.e., the tradeoff with predictive accuracy. We demonstrate practical algorithms to apply our framework on real and synthetic datasets.
△ Less
Submitted 8 July, 2019;
originally announced July 2019.
-
Online Mixed-Integer Optimization in Milliseconds
Authors:
Dimitris Bertsimas,
Bartolomeo Stellato
Abstract:
We propose a method to solve online mixed-integer optimization (MIO) problems at very high speed using machine learning. By exploiting the repetitive nature of online optimization, we are able to greatly speedup the solution time. Our approach encodes the optimal solution into a small amount of information denoted as strategy using the Voice of Optimization framework proposed in [BS21]. In this wa…
▽ More
We propose a method to solve online mixed-integer optimization (MIO) problems at very high speed using machine learning. By exploiting the repetitive nature of online optimization, we are able to greatly speedup the solution time. Our approach encodes the optimal solution into a small amount of information denoted as strategy using the Voice of Optimization framework proposed in [BS21]. In this way the core part of the optimization algorithm becomes a multiclass classification problem which can be solved very quickly. In this work, we extend that framework to real-time and high-speed applications focusing on parametric mixed-integer quadratic optimization (MIQO). We propose an extremely fast online optimization algorithm consisting of a feedforward neural network (NN) evaluation and a linear system solution where the matrix has already been factorized. Therefore, this online approach does not require any solver nor iterative algorithm. We show the speed of the proposed method both in terms of total computations required and measured execution time. We estimate the number of floating point operations (flops) required to completely recover the optimal solution as a function of the problem dimensions. Compared to state-of-the-art MIO routines, the online running time of our method is very predictable and can be lower than a single matrix factorization time. We benchmark our method against the state-of-the-art solver Gurobi obtaining from two to three orders of magnitude speedups on examples from fuel cell energy management, sparse portfolio optimization and motion planning with obstacle avoidance.
△ Less
Submitted 22 March, 2021; v1 submitted 3 July, 2019;
originally announced July 2019.
-
A unified approach to mixed-integer optimization problems with logical constraints
Authors:
Dimitris Bertsimas,
Ryan Cory-Wright,
Jean Pauphilet
Abstract:
We propose a unified framework to address a family of classical mixed-integer optimization problems with logically constrained decision variables, including network design, facility location, unit commitment, sparse portfolio selection, binary quadratic optimization, sparse principal analysis and sparse learning problems. These problems exhibit logical relationships between continuous and discrete…
▽ More
We propose a unified framework to address a family of classical mixed-integer optimization problems with logically constrained decision variables, including network design, facility location, unit commitment, sparse portfolio selection, binary quadratic optimization, sparse principal analysis and sparse learning problems. These problems exhibit logical relationships between continuous and discrete variables, which are usually reformulated linearly using a big-M formulation. In this work, we challenge this longstanding modeling practice and express the logical constraints in a non-linear way. By imposing a regularization condition, we reformulate these problems as convex binary optimization problems, which are solvable using an outer-approximation procedure. In numerical experiments, we establish that a general-purpose numerical strategy, which combines cutting-plane, first-order and local search methods, solves these problems faster and at a larger scale than state-of-the-art mixed-integer linear or second-order cone methods. Our approach successfully solves network design problems with 100s of nodes and provides solutions up to 40\% better than the state-of-the-art; sparse portfolio selection problems with up to 3,200 securities compared with 400 securities for previous attempts; and sparse regression problems with up to 100,000 covariates.
△ Less
Submitted 25 January, 2021; v1 submitted 3 July, 2019;
originally announced July 2019.
-
Certifiably Optimal Sparse Inverse Covariance Estimation
Authors:
Dimitris Bertsimas,
Jourdain Lamperski,
Jean Pauphilet
Abstract:
We consider the maximum likelihood estimation of sparse inverse covariance matrices. We demonstrate that current heuristic approaches primarily encourage robustness, instead of the desired sparsity. We give a novel approach that solves the cardinality constrained likelihood problem to certifiable optimality. The approach uses techniques from mixed-integer optimization and convex optimization, and…
▽ More
We consider the maximum likelihood estimation of sparse inverse covariance matrices. We demonstrate that current heuristic approaches primarily encourage robustness, instead of the desired sparsity. We give a novel approach that solves the cardinality constrained likelihood problem to certifiable optimality. The approach uses techniques from mixed-integer optimization and convex optimization, and provides a high-quality solution with a guarantee on its suboptimality, even if the algorithm is terminated early. Using a variety of synthetic and real datasets, we demonstrate that our approach can solve problems where the dimension of the inverse covariance matrix is up to 1,000s. We also demonstrate that our approach produces significantly sparser solutions than Glasso and other popular learning procedures, makes less false discoveries, while still maintaining state-of-the-art accuracy.
△ Less
Submitted 24 June, 2019;
originally announced June 2019.
-
From Predictions to Prescriptions in Multistage Optimization Problems
Authors:
Dimitris Bertsimas,
Christopher McCord
Abstract:
In this paper, we introduce a framework for solving finite-horizon multistage optimization problems under uncertainty in the presence of auxiliary data. We assume the joint distribution of the uncertain quantities is unknown, but noisy observations, along with observations of auxiliary covariates, are available. We utilize effective predictive methods from machine learning (ML), including $k$-near…
▽ More
In this paper, we introduce a framework for solving finite-horizon multistage optimization problems under uncertainty in the presence of auxiliary data. We assume the joint distribution of the uncertain quantities is unknown, but noisy observations, along with observations of auxiliary covariates, are available. We utilize effective predictive methods from machine learning (ML), including $k$-nearest neighbors regression ($k$NN), classification and regression trees (CART), and random forests (RF), to develop specific methods that are applicable to a wide variety of problems. We demonstrate that our solution methods are asymptotically optimal under mild conditions. Additionally, we establish finite sample guarantees for the optimality of our method with $k$NN weight functions. Finally, we demonstrate the practicality of our approach with computational examples. We see a significant decrease in cost by taking into account the auxiliary data in the multistage setting.
△ Less
Submitted 25 April, 2019;
originally announced April 2019.