subscribe to arXiv mailings

VISTA3D: Versatile Imaging SegmenTation and Annotation model for 3D Computed Tomography

Authors: Yufan He, Pengfei Guo, Yucheng Tang, Andriy Myronenko, Vishwesh Nath, Ziyue Xu, Dong Yang, Can Zhao, Benjamin Simon, Mason Belue, Stephanie Harmon, Baris Turkbey, Daguang Xu, Wenqi Li

Abstract: Segmentation foundation models have attracted great interest, however, none of them are adequate enough for the use cases in 3D computed tomography scans (CT) images. Existing works finetune on medical images with 2D foundation models trained on natural images, but interactive segmentation, especially in 2D, is too time-consuming for 3D scans and less useful for large cohort analysis. Models that… ▽ More Segmentation foundation models have attracted great interest, however, none of them are adequate enough for the use cases in 3D computed tomography scans (CT) images. Existing works finetune on medical images with 2D foundation models trained on natural images, but interactive segmentation, especially in 2D, is too time-consuming for 3D scans and less useful for large cohort analysis. Models that can perform out-of-the-box automatic segmentation are more desirable. However, the model trained in this way lacks the ability to perform segmentation on unseen objects like novel tumors. Thus for 3D medical image analysis, an ideal segmentation solution might expect two features: accurate out-of-the-box performance covering major organ classes, and effective adaptation or zero-shot ability to novel structures. In this paper, we discuss what features a 3D CT segmentation foundation model should have, and introduce VISTA3D, Versatile Imaging SegmenTation and Annotation model. The model is trained systematically on 11454 volumes encompassing 127 types of human anatomical structures and various lesions and provides accurate out-of-the-box segmentation. The model's design also achieves state-of-the-art zero-shot interactive segmentation in 3D. The novel model design and training recipe represent a promising step toward developing a versatile medical image foundation model. Code and model weights will be released shortly. The early version of online demo can be tried on https://build.nvidia.com/nvidia/vista-3d. △ Less

Submitted 7 June, 2024; originally announced June 2024.

arXiv:2404.17544 [pdf, ps, other]

Root-to-Leaf Scheduling in Write-Optimized Trees

Authors: Christopher Chung, William Jannen, Samuel McCauley, Bertrand Simon

Abstract: Write-optimized dictionaries are a class of cache-efficient data structures that buffer updates and apply them in batches to optimize the amortized cache misses per update. For example, a B^epsilon tree inserts updates as messages at the root. B^epsilon trees only move ("flush") messages when they have total size close to a cache line, optimizing the amount of work done per cache line written. Thu… ▽ More Write-optimized dictionaries are a class of cache-efficient data structures that buffer updates and apply them in batches to optimize the amortized cache misses per update. For example, a B^epsilon tree inserts updates as messages at the root. B^epsilon trees only move ("flush") messages when they have total size close to a cache line, optimizing the amount of work done per cache line written. Thus, recently-inserted messages reside at or near the root and are only flushed down the tree after a sufficient number of new messages arrive. Although this lazy approach works well for many operations, some types of updates do not complete until the update message reaches a leaf. For example, deferred queries and secure deletes must flush through all nodes along their root-to-leaf path before taking effect. What happens when we want to service a large number of (say) secure deletes as quickly as possible? Classic techniques leave us with an unsavory choice. On the one hand, we can group the delete messages using a write-optimized approach and move them down the tree lazily. But then many individual deletes may be left incomplete for an extended period of time, as their messages wait to be grouped with a sufficiently large number of related messages. On the other hand, we can ignore cache efficiency and perform a root-to-leaf flush for each delete. This begins work on individual deletes immediately, but harms system throughput. This paper investigates a new framework for efficiently flushing collections of messages from the root to their leaves in a write-optimized data structure. Our goal is to minimize the average time that messages reach the leaves. We give an algorithm that O(1)-approximates the optimal average completion time in this model. Along the way, we give a new 4-approximation algorithm for scheduling parallel tasks for weighted completion time with tree precedence constraints. △ Less

Submitted 26 April, 2024; originally announced April 2024.

arXiv:2404.12485 [pdf, other]

Contract Scheduling with Distributional and Multiple Advice

Authors: Spyros Angelopoulos, Marcin Bienkowski, Christoph Dürr, Bertrand Simon

Abstract: Contract scheduling is a widely studied framework for designing real-time systems with interruptible capabilities. Previous work has showed that a prediction on the interruption time can help improve the performance of contract-based systems, however it has relied on a single prediction that is provided by a deterministic oracle. In this work, we introduce and study more general and realistic lear… ▽ More Contract scheduling is a widely studied framework for designing real-time systems with interruptible capabilities. Previous work has showed that a prediction on the interruption time can help improve the performance of contract-based systems, however it has relied on a single prediction that is provided by a deterministic oracle. In this work, we introduce and study more general and realistic learning-augmented settings in which the prediction is in the form of a probability distribution, or it is given as a set of multiple possible interruption times. For both prediction settings, we design and analyze schedules which perform optimally if the prediction is accurate, while simultaneously guaranteeing the best worst-case performance if the prediction is adversarial. We also provide evidence that the resulting system is robust to prediction errors in the distributional setting. Last, we present an experimental evaluation that confirms the theoretical findings, and illustrates the performance improvements that can be attained in practice. △ Less

Submitted 18 April, 2024; originally announced April 2024.

Comments: To appear in Proceedings of IJCAI 2024

arXiv:2402.05764 [pdf]

Datastringer: easy dataset monitoring for journalists

Authors: Matt Shearer, Basile Simon, Clément Geiger

Abstract: We created a software enabling journalists to define a set of criteria they would like to see applied regularly to a constantly-updated dataset, sending them an alert when these criteria are met, thus signaling them that there may be a story to write. The main challenges were to keep the product scalable and powerful, while making sure that it could be used by journalists who would not possess all… ▽ More We created a software enabling journalists to define a set of criteria they would like to see applied regularly to a constantly-updated dataset, sending them an alert when these criteria are met, thus signaling them that there may be a story to write. The main challenges were to keep the product scalable and powerful, while making sure that it could be used by journalists who would not possess all the technical knowledge to exploit it fully. In order to do so, we had to choose Javascript as our main language, as well as designing the code in such a way that it would allow re-usability and further improvements. This project is a proof of concept being tested in a real-life environment, and will be developed towards more and more accessibility. △ Less

Submitted 8 February, 2024; originally announced February 2024.

arXiv:2311.14646 [pdf, other]

More is Better in Modern Machine Learning: when Infinite Overparameterization is Optimal and Overfitting is Obligatory

Authors: James B. Simon, Dhruva Karkada, Nikhil Ghosh, Mikhail Belkin

Abstract: In our era of enormous neural networks, empirical progress has been driven by the philosophy that more is better. Recent deep learning practice has found repeatedly that larger model size, more data, and more computation (resulting in lower training loss) improves performance. In this paper, we give theoretical backing to these empirical observations by showing that these three properties hold in… ▽ More In our era of enormous neural networks, empirical progress has been driven by the philosophy that more is better. Recent deep learning practice has found repeatedly that larger model size, more data, and more computation (resulting in lower training loss) improves performance. In this paper, we give theoretical backing to these empirical observations by showing that these three properties hold in random feature (RF) regression, a class of models equivalent to shallow networks with only the last layer trained. Concretely, we first show that the test risk of RF regression decreases monotonically with both the number of features and the number of samples, provided the ridge penalty is tuned optimally. In particular, this implies that infinite width RF architectures are preferable to those of any finite width. We then proceed to demonstrate that, for a large class of tasks characterized by powerlaw eigenstructure, training to near-zero training loss is obligatory: near-optimal performance can only be achieved when the training error is much smaller than the test error. Grounding our theory in real-world data, we find empirically that standard computer vision tasks with convolutional neural tangent kernels clearly fall into this class. Taken together, our results tell a simple, testable story of the benefits of overparameterization, overfitting, and more data in random feature models. △ Less

Submitted 15 May, 2024; v1 submitted 24 November, 2023; originally announced November 2023.

Comments: Appeared in ICLR 2024

arXiv:2310.17813 [pdf, other]

A Spectral Condition for Feature Learning

Authors: Greg Yang, James B. Simon, Jeremy Bernstein

Abstract: The push to train ever larger neural networks has motivated the study of initialization and training at large network width. A key challenge is to scale training so that a network's internal representations evolve nontrivially at all widths, a process known as feature learning. Here, we show that feature learning is achieved by scaling the spectral norm of weight matrices and their updates like… ▽ More The push to train ever larger neural networks has motivated the study of initialization and training at large network width. A key challenge is to scale training so that a network's internal representations evolve nontrivially at all widths, a process known as feature learning. Here, we show that feature learning is achieved by scaling the spectral norm of weight matrices and their updates like $\sqrt{\texttt{fan-out}/\texttt{fan-in}}$, in contrast to widely used but heuristic scalings based on Frobenius norm and entry size. Our spectral scaling analysis also leads to an elementary derivation of \emph{maximal update parametrization}. All in all, we aim to provide the reader with a solid conceptual understanding of feature learning in neural networks. △ Less

Submitted 13 May, 2024; v1 submitted 26 October, 2023; originally announced October 2023.

arXiv:2309.10594 [pdf, other]

Decentralized Online Learning in Task Assignment Games for Mobile Crowdsensing

Authors: Bernd Simon, Andrea Ortiz, Walid Saad, Anja Klein

Abstract: The problem of coordinated data collection is studied for a mobile crowdsensing (MCS) system. A mobile crowdsensing platform (MCSP) sequentially publishes sensing tasks to the available mobile units (MUs) that signal their willingness to participate in a task by sending sensing offers back to the MCSP. From the received offers, the MCSP decides the task assignment. A stable task assignment must ad… ▽ More The problem of coordinated data collection is studied for a mobile crowdsensing (MCS) system. A mobile crowdsensing platform (MCSP) sequentially publishes sensing tasks to the available mobile units (MUs) that signal their willingness to participate in a task by sending sensing offers back to the MCSP. From the received offers, the MCSP decides the task assignment. A stable task assignment must address two challenges: the MCSP's and MUs' conflicting goals, and the uncertainty about the MUs' required efforts and preferences. To overcome these challenges a novel decentralized approach combining matching theory and online learning, called collision-avoidance multi-armed bandit with strategic free sensing (CA-MAB-SFS), is proposed. The task assignment problem is modeled as a matching game considering the MCSP's and MUs' individual goals while the MUs learn their efforts online. Our innovative "free-sensing" mechanism significantly improves the MU's learning process while reducing collisions during task allocation. The stable regret of CA-MAB-SFS, i.e., the loss of learning, is analytically shown to be bounded by a sublinear function, ensuring the convergence to a stable optimal solution. Simulation results show that CA-MAB-SFS increases the MUs' and the MCSP's satisfaction compared to state-of-the-art methods while reducing the average task completion time by at least 16%. △ Less

Submitted 19 September, 2023; originally announced September 2023.

arXiv:2309.01592 [pdf, other]

Les Houches Lectures on Deep Learning at Large & Infinite Width

Authors: Yasaman Bahri, Boris Hanin, Antonin Brossollet, Vittorio Erba, Christian Keup, Rosalba Pacelli, James B. Simon

Abstract: These lectures, presented at the 2022 Les Houches Summer School on Statistical Physics and Machine Learning, focus on the infinite-width limit and large-width regime of deep neural networks. Topics covered include various statistical and dynamical properties of these networks. In particular, the lecturers discuss properties of random deep neural networks; connections between trained deep neural ne… ▽ More These lectures, presented at the 2022 Les Houches Summer School on Statistical Physics and Machine Learning, focus on the infinite-width limit and large-width regime of deep neural networks. Topics covered include various statistical and dynamical properties of these networks. In particular, the lecturers discuss properties of random deep neural networks; connections between trained deep neural networks, linear models, kernels, and Gaussian processes that arise in the infinite-width limit; and perturbative and non-perturbative treatments of large but finite-width networks, at initialization and after training. △ Less

Submitted 12 February, 2024; v1 submitted 4 September, 2023; originally announced September 2023.

Comments: These are notes from lectures delivered by Yasaman Bahri and Boris Hanin at the 2022 Les Houches Summer School on Statistics Physics and Machine Learning and a first version of them were transcribed by Antonin Brossollet, Vittorio Erba, Christian Keup, Rosalba Pacelli, James B. Simon

arXiv:2306.13185 [pdf, ps, other]

An Agnostic View on the Cost of Overfitting in (Kernel) Ridge Regression

Authors: Lijia Zhou, James B. Simon, Gal Vardi, Nathan Srebro

Abstract: We study the cost of overfitting in noisy kernel ridge regression (KRR), which we define as the ratio between the test error of the interpolating ridgeless model and the test error of the optimally-tuned model. We take an "agnostic" view in the following sense: we consider the cost as a function of sample size for any target function, even if the sample size is not large enough for consistency or… ▽ More We study the cost of overfitting in noisy kernel ridge regression (KRR), which we define as the ratio between the test error of the interpolating ridgeless model and the test error of the optimally-tuned model. We take an "agnostic" view in the following sense: we consider the cost as a function of sample size for any target function, even if the sample size is not large enough for consistency or the target is outside the RKHS. We analyze the cost of overfitting under a Gaussian universality ansatz using recently derived (non-rigorous) risk estimates in terms of the task eigenstructure. Our analysis provides a more refined characterization of benign, tempered and catastrophic overfitting (cf. Mallinar et al. 2022). △ Less

Submitted 22 March, 2024; v1 submitted 22 June, 2023; originally announced June 2023.

Comments: This is the ICLR CR version

arXiv:2306.08055 [pdf, other]

Tune As You Scale: Hyperparameter Optimization For Compute Efficient Training

Authors: Abraham J. Fetterman, Ellie Kitanidis, Joshua Albrecht, Zachary Polizzi, Bryden Fogelman, Maksis Knutins, Bartosz Wróblewski, James B. Simon, Kanjun Qiu

Abstract: Hyperparameter tuning of deep learning models can lead to order-of-magnitude performance gains for the same amount of compute. Despite this, systematic tuning is uncommon, particularly for large models, which are expensive to evaluate and tend to have many hyperparameters, necessitating difficult judgment calls about tradeoffs, budgets, and search bounds. To address these issues and propose a prac… ▽ More Hyperparameter tuning of deep learning models can lead to order-of-magnitude performance gains for the same amount of compute. Despite this, systematic tuning is uncommon, particularly for large models, which are expensive to evaluate and tend to have many hyperparameters, necessitating difficult judgment calls about tradeoffs, budgets, and search bounds. To address these issues and propose a practical method for robustly tuning large models, we present Cost-Aware Pareto Region Bayesian Search (CARBS), a Bayesian optimization algorithm that performs local search around the performance-cost Pareto frontier. CARBS does well even in unbounded search spaces with many hyperparameters, learns scaling relationships so that it can tune models even as they are scaled up, and automates much of the "black magic" of tuning. Among our results, we effectively solve the entire ProcGen benchmark just by tuning a simple baseline (PPO, as provided in the original ProcGen paper). We also reproduce the model size vs. training tokens scaling result from the Chinchilla project (Hoffmann et al. 2022), while simultaneously discovering scaling laws for every other hyperparameter, via an easy automated process that uses significantly less compute and is applicable to any deep learning problem (not just language models). △ Less

Submitted 13 June, 2023; originally announced June 2023.

arXiv:2305.12682 [pdf, other]

Matching Game for Optimized Association in Quantum Communication Networks

Authors: Mahdi Chehimi, Bernd Simon, Walid Saad, Anja Klein, Don Towsley, Mérouane Debbah

Abstract: Enabling quantum switches (QSs) to serve requests submitted by quantum end nodes in quantum communication networks (QCNs) is a challenging problem due to the heterogeneous fidelity requirements of the submitted requests and the limited resources of the QCN. Effectively determining which requests are served by a given QS is fundamental to foster developments in practical QCN applications, like quan… ▽ More Enabling quantum switches (QSs) to serve requests submitted by quantum end nodes in quantum communication networks (QCNs) is a challenging problem due to the heterogeneous fidelity requirements of the submitted requests and the limited resources of the QCN. Effectively determining which requests are served by a given QS is fundamental to foster developments in practical QCN applications, like quantum data centers. However, the state-of-the-art on QS operation has overlooked this association problem, and it mainly focused on QCNs with a single QS. In this paper, the request-QS association problem in QCNs is formulated as a matching game that captures the limited QCN resources, heterogeneous application-specific fidelity requirements, and scheduling of the different QS operations. To solve this game, a swap-stable request-QS association (RQSA) algorithm is proposed while considering partial QCN information availability. Extensive simulations are conducted to validate the effectiveness of the proposed RQSA algorithm. Simulation results show that the proposed RQSA algorithm achieves a near-optimal (within 5%) performance in terms of the percentage of served requests and overall achieved fidelity, while outperforming benchmark greedy solutions by over 13%. Moreover, the proposed RQSA algorithm is shown to be scalable and maintain its near-optimal performance even when the size of the QCN increases. △ Less

Submitted 21 May, 2023; originally announced May 2023.

Comments: 6 pages, 4 figures

arXiv:2304.01781 [pdf, ps, other]

Mixing predictions for online metric algorithms

Authors: Antonios Antoniadis, Christian Coester, Marek Eliáš, Adam Polak, Bertrand Simon

Abstract: A major technique in learning-augmented online algorithms is combining multiple algorithms or predictors. Since the performance of each predictor may vary over time, it is desirable to use not the single best predictor as a benchmark, but rather a dynamic combination which follows different predictors at different times. We design algorithms that combine predictions and are competitive against suc… ▽ More A major technique in learning-augmented online algorithms is combining multiple algorithms or predictors. Since the performance of each predictor may vary over time, it is desirable to use not the single best predictor as a benchmark, but rather a dynamic combination which follows different predictors at different times. We design algorithms that combine predictions and are competitive against such dynamic combinations for a wide class of online problems, namely, metrical task systems. Against the best (in hindsight) unconstrained combination of $\ell$ predictors, we obtain a competitive ratio of $O(\ell^2)$, and show that this is best possible. However, for a benchmark with slightly constrained number of switches between different predictors, we can get a $(1+ε)$-competitive algorithm. Moreover, our algorithms can be adapted to access predictors in a bandit-like fashion, querying only one predictor at a time. An unexpected implication of one of our lower bounds is a new structural insight about covering formulations for the $k$-server problem. △ Less

Submitted 15 December, 2023; v1 submitted 4 April, 2023; originally announced April 2023.

arXiv:2303.15438 [pdf, other]

On the Stepwise Nature of Self-Supervised Learning

Authors: James B. Simon, Maksis Knutins, Liu Ziyin, Daniel Geisz, Abraham J. Fetterman, Joshua Albrecht

Abstract: We present a simple picture of the training process of joint embedding self-supervised learning methods. We find that these methods learn their high-dimensional embeddings one dimension at a time in a sequence of discrete, well-separated steps. We arrive at this conclusion via the study of a linearized model of Barlow Twins applicable to the case in which the trained network is infinitely wide. We… ▽ More We present a simple picture of the training process of joint embedding self-supervised learning methods. We find that these methods learn their high-dimensional embeddings one dimension at a time in a sequence of discrete, well-separated steps. We arrive at this conclusion via the study of a linearized model of Barlow Twins applicable to the case in which the trained network is infinitely wide. We solve the training dynamics of this model from small initialization, finding that the model learns the top eigenmodes of a certain contrastive kernel in a stepwise fashion, and obtain a closed-form expression for the final learned representations. Remarkably, we then see the same stepwise learning phenomenon when training deep ResNets using the Barlow Twins, SimCLR, and VICReg losses. Our theory suggests that, just as kernel regression can be thought of as a model of supervised learning, kernel PCA may serve as a useful model of self-supervised learning. △ Less

Submitted 30 May, 2023; v1 submitted 27 March, 2023; originally announced March 2023.

Comments: 9 pages (main text) + 14 pages (refs + appendices). ICML '23

arXiv:2210.13417 [pdf, other]

Avalon: A Benchmark for RL Generalization Using Procedurally Generated Worlds

Authors: Joshua Albrecht, Abraham J. Fetterman, Bryden Fogelman, Ellie Kitanidis, Bartosz Wróblewski, Nicole Seo, Michael Rosenthal, Maksis Knutins, Zachary Polizzi, James B. Simon, Kanjun Qiu

Abstract: Despite impressive successes, deep reinforcement learning (RL) systems still fall short of human performance on generalization to new tasks and environments that differ from their training. As a benchmark tailored for studying RL generalization, we introduce Avalon, a set of tasks in which embodied agents in highly diverse procedural 3D worlds must survive by navigating terrain, hunting or gatheri… ▽ More Despite impressive successes, deep reinforcement learning (RL) systems still fall short of human performance on generalization to new tasks and environments that differ from their training. As a benchmark tailored for studying RL generalization, we introduce Avalon, a set of tasks in which embodied agents in highly diverse procedural 3D worlds must survive by navigating terrain, hunting or gathering food, and avoiding hazards. Avalon is unique among existing RL benchmarks in that the reward function, world dynamics, and action space are the same for every task, with tasks differentiated solely by altering the environment; its 20 tasks, ranging in complexity from eat and throw to hunt and navigate, each create worlds in which the agent must perform specific skills in order to survive. This setup enables investigations of generalization within tasks, between tasks, and to compositional tasks that require combining skills learned from previous tasks. Avalon includes a highly efficient simulator, a library of baselines, and a benchmark with scoring metrics evaluated against hundreds of hours of human performance, all of which are open-source and publicly available. We find that standard RL baselines make progress on most tasks but are still far from human performance, suggesting Avalon is challenging enough to advance the quest for generalizable RL. △ Less

Submitted 24 October, 2022; originally announced October 2022.

Comments: Accepted to NeurIPS Datasets and Benchmarks 2022. Video and links to all code, data, etc can be found at https://generallyintelligent.com/avalon/

arXiv:2210.02775 [pdf, ps, other]

Paging with Succinct Predictions

Authors: Antonios Antoniadis, Joan Boyar, Marek Eliáš, Lene M. Favrholdt, Ruben Hoeksma, Kim S. Larsen, Adam Polak, Bertrand Simon

Abstract: Paging is a prototypical problem in the area of online algorithms. It has also played a central role in the development of learning-augmented algorithms -- a recent line of research that aims to ameliorate the shortcomings of classical worst-case analysis by giving algorithms access to predictions. Such predictions can typically be generated using a machine learning approach, but they are inherent… ▽ More Paging is a prototypical problem in the area of online algorithms. It has also played a central role in the development of learning-augmented algorithms -- a recent line of research that aims to ameliorate the shortcomings of classical worst-case analysis by giving algorithms access to predictions. Such predictions can typically be generated using a machine learning approach, but they are inherently imperfect. Previous work on learning-augmented paging has investigated predictions on (i) when the current page will be requested again (reoccurrence predictions), (ii) the current state of the cache in an optimal algorithm (state predictions), (iii) all requests until the current page gets requested again, and (iv) the relative order in which pages are requested. We study learning-augmented paging from the new perspective of requiring the least possible amount of predicted information. More specifically, the predictions obtained alongside each page request are limited to one bit only. We consider two natural such setups: (i) discard predictions, in which the predicted bit denotes whether or not it is ``safe'' to evict this page, and (ii) phase predictions, where the bit denotes whether the current page will be requested in the next phase (for an appropriate partitioning of the input into phases). We develop algorithms for each of the two setups that satisfy all three desirable properties of learning-augmented algorithms -- that is, they are consistent, robust and smooth -- despite being limited to a one-bit prediction per request. We also present lower bounds establishing that our algorithms are essentially best possible. △ Less

Submitted 6 October, 2022; originally announced October 2022.

arXiv:2209.01691 [pdf, other]

On Kernel Regression with Data-Dependent Kernels

Authors: James B. Simon

Abstract: The primary hyperparameter in kernel regression (KR) is the choice of kernel. In most theoretical studies of KR, one assumes the kernel is fixed before seeing the training data. Under this assumption, it is known that the optimal kernel is equal to the prior covariance of the target function. In this note, we consider KR in which the kernel may be updated after seeing the training data. We point o… ▽ More The primary hyperparameter in kernel regression (KR) is the choice of kernel. In most theoretical studies of KR, one assumes the kernel is fixed before seeing the training data. Under this assumption, it is known that the optimal kernel is equal to the prior covariance of the target function. In this note, we consider KR in which the kernel may be updated after seeing the training data. We point out that an analogous choice of kernel using the posterior of the target function is optimal in this setting. Connections to the view of deep neural networks as data-dependent kernel learners are discussed. △ Less

Submitted 26 September, 2022; v1 submitted 4 September, 2022; originally announced September 2022.

Comments: 7 pages, 1 figure

arXiv:2207.06569 [pdf, other]

Benign, Tempered, or Catastrophic: A Taxonomy of Overfitting

Authors: Neil Mallinar, James B. Simon, Amirhesam Abedsoltan, Parthe Pandit, Mikhail Belkin, Preetum Nakkiran

Abstract: The practical success of overparameterized neural networks has motivated the recent scientific study of interpolating methods, which perfectly fit their training data. Certain interpolating methods, including neural networks, can fit noisy training data without catastrophically bad test performance, in defiance of standard intuitions from statistical learning theory. Aiming to explain this, a body… ▽ More The practical success of overparameterized neural networks has motivated the recent scientific study of interpolating methods, which perfectly fit their training data. Certain interpolating methods, including neural networks, can fit noisy training data without catastrophically bad test performance, in defiance of standard intuitions from statistical learning theory. Aiming to explain this, a body of recent work has studied benign overfitting, a phenomenon where some interpolating methods approach Bayes optimality, even in the presence of noise. In this work we argue that while benign overfitting has been instructive and fruitful to study, many real interpolating methods like neural networks do not fit benignly: modest noise in the training set causes nonzero (but non-infinite) excess risk at test time, implying these models are neither benign nor catastrophic but rather fall in an intermediate regime. We call this intermediate regime tempered overfitting, and we initiate its systematic study. We first explore this phenomenon in the context of kernel (ridge) regression (KR) by obtaining conditions on the ridge parameter and kernel eigenspectrum under which KR exhibits each of the three behaviors. We find that kernels with powerlaw spectra, including Laplace kernels and ReLU neural tangent kernels, exhibit tempered overfitting. We then empirically study deep neural networks through the lens of our taxonomy, and find that those trained to interpolation are tempered, while those stopped early are benign. We hope our work leads to a more refined understanding of overfitting in modern learning. △ Less

Submitted 15 July, 2024; v1 submitted 13 July, 2022; originally announced July 2022.

Comments: NM and JS co-first authors

arXiv:2206.04615 [pdf, other]

Beyond the Imitation Game: Quantifying and extrapolating the capabilities of language models

Authors: Aarohi Srivastava, Abhinav Rastogi, Abhishek Rao, Abu Awal Md Shoeb, Abubakar Abid, Adam Fisch, Adam R. Brown, Adam Santoro, Aditya Gupta, Adrià Garriga-Alonso, Agnieszka Kluska, Aitor Lewkowycz, Akshat Agarwal, Alethea Power, Alex Ray, Alex Warstadt, Alexander W. Kocurek, Ali Safaya, Ali Tazarv, Alice Xiang, Alicia Parrish, Allen Nie, Aman Hussain, Amanda Askell, Amanda Dsouza , et al. (426 additional authors not shown)

Abstract: Language models demonstrate both quantitative improvement and new qualitative capabilities with increasing scale. Despite their potentially transformative impact, these new capabilities are as yet poorly characterized. In order to inform future research, prepare for disruptive new model capabilities, and ameliorate socially harmful effects, it is vital that we understand the present and near-futur… ▽ More Language models demonstrate both quantitative improvement and new qualitative capabilities with increasing scale. Despite their potentially transformative impact, these new capabilities are as yet poorly characterized. In order to inform future research, prepare for disruptive new model capabilities, and ameliorate socially harmful effects, it is vital that we understand the present and near-future capabilities and limitations of language models. To address this challenge, we introduce the Beyond the Imitation Game benchmark (BIG-bench). BIG-bench currently consists of 204 tasks, contributed by 450 authors across 132 institutions. Task topics are diverse, drawing problems from linguistics, childhood development, math, common-sense reasoning, biology, physics, social bias, software development, and beyond. BIG-bench focuses on tasks that are believed to be beyond the capabilities of current language models. We evaluate the behavior of OpenAI's GPT models, Google-internal dense transformer architectures, and Switch-style sparse transformers on BIG-bench, across model sizes spanning millions to hundreds of billions of parameters. In addition, a team of human expert raters performed all tasks in order to provide a strong baseline. Findings include: model performance and calibration both improve with scale, but are poor in absolute terms (and when compared with rater performance); performance is remarkably similar across model classes, though with benefits from sparsity; tasks that improve gradually and predictably commonly involve a large knowledge or memorization component, whereas tasks that exhibit "breakthrough" behavior at a critical scale often involve multiple steps or components, or brittle metrics; social bias typically increases with scale in settings with ambiguous context, but this can be improved with prompting. △ Less

Submitted 12 June, 2023; v1 submitted 9 June, 2022; originally announced June 2022.

Comments: 27 pages, 17 figures + references and appendices, repo: https://github.com/google/BIG-bench

Journal ref: Transactions on Machine Learning Research, May/2022, https://openreview.net/forum?id=uyTL5Bvosj

arXiv:2203.17019 [pdf, other]

DeepFry: Identifying Vocal Fry Using Deep Neural Networks

Authors: Bronya R. Chernyak, Talia Ben Simon, Yael Segal, Jeremy Steffman, Eleanor Chodroff, Jennifer S. Cole, Joseph Keshet

Abstract: Vocal fry or creaky voice refers to a voice quality characterized by irregular glottal opening and low pitch. It occurs in diverse languages and is prevalent in American English, where it is used not only to mark phrase finality, but also sociolinguistic factors and affect. Due to its irregular periodicity, creaky voice challenges automatic speech processing and recognition systems, particularly f… ▽ More Vocal fry or creaky voice refers to a voice quality characterized by irregular glottal opening and low pitch. It occurs in diverse languages and is prevalent in American English, where it is used not only to mark phrase finality, but also sociolinguistic factors and affect. Due to its irregular periodicity, creaky voice challenges automatic speech processing and recognition systems, particularly for languages where creak is frequently used. This paper proposes a deep learning model to detect creaky voice in fluent speech. The model is composed of an encoder and a classifier trained together. The encoder takes the raw waveform and learns a representation using a convolutional neural network. The classifier is implemented as a multi-headed fully-connected network trained to detect creaky voice, voicing, and pitch, where the last two are used to refine creak prediction. The model is trained and tested on speech of American English speakers, annotated for creak by trained phoneticians. We evaluated the performance of our system using two encoders: one is tailored for the task, and the other is based on a state-of-the-art unsupervised representation. Results suggest our best-performing system has improved recall and F1 scores compared to previous methods on unseen data. △ Less

Submitted 26 June, 2022; v1 submitted 31 March, 2022; originally announced March 2022.

Comments: Accepted to Interspeech 2022

arXiv:2202.11730 [pdf, other]

doi 10.3847/1538-4357/ac7a3c

Using Bayesian Deep Learning to infer Planet Mass from Gaps in Protoplanetary Disks

Authors: Sayantan Auddy, Ramit Dey, Min-Kai Lin, Daniel Carrera, Jacob B. Simon

Abstract: Planet induced sub-structures, like annular gaps, observed in dust emission from protoplanetary disks provide a unique probe to characterize unseen young planets. While deep learning based model has an edge in characterizing the planet's properties over traditional methods, like customized simulations and empirical relations, it lacks in its ability to quantify the uncertainty associated with its… ▽ More Planet induced sub-structures, like annular gaps, observed in dust emission from protoplanetary disks provide a unique probe to characterize unseen young planets. While deep learning based model has an edge in characterizing the planet's properties over traditional methods, like customized simulations and empirical relations, it lacks in its ability to quantify the uncertainty associated with its predictions. In this paper, we introduce a Bayesian deep learning network "DPNNet-Bayesian" that can predict planet mass from disk gaps and provides uncertainties associated with the prediction. A unique feature of our approach is that it can distinguish between the uncertainty associated with the deep learning architecture and uncertainty inherent in the input data due to measurement noise. The model is trained on a data set generated from disk-planet simulations using the \textsc{fargo3d} hydrodynamics code with a newly implemented fixed grain size module and improved initial conditions. The Bayesian framework enables estimating a gauge/confidence interval over the validity of the prediction when applied to unknown observations. As a proof-of-concept, we apply DPNNet-Bayesian to dust gaps observed in HL Tau. The network predicts masses of $ 86.0 \pm 5.5 M_{\Earth} $, $ 43.8 \pm 3.3 M_{\Earth} $, and $ 92.2 \pm 5.1 M_{\Earth} $ respectively, which are comparable to other studies based on specialized simulations. △ Less

Submitted 23 February, 2022; originally announced February 2022.

Comments: 14 pages, 6 figures, submitted to ApJ

arXiv:2112.09384 [pdf, ps, other]

An Exact Algorithm for the Linear Tape Scheduling Problem

Authors: Valentin Honoré, Bertrand Simon, Frédéric Suter

Abstract: Magnetic tapes are often considered as an outdated storage technology, yet they are still used to store huge amounts of data. Their main interests are a large capacity and a low price per gigabyte, which come at the cost of a much larger file access time than on disks. With tapes, finding the right ordering of multiple file accesses is thus key to performance. Moving the reading head back and fort… ▽ More Magnetic tapes are often considered as an outdated storage technology, yet they are still used to store huge amounts of data. Their main interests are a large capacity and a low price per gigabyte, which come at the cost of a much larger file access time than on disks. With tapes, finding the right ordering of multiple file accesses is thus key to performance. Moving the reading head back and forth along a kilometer long tape has a non-negligible cost and unnecessary movements thus have to be avoided. However, the optimization of tape request ordering has then rarely been studied in the scheduling literature, much less than I/O scheduling on disks. For instance, minimizing the average service time for several read requests on a linear tape remains an open question. Therefore, in this paper, we aim at improving the quality of service experienced by users of tape storage systems, and not only the peak performance of such systems. To this end, we propose a reasonable polynomial-time exact algorithm while this problem and simpler variants have been conjectured NP-hard. We also refine the proposed model by considering U-turn penalty costs accounting for inherent mechanical accelerations. Then, we propose low-cost variants of our optimal algorithm by restricting the solution space, yet still yielding an accurate suboptimal solution. Finally, we compare our algorithms to existing solutions from the literature on logs of the mass storage management system of a major datacenter. This allows us to assess the quality of previous solutions and the improvement achieved by our low-cost algorithms. Aiming for reproducibility, we make available the complete implementation of the algorithms used in our evaluation, alongside the dataset of tape requests that is, to the best of our knowledge, the first of its kind to be publicly released. △ Less

Submitted 4 May, 2022; v1 submitted 17 December, 2021; originally announced December 2021.

arXiv:2110.13116 [pdf, other]

Learning-Augmented Dynamic Power Management with Multiple States via New Ski Rental Bounds

Authors: Antonios Antoniadis, Christian Coester, Marek Eliáš, Adam Polak, Bertrand Simon

Abstract: We study the online problem of minimizing power consumption in systems with multiple power-saving states. During idle periods of unknown lengths, an algorithm has to choose between power-saving states of different energy consumption and wake-up costs. We develop a learning-augmented online algorithm that makes decisions based on (potentially inaccurate) predicted lengths of the idle periods. The a… ▽ More We study the online problem of minimizing power consumption in systems with multiple power-saving states. During idle periods of unknown lengths, an algorithm has to choose between power-saving states of different energy consumption and wake-up costs. We develop a learning-augmented online algorithm that makes decisions based on (potentially inaccurate) predicted lengths of the idle periods. The algorithm's performance is near-optimal when predictions are accurate and degrades gracefully with increasing prediction error, with a worst-case guarantee almost identical to the optimal classical online algorithm for the problem. A key ingredient in our approach is a new algorithm for the online ski rental problem in the learning augmented setting with tight dependence on the prediction error. We support our theoretical findings with experiments. △ Less

Submitted 25 October, 2021; originally announced October 2021.

arXiv:2110.03922 [pdf, other]

The Eigenlearning Framework: A Conservation Law Perspective on Kernel Regression and Wide Neural Networks

Authors: James B. Simon, Madeline Dickens, Dhruva Karkada, Michael R. DeWeese

Abstract: We derive simple closed-form estimates for the test risk and other generalization metrics of kernel ridge regression (KRR). Relative to prior work, our derivations are greatly simplified and our final expressions are more readily interpreted. These improvements are enabled by our identification of a sharp conservation law which limits the ability of KRR to learn any orthonormal basis of functions.… ▽ More We derive simple closed-form estimates for the test risk and other generalization metrics of kernel ridge regression (KRR). Relative to prior work, our derivations are greatly simplified and our final expressions are more readily interpreted. These improvements are enabled by our identification of a sharp conservation law which limits the ability of KRR to learn any orthonormal basis of functions. Test risk and other objects of interest are expressed transparently in terms of our conserved quantity evaluated in the kernel eigenbasis. We use our improved framework to: i) provide a theoretical explanation for the "deep bootstrap" of Nakkiran et al (2020), ii) generalize a previous result regarding the hardness of the classic parity problem, iii) fashion a theoretical tool for the study of adversarial robustness, and iv) draw a tight analogy between KRR and a well-studied system in statistical physics. △ Less

Submitted 26 October, 2023; v1 submitted 8 October, 2021; originally announced October 2021.

Comments: 12 pages (main text) + 25 pages (refs + appendices). A previous version of this manuscript was entitled "Neural Tangent Kernel Eigenvalues Accurately Predict Generalization."

arXiv:2107.11774 [pdf, other]

SGD with a Constant Large Learning Rate Can Converge to Local Maxima

Authors: Liu Ziyin, Botao Li, James B. Simon, Masahito Ueda

Abstract: Previous works on stochastic gradient descent (SGD) often focus on its success. In this work, we construct worst-case optimization problems illustrating that, when not in the regimes that the previous works often assume, SGD can exhibit many strange and potentially undesirable behaviors. Specifically, we construct landscapes and data distributions such that (1) SGD converges to local maxima, (2) S… ▽ More Previous works on stochastic gradient descent (SGD) often focus on its success. In this work, we construct worst-case optimization problems illustrating that, when not in the regimes that the previous works often assume, SGD can exhibit many strange and potentially undesirable behaviors. Specifically, we construct landscapes and data distributions such that (1) SGD converges to local maxima, (2) SGD escapes saddle points arbitrarily slowly, (3) SGD prefers sharp minima over flat ones, and (4) AMSGrad converges to local maxima. We also realize results in a minimal neural network-like example. Our results highlight the importance of simultaneously analyzing the minibatch sampling, discrete-time updates rules, and realistic landscapes to understand the role of SGD in deep learning. △ Less

Submitted 27 May, 2023; v1 submitted 25 July, 2021; originally announced July 2021.

Comments: Fixed typos

arXiv:2106.03186 [pdf, other]

Reverse Engineering the Neural Tangent Kernel

Authors: James B. Simon, Sajant Anand, Michael R. DeWeese

Abstract: The development of methods to guide the design of neural networks is an important open challenge for deep learning theory. As a paradigm for principled neural architecture design, we propose the translation of high-performing kernels, which are better-understood and amenable to first-principles design, into equivalent network architectures, which have superior efficiency, flexibility, and feature… ▽ More The development of methods to guide the design of neural networks is an important open challenge for deep learning theory. As a paradigm for principled neural architecture design, we propose the translation of high-performing kernels, which are better-understood and amenable to first-principles design, into equivalent network architectures, which have superior efficiency, flexibility, and feature learning. To this end, we constructively prove that, with just an appropriate choice of activation function, any positive-semidefinite dot-product kernel can be realized as either the NNGP or neural tangent kernel of a fully-connected neural network with only one hidden layer. We verify our construction numerically and demonstrate its utility as a design tool for finite fully-connected networks in several experiments. △ Less

Submitted 13 August, 2022; v1 submitted 6 June, 2021; originally announced June 2021.

Comments: 15 pages, 5 figures

arXiv:2103.01640 [pdf, other]

Double Coverage with Machine-Learned Advice

Authors: Alexander Lindermayr, Nicole Megow, Bertrand Simon

Abstract: We study the fundamental online k-server problem in a learning-augmented setting. While in the traditional online model, an algorithm has no information about the request sequence, we assume that there is given some advice (e.g. machine-learned predictions) on an algorithm's decision. There is, however, no guarantee on the quality of the prediction and it might be far from being correct. Our mai… ▽ More We study the fundamental online k-server problem in a learning-augmented setting. While in the traditional online model, an algorithm has no information about the request sequence, we assume that there is given some advice (e.g. machine-learned predictions) on an algorithm's decision. There is, however, no guarantee on the quality of the prediction and it might be far from being correct. Our main result is a learning-augmented variation of the well-known Double Coverage algorithm for k-server on the line (Chrobak et al., SIDMA 1991) in which we integrate predictions as well as our trust into their quality. We give an error-dependent competitive ratio, which is a function of a user-defined confidence parameter, and which interpolates smoothly between an optimal consistency, the performance in case that all predictions are correct, and the best-possible robustness regardless of the prediction quality. When given good predictions, we improve upon known lower bounds for online algorithms without advice. We further show that our algorithm achieves for any k an almost optimal consistency-robustness tradeoff, within a class of deterministic algorithms respecting local and memoryless properties. Our algorithm outperforms a previously proposed (more general) learning-augmented algorithm. It is remarkable that the previous algorithm crucially exploits memory, whereas our algorithm is memoryless. Finally, we demonstrate in experiments the practicability and the superior performance of our algorithm on real-world data. △ Less

Submitted 16 November, 2021; v1 submitted 2 March, 2021; originally announced March 2021.

Comments: Accepted at ITCS 2022

arXiv:2011.05181 [pdf, other]

Speed-Robust Scheduling -- Sand, Bricks, and Rocks

Authors: Franziska Eberle, Ruben Hoeksma, Nicole Megow, Lukas Nölke, Kevin Schewior, Bertrand Simon

Abstract: The speed-robust scheduling problem is a two-stage problem where given $m$ machines, jobs must be grouped into at most $m$ bags while the processing speeds of the given $m$ machines are unknown. After the speeds are revealed, the grouped jobs must be assigned to the machines without being separated. To evaluate the performance of algorithms, we determine upper bounds on the worst-case ratio of the… ▽ More The speed-robust scheduling problem is a two-stage problem where given $m$ machines, jobs must be grouped into at most $m$ bags while the processing speeds of the given $m$ machines are unknown. After the speeds are revealed, the grouped jobs must be assigned to the machines without being separated. To evaluate the performance of algorithms, we determine upper bounds on the worst-case ratio of the algorithm's makespan and the optimal makespan given full information. We refer to this ratio as the robustness factor. We give an algorithm with a robustness factor $2-1/m$ for the most general setting and improve this to $1.8$ for equal-size jobs. For the special case of infinitesimal jobs, we give an algorithm with an optimal robustness factor equal to $e/(e-1) \approx 1.58$. The particular machine environment in which all machines have either speed $0$ or $1$ was studied before by Stein and Zhong (SODA 2019). For this setting, we provide an algorithm for scheduling infinitesimal jobs with an optimal robustness factor of $(1+\sqrt{2})/2 \approx 1.207$. It lays the foundation for an algorithm matching the lower bound of $4/3$ for equal-size jobs. △ Less

Submitted 31 May, 2022; v1 submitted 10 November, 2020; originally announced November 2020.

arXiv:2007.08415 [pdf, other]

Fully Dynamic Algorithms for Knapsack Problems with Polylogarithmic Update Time

Authors: Franziska Eberle, Nicole Megow, Lukas Nölke, Bertrand Simon, Andreas Wiese

Abstract: Knapsack problems are among the most fundamental problems in optimization. In the Multiple Knapsack problem, we are given multiple knapsacks with different capacities and items with values and sizes. The task is to find a subset of items of maximum total value that can be packed into the knapsacks without exceeding the capacities. We investigate this problem and special cases thereof in the contex… ▽ More Knapsack problems are among the most fundamental problems in optimization. In the Multiple Knapsack problem, we are given multiple knapsacks with different capacities and items with values and sizes. The task is to find a subset of items of maximum total value that can be packed into the knapsacks without exceeding the capacities. We investigate this problem and special cases thereof in the context of dynamic algorithms and design data structures that efficiently maintain near-optimal knapsack solutions for dynamically changing input. More precisely, we handle the arrival and departure of individual items or knapsacks during the execution of the algorithm with worst-case update time polylogarithmic in the number of items. As the optimal and any approximate solution may change drastically, we only maintain implicit solutions and support certain queries in polylogarithmic time, such as the packing of an item and the solution value. While dynamic algorithms are well-studied in the context of graph problems, there is hardly any work on packing problems and generally much less on non-graph problems. Given the theoretical interest in knapsack problems and their practical relevance, it is somewhat surprising that Knapsack has not been addressed before in the context of dynamic algorithms and our work bridges this gap. △ Less

Submitted 4 October, 2021; v1 submitted 16 July, 2020; originally announced July 2020.

Comments: Accepted for publication at FSTTCS 2021

arXiv:2003.05699 [pdf, other]

doi 10.1137/21M1425487

On Hop-Constrained Steiner Trees in Tree-Like Metrics

Authors: Martin Böhm, Ruben Hoeksma, Nicole Megow, Lukas Nölke, Bertrand Simon

Abstract: We consider the problem of computing a Steiner tree of minimum cost under a hop constraint which requires the depth of the tree to be at most $k$. Our main result is an exact algorithm for metrics induced by graphs with bounded treewidth that runs in time $n^{O(k)}$. For the special case of a path, we give a simple algorithm that solves the problem in polynomial time, even if $k$ is part of the in… ▽ More We consider the problem of computing a Steiner tree of minimum cost under a hop constraint which requires the depth of the tree to be at most $k$. Our main result is an exact algorithm for metrics induced by graphs with bounded treewidth that runs in time $n^{O(k)}$. For the special case of a path, we give a simple algorithm that solves the problem in polynomial time, even if $k$ is part of the input. The main result can be used to obtain, in quasi-polynomial time, a near-optimal solution that violates the $k$-hop constraint by at most one hop for more general metrics induced by graphs of bounded highway dimension and bounded doubling dimension. For non-metric graphs, we rule out an $o(\log n)$-approximation, assuming P$\neq$NP even when relaxing the hop constraint by any additive constant. △ Less

Submitted 11 October, 2022; v1 submitted 12 March, 2020; originally announced March 2020.

Journal ref: SIAM Journal on Discrete Mathematics, Vol. 36, Iss. 2 (2022)

arXiv:2003.02144 [pdf, ps, other]

Online metric algorithms with untrusted predictions

Authors: Antonios Antoniadis, Christian Coester, Marek Elias, Adam Polak, Bertrand Simon

Abstract: Machine-learned predictors, although achieving very good results for inputs resembling training data, cannot possibly provide perfect predictions in all situations. Still, decision-making systems that are based on such predictors need not only to benefit from good predictions but also to achieve a decent performance when the predictions are inadequate. In this paper, we propose a prediction setup… ▽ More Machine-learned predictors, although achieving very good results for inputs resembling training data, cannot possibly provide perfect predictions in all situations. Still, decision-making systems that are based on such predictors need not only to benefit from good predictions but also to achieve a decent performance when the predictions are inadequate. In this paper, we propose a prediction setup for arbitrary metrical task systems (MTS) (e.g., caching, k-server and convex body chasing) and online matching on the line. We utilize results from the theory of online algorithms to show how to make the setup robust. Specifically for caching, we present an algorithm whose performance, as a function of the prediction error, is exponentially better than what is achievable for general MTS. Finally, we present an empirical evaluation of our methods on real world datasets, which suggests practicality. △ Less

Submitted 6 April, 2023; v1 submitted 4 March, 2020; originally announced March 2020.

arXiv:2001.01125 [pdf, ps, other]

doi 10.1016/j.tcs.2022.10.004

Discovering and Certifying Lower Bounds for the Online Bin Stretching Problem

Authors: Martin Böhm, Bertrand Simon

Abstract: There are several problems in the theory of online computation where tight lower bounds on the competitive ratio are unknown and expected to be difficult to describe in a short form. A good example is the Online Bin Stretching problem, in which the task is to pack the incoming items online into bins while minimizing the load of the largest bin. Additionally, the optimal load of the entire instance… ▽ More There are several problems in the theory of online computation where tight lower bounds on the competitive ratio are unknown and expected to be difficult to describe in a short form. A good example is the Online Bin Stretching problem, in which the task is to pack the incoming items online into bins while minimizing the load of the largest bin. Additionally, the optimal load of the entire instance is known in advance. The contribution of this paper is twofold. We use the Coq proof assistant to formalize the Online Bin Stretching problem and provide a program certifying lower bounds of this problem. Because of the size of the certificates, previously claimed lower bounds were never formally proven. To the best of our knowledge, this is the first use of a formal verification toolkit to certify a lower bound for an online problem. We also provide the first non-trivial lower bounds for Online Bin Stretching with 6, 7 and 8 bins, and increase the best known lower bound for 3 bins. We describe in detail the algorithmic improvements which were necessary for the discovery of the new lower bounds, which are several orders of magnitude more complex. △ Less

Submitted 14 October, 2022; v1 submitted 4 January, 2020; originally announced January 2020.

arXiv:1912.09170 [pdf, ps, other]

Energy Minimization in DAG Scheduling on MPSoCs at Run-Time: Theory and Practice

Authors: Bertrand Simon, Joachim Falk, Nicole Megow, Jürgen Teich

Abstract: Static (offline) techniques for mapping applications given by task graphs to MPSoC systems often deliver overly pessimistic and thus suboptimal results w.r.t. exploiting time slack in order to minimize the energy consumption. This holds true in particular in case computation times of tasks may be workload-dependent and becoming known only at runtime or in case of conditionally executed tasks or sc… ▽ More Static (offline) techniques for mapping applications given by task graphs to MPSoC systems often deliver overly pessimistic and thus suboptimal results w.r.t. exploiting time slack in order to minimize the energy consumption. This holds true in particular in case computation times of tasks may be workload-dependent and becoming known only at runtime or in case of conditionally executed tasks or scenarios. This paper studies and quantitatively evaluates different classes of algorithms for scheduling periodic applications given by task graphs (i.e., DAGs) with precedence constraints and a global deadline on homogeneous MPSoCs purely at runtime on a per-instance base. We present and analyze algorithms providing provably optimal results as well as approximation algorithms with proven guarantees on the achieved energy savings. For problem instances taken from realistic embedded system benchmarks as well as synthetic scalable problems, we provide results on the computation time and quality of each algorithm to perform a) scheduling and b) voltage/speed assignments for each task at runtime. In our portfolio, we distinguish as well continuous and discrete speed (e.g., DVFS-related) assignment problems. In summary, the presented ties between theory (algorithmic complexity and optimality) and execution time analysis deliver important insights on the practical usability of the presented algorithms for runtime optimization of task scheduling and speed assignment on MPSoCs. △ Less

Submitted 19 December, 2019; originally announced December 2019.

arXiv:1912.03088 [pdf, ps, other]

Scheduling on Hybrid Platforms: Improved Approximability Window

Authors: Vincent Fagnon, Imed Kacem, Giorgio Lucarelli, Bertrand Simon

Abstract: Modern platforms are using accelerators in conjunction with standard processing units in order to reduce the running time of specific operations, such as matrix operations, and improve their performance. Scheduling on such hybrid platforms is a challenging problem since the algorithms used for the case of homogeneous resources do not adapt well. In this paper we consider the problem of scheduling… ▽ More Modern platforms are using accelerators in conjunction with standard processing units in order to reduce the running time of specific operations, such as matrix operations, and improve their performance. Scheduling on such hybrid platforms is a challenging problem since the algorithms used for the case of homogeneous resources do not adapt well. In this paper we consider the problem of scheduling a set of tasks subject to precedence constraints on hybrid platforms, composed of two types of processing units. We propose a $(3+2\sqrt{2})$-approximation algorithm and a conditional lower bound of 3 on the approximation ratio. These results improve upon the 6-approximation algorithm proposed by Kedad-Sidhoum et al. as well as the lower bound of 2 due to Svensson for identical machines. Our algorithm is inspired by the former one and distinguishes the allocation and the scheduling phases. However, we propose a different allocation procedure which, although is less efficient for the allocation sub-problem, leads to an improved approximation ratio for the whole scheduling problem. This approximation ratio actually decreases when the number of processing units of each type is close and matches the conditional lower bound when they are equal. △ Less

Submitted 9 February, 2020; v1 submitted 6 December, 2019; originally announced December 2019.

arXiv:1909.11365 [pdf, other]

doi 10.1145/3387110

Scheduling on Two Types of Resources: a Survey

Authors: Olivier Beaumont, Louis-claude Canon, Lionel Eyraud-Dubois, Giorgio Lucarelli, Loris Marchal, Clément Mommessin, Bertrand Simon, Denis Trystram

Abstract: The evolution in the design of modern parallel platforms leads to revisit the scheduling jobs on distributed heterogeneous resources. The goal of this survey is to present the main existing algorithms, to classify them based on their underlying principles and to propose unified implementations to enable their fair comparison, both in terms of running time and quality of schedules, on a large set o… ▽ More The evolution in the design of modern parallel platforms leads to revisit the scheduling jobs on distributed heterogeneous resources. The goal of this survey is to present the main existing algorithms, to classify them based on their underlying principles and to propose unified implementations to enable their fair comparison, both in terms of running time and quality of schedules, on a large set of common benchmarks that we made available for the community. Beyond this comparison, our goal is also to understand the main difficulties that heterogeneity conveys and the shared principles that guide the design of efficient algorithms. △ Less

Submitted 30 July, 2020; v1 submitted 25 September, 2019; originally announced September 2019.

Journal ref: ACM Computing Survey, Vol. 53, No. 3, 2020

arXiv:1909.02923 [pdf, other]

doi 10.1109/JSYST.2019.2940145

Data Driven Vulnerability Exploration for Design Phase System Analysis

Authors: Georgios Bakirtzis, Brandon J. Simon, Aidan G. Collins, Cody H. Fleming, Carl R. Elks

Abstract: Applying security as a lifecycle practice is becoming increasingly important to combat targeted attacks in safety-critical systems. Among others there are two significant challenges in this area: (1) the need for models that can characterize a realistic system in the absence of an implementation and (2) an automated way to associate attack vector information; that is, historical data, to such syst… ▽ More Applying security as a lifecycle practice is becoming increasingly important to combat targeted attacks in safety-critical systems. Among others there are two significant challenges in this area: (1) the need for models that can characterize a realistic system in the absence of an implementation and (2) an automated way to associate attack vector information; that is, historical data, to such system models. We propose the cybersecurity body of knowledge (CYBOK), which takes in sufficiently characteristic models of systems and acts as a search engine for potential attack vectors. CYBOK is fundamentally an algorithmic approach to vulnerability exploration, which is a significant extension to the body of knowledge it builds upon. By using CYBOK, security analysts and system designers can work together to assess the overall security posture of systems early in their lifecycle, during major design decisions and before final product designs. Consequently, assisting in applying security earlier and throughout the systems lifecycle. △ Less

Submitted 6 September, 2019; originally announced September 2019.

arXiv:1808.08081 [pdf, other]

doi 10.1109/VIZSEC.2018.8709187

Looking for a Black Cat in a Dark Room: Security Visualization for Cyber-Physical System Design and Analysis

Authors: Georgios Bakirtzis, Brandon J. Simon, Cody H. Fleming, Carl R. Elks

Abstract: Today, there is a plethora of software security tools employing visualizations that enable the creation of useful and effective interactive security analyst dashboards. Such dashboards can assist the analyst to understand the data at hand and, consequently, to conceive more targeted preemption and mitigation security strategies. Despite the recent advances, model-based security analysis is lacking… ▽ More Today, there is a plethora of software security tools employing visualizations that enable the creation of useful and effective interactive security analyst dashboards. Such dashboards can assist the analyst to understand the data at hand and, consequently, to conceive more targeted preemption and mitigation security strategies. Despite the recent advances, model-based security analysis is lacking tools that employ effective dashboards---to manage potential attack vectors, system components, and requirements. This problem is further exacerbated because model-based security analysis produces significantly larger result spaces than security analysis applied to realized systems---where platform specific information, software versions, and system element dependencies are known. Therefore, there is a need to manage the analysis complexity in model-based security through better visualization techniques. Towards that goal, we propose an interactive security analysis dashboard that provides different views largely centered around the system, its requirements, and its associated attack vector space. This tool makes it possible to start analysis earlier in the system lifecycle. We apply this tool in a significant area of engineering design---the design of cyber-physical systems---where security violations can lead to safety hazards. △ Less

Submitted 23 October, 2018; v1 submitted 24 August, 2018; originally announced August 2018.

arXiv:1711.10693 [pdf]

Small Drone Field Experiment: Data Collection & Processing

Authors: Dalton Rosario, Christoph Borel, Damon Conover, Ryan McAlinden, Anthony Ortiz, Sarah Shiver, Blair Simon

Abstract: Following an initiative formalized in April 2016 formally known as ARL West between the U.S. Army Research Laboratory (ARL) and University of Southern California's Institute for Creative Technologies (USC ICT), a field experiment was coordinated and executed in the summer of 2016 by ARL, USC ICT, and Headwall Photonics. The purpose was to image part of the USC main campus in Los Angeles, USA, usin… ▽ More Following an initiative formalized in April 2016 formally known as ARL West between the U.S. Army Research Laboratory (ARL) and University of Southern California's Institute for Creative Technologies (USC ICT), a field experiment was coordinated and executed in the summer of 2016 by ARL, USC ICT, and Headwall Photonics. The purpose was to image part of the USC main campus in Los Angeles, USA, using two portable COTS (commercial off the shelf) aerial drone solutions for data acquisition, for photogrammetry (3D reconstruction from images), and fusion of hyperspectral data with the recovered set of 3D point clouds representing the target area. The research aims for determining the viability of having a machine capable of segmenting the target area into key material classes (e.g., manmade structures, live vegetation, water) for use in multiple purposes, to include providing the user with a more accurate scene understanding and enabling the unsupervised automatic sampling of meaningful material classes from the target area for adaptive semi-supervised machine learning. In the latter, a target set library may be used for automatic machine training with data of local material classes, as an example, to increase the prediction chances of machines recognizing targets. The field experiment and associated data post processing approach to correct for reflectance, geo-rectify, recover the area's dense point clouds from images, register spectral with elevation properties of scene surfaces from the independently collected datasets, and generate the desired scene segmented maps are discussed. Lessons learned from the experience are also highlighted throughout the paper. △ Less

Submitted 29 November, 2017; originally announced November 2017.

arXiv:1610.04064 [pdf, other]

doi 10.1145/2994620.2994632

An Efficient and Robust Social Network De-anonymization Attack

Authors: Gábor György Gulyás, Benedek Simon, Sándor Imre

Abstract: Releasing connection data from social networking services can pose a significant threat to user privacy. In our work, we consider structural social network de-anonymization attacks, which are used when a malicious party uses connections in a public or other identified network to re-identify users in an anonymized social network release that he obtained previously. In this paper we design and eva… ▽ More Releasing connection data from social networking services can pose a significant threat to user privacy. In our work, we consider structural social network de-anonymization attacks, which are used when a malicious party uses connections in a public or other identified network to re-identify users in an anonymized social network release that he obtained previously. In this paper we design and evaluate a novel social de-anonymization attack. In particular, we argue that the similarity function used to re-identify nodes is a key component of such attacks, and we design a novel measure tailored for social networks. We incorporate this measure in an attack called Bumblebee. We evaluate Bumblebee in depth, and show that it significantly outperforms the state-of-the-art, for example it has higher re-identification rates with high precision, robustness against noise, and also has better error control. △ Less

Submitted 13 October, 2016; originally announced October 2016.

arXiv:1604.03446 [pdf]

The Importance of Computing Education Research

Authors: Steve Cooper, Jeff Forbes, Armando Fox, Susanne Hambrusch, Andrew Ko, Beth Simon

Abstract: Interest in computer science is growing. As a result, computer science (CS) and related departments are experiencing an explosive increase in undergraduate enrollments and unprecedented demand from other disciplines for learning computing. According to the 2014 CRA Taulbee Survey, the number of undergraduates declaring a computing major at Ph.D. granting departments in the US has increased 60% fro… ▽ More Interest in computer science is growing. As a result, computer science (CS) and related departments are experiencing an explosive increase in undergraduate enrollments and unprecedented demand from other disciplines for learning computing. According to the 2014 CRA Taulbee Survey, the number of undergraduates declaring a computing major at Ph.D. granting departments in the US has increased 60% from 2011-2014 and the number of degrees granted has increased by 34% from 2008-2013. However, this growth is not limited to higher education. New York City, San Francisco and Oakland public schools will soon be offering computer science to all students at all schools from preschool to 12th grade, although it will be an elective for high school students. This unprecedented demand means that CS departments are likely to teach not only more students in the coming decades, but more diverse students, with more varied backgrounds, motivations, preparations, and abilities. This growth is an unparalleled opportunity to expand the reach of computing education. However, this growth is also a unique research challenge, as we know very little about how best to teach our current students, let alone the students soon to arrive. The burgeoning field of Computing Education Research (CER) is positioned to address this challenge by answering research questions such as, how should we teach computer science, from programming to advanced principles, to a broader and more diverse audience? We argue that computer science departments should lead the way in establishing CER as a foundational research area of computer science, discovering the best ways to teach CS, and inventing the best technologies with which to teach it. This white paper provides a snapshot of the current state of CER and makes actionable recommendations for academic leaders to grow CER as a successful research area in their departments. △ Less

Submitted 12 April, 2016; originally announced April 2016.

Comments: A Computing Community Consortium (CCC) white paper, 12 pages

arXiv:1410.7249 [pdf, other]

Scheduling Trees of Malleable Tasks for Sparse Linear Algebra

Authors: Abdou Guermouche, Loris Marchal, Bertrand Simon, Frédéric Vivien

Abstract: Scientific workloads are often described as directed acyclic task graphs. In this paper, we focus on the multifrontal factorization of sparse matrices, whose task graph is structured as a tree of parallel tasks. Among the existing models for parallel tasks, the concept of malleable tasks is especially powerful as it allows each task to be processed on a time-varying number of processors. Following… ▽ More Scientific workloads are often described as directed acyclic task graphs. In this paper, we focus on the multifrontal factorization of sparse matrices, whose task graph is structured as a tree of parallel tasks. Among the existing models for parallel tasks, the concept of malleable tasks is especially powerful as it allows each task to be processed on a time-varying number of processors. Following the model advocated by Prasanna and Musicus for matrix computations, we consider malleable tasks whose speedup is $p^α$, where $p$ is the fractional share of processors on which a task executes, and $α$ ($0 < α\leq 1$) is a parameter which does not depend on the task. We first motivate the relevance of this model for our application with actual experiments on multicore platforms. Then, we study the optimal allocation proposed by Prasanna and Musicus for makespan minimization using optimal control theory. We largely simplify their proofs by resorting only to pure scheduling arguments. Building on the insight gained thanks to these new proofs, we extend the study to distributed multicore platforms. There, a task cannot be distributed among several distributed nodes. In such a distributed setting (homogeneous or heterogeneous), we prove the NP-completeness of the corresponding scheduling problem, and propose some approximation algorithms. We finally assess the relevance of our approach by simulations on realistic trees. We show that the average performance gain of our allocations with respect to existing solutions (that are thus unaware of the actual speedup functions) is up to 16% for $α=0.9$ (the value observed in the real experiments). △ Less

Submitted 4 June, 2015; v1 submitted 27 October, 2014; originally announced October 2014.

Comments: Paper accepted for publication at EuroPar 2015

Showing 1–40 of 40 results for author: Simon, B