Showing 1–50 of 69 results for author: Holmes, C

  1. arXiv:2407.08410

    cs.AI

    Specialist vision-language models for clinical ophthalmology

    Authors: Robbie Holland, Thomas R. P. Taylor, Christopher Holmes, Sophie Riedl, Julia Mai, Maria Patsiamanidi, Dimitra Mitsopoulou, Paul Hager, Philip Müller, Hendrik P. N. Scholl, Hrvoje Bogunović, Ursula Schmidt-Erfurth, Daniel Rueckert, Sobha Sivaprasad, Andrew J. Lotery, Martin J. Menten

    Abstract: Clinicians spend a significant amount of time reviewing medical images and transcribing their findings regarding patient diagnosis, referral and treatment in text form. Vision-language models (VLMs), which automatically interpret images and summarize their findings as text, have enormous potential to alleviate clinical workloads and increase patient access to high-quality medical care. While found…

    Submitted 11 July, 2024; originally announced July 2024.

    Comments: Submitted to Nature Medicine

  2. arXiv:2406.05213

    cs.CL cs.AI cs.LG stat.ML

    On Subjective Uncertainty Quantification and Calibration in Natural Language Generation

    Authors: Ziyu Wang, Chris Holmes

    Abstract: Applications of large language models often involve the generation of free-form responses, in which case uncertainty quantification becomes challenging. This is due to the need to identify task-specific uncertainties (e.g., about the semantics) which appear difficult to define in general cases. This work addresses these challenges from a perspective of Bayesian decision theory, starting from the…

    Submitted 7 June, 2024; originally announced June 2024.

  3. arXiv:2406.02365

    cs.RO

    Exploiting Chordal Sparsity for Fast Global Optimality with Application to Localization

    Authors: Frederike Dümbgen, Connor Holmes, Timothy D. Barfoot

    Abstract: In recent years, many estimation problems in robotics have been shown to be solvable to global optimality using their semidefinite relaxations. However, the runtime complexity of off-the-shelf semidefinite programming solvers is up to cubic in problem size, which inhibits real-time solutions of problems involving large state dimensions. We show that for a large class of problems, namely those with…

    Submitted 17 June, 2024; v1 submitted 4 June, 2024; originally announced June 2024.

    Comments: 23 pages, 7 figures

  4. arXiv:2406.00793

    stat.ML cs.LG

    Is In-Context Learning in Large Language Models Bayesian? A Martingale Perspective

    Authors: Fabian Falck, Ziyu Wang, Chris Holmes

    Abstract: In-context learning (ICL) has emerged as a particularly remarkable characteristic of Large Language Models (LLM): given a pretrained LLM and an observed dataset, LLMs can make predictions for new data points from the same distribution without fine-tuning. Numerous works have postulated ICL as approximately Bayesian inference, rendering this a natural hypothesis. In this work, we analyse this hypot…

    Submitted 2 June, 2024; originally announced June 2024.

    Comments: Accepted at International Conference on Machine Learning (ICML) 2024

  5. arXiv:2405.19309

    cs.RO

    SDPRLayers: Certifiable Backpropagation Through Polynomial Optimization Problems in Robotics

    Authors: Connor Holmes, Frederike Dümbgen, Timothy D. Barfoot

    Abstract: Differentiable optimization is a powerful new paradigm capable of reconciling model-based and learning-based approaches in robotics. However, the majority of robotics optimization problems are non-convex and current differentiable optimization techniques are therefore prone to convergence to local minima. When this occurs, the gradients provided by these existing solvers can be wildly inaccurate a…

    Submitted 21 June, 2024; v1 submitted 29 May, 2024; originally announced May 2024.

  6. arXiv:2403.19381

    stat.ML cs.LG

    On Uncertainty Quantification for Near-Bayes Optimal Algorithms

    Authors: Ziyu Wang, Chris Holmes

    Abstract: Bayesian modelling allows for the quantification of predictive uncertainty which is crucial in safety-critical applications. Yet for many machine learning (ML) algorithms, it is difficult to construct or implement their Bayesian counterpart. In this work we present a promising approach to address this challenge, based on the hypothesis that commonly used ML algorithms are efficient across a wide v…

    Submitted 28 March, 2024; originally announced March 2024.

  7. arXiv:2403.01485

    stat.ML cs.CV cs.LG

    Approximations to the Fisher Information Metric of Deep Generative Models for Out-Of-Distribution Detection

    Authors: Sam Dauncey, Chris Holmes, Christopher Williams, Fabian Falck

    Abstract: Likelihood-based deep generative models such as score-based diffusion models and variational autoencoders are state-of-the-art machine learning models approximating high-dimensional distributions of data such as images, text, or audio. One of many downstream tasks they can be naturally applied to is out-of-distribution (OOD) detection. However, seminal work by Nalisnick et al. which we reproduce s…

    Submitted 25 May, 2024; v1 submitted 3 March, 2024; originally announced March 2024.

  8. arXiv:2402.00072

    cs.LG stat.ME stat.ML

    Explainable AI for survival analysis: a median-SHAP approach

    Authors: Lucile Ter-Minassian, Sahra Ghalebikesabi, Karla Diaz-Ordaz, Chris Holmes

    Abstract: With the adoption of machine learning into routine clinical practice comes the need for Explainable AI methods tailored to medical applications. Shapley values have sparked wide interest for locally explaining models. Here, we demonstrate their interpretation strongly depends on both the summary statistic and the estimator for it, which in turn define what we identify as an 'anchor point'. We show…

    Submitted 30 January, 2024; originally announced February 2024.

    Comments: Accepted to the Interpretable Machine Learning for Healthcare (IMLH) workshop of the ICML 2022 Conference

  9. arXiv:2401.17737

    stat.ME cs.LG stat.ML

    Hierarchical Bias-Driven Stratification for Interpretable Causal Effect Estimation

    Authors: Lucile Ter-Minassian, Liran Szlak, Ehud Karavani, Chris Holmes, Yishai Shimoni

    Abstract: Interpretability and transparency are essential for incorporating causal effect models from observational data into policy decision-making. They can provide trust for the model in the absence of ground truth labels to evaluate the accuracy of such models. To date, attempts at transparent causal effect estimation consist of applying post hoc explanation methods to black-box models, which are not in…

    Submitted 31 January, 2024; originally announced January 2024.

  10. arXiv:2401.08671

    cs.PF cs.LG

    DeepSpeed-FastGen: High-throughput Text Generation for LLMs via MII and DeepSpeed-Inference

    Authors: Connor Holmes, Masahiro Tanaka, Michael Wyatt, Ammar Ahmad Awan, Jeff Rasley, Samyam Rajbhandari, Reza Yazdani Aminabadi, Heyang Qin, Arash Bakhtiari, Lev Kurilenko, Yuxiong He

    Abstract: The deployment and scaling of large language models (LLMs) have become critical as they permeate various applications, demanding high-throughput and low-latency serving systems. Existing frameworks struggle to balance these requirements, especially for workloads with long prompts. This paper introduces DeepSpeed-FastGen, a system that employs Dynamic SplitFuse, a novel prompt and generation compos…

    Submitted 9 January, 2024; originally announced January 2024.

  11. arXiv:2310.04610

    cs.AI cs.LG

    DeepSpeed4Science Initiative: Enabling Large-Scale Scientific Discovery through Sophisticated AI System Technologies

    Authors: Shuaiwen Leon Song, Bonnie Kruft, Minjia Zhang, Conglong Li, Shiyang Chen, Chengming Zhang, Masahiro Tanaka, Xiaoxia Wu, Jeff Rasley, Ammar Ahmad Awan, Connor Holmes, Martin Cai, Adam Ghanem, Zhongzhu Zhou, Yuxiong He, Pete Luferenko, Divya Kumar, Jonathan Weyn, Ruixiong Zhang, Sylwester Klocek, Volodymyr Vragov, Mohammed AlQuraishi, Gustaf Ahdritz, Christina Floristean, Cristina Negri, et al. (67 additional authors not shown)

    Abstract: In the upcoming decade, deep learning may revolutionize the natural sciences, enhancing our capacity to model and predict natural occurrences. This could herald a new era of scientific exploration, bringing significant advancements across sectors from drug development to renewable energy. To answer this call, we present the DeepSpeed4Science initiative (deepspeed4science.ai) which aims to build unique…

    Submitted 11 October, 2023; v1 submitted 6 October, 2023; originally announced October 2023.

  12. arXiv:2309.15793

    stat.ME cs.LG stat.ML

    Targeting Relative Risk Heterogeneity with Causal Forests

    Authors: Vik Shirvaikar, Chris Holmes

    Abstract: Treatment effect heterogeneity (TEH), or variability in treatment effect for different subgroups within a population, is of significant interest in clinical trial analysis. Causal forests (Wager and Athey, 2018) is a highly popular method for this problem, but like many other methods for detecting TEH, its criterion for separating subgroups focuses on differences in absolute risk. This can dilute…

    Submitted 26 September, 2023; originally announced September 2023.

    Comments: 10 pages, 4 figures

  13. arXiv:2309.05518

    cs.RO

    STAR-loc: Dataset for STereo And Range-based localization

    Authors: Frederike Dümbgen, Mohammed A. Shalaby, Connor Holmes, Charles C. Cossette, James R. Forbes, Jerome Le Ny, Timothy D. Barfoot

    Abstract: This document contains a detailed description of the STAR-loc dataset. For a quick starting guide please refer to the associated Github repository (https://github.com/utiasASRL/starloc). The dataset consists of stereo camera data (rectified/raw images and inertial measurement unit measurements) and ultra-wideband (UWB) data (range measurements) collected on a sensor rig in a Vicon motion capture a…

    Submitted 11 September, 2023; originally announced September 2023.

    Comments: 15 pages, 15 figures

  14. arXiv:2309.00810

    cs.CV cs.AI

    RenAIssance: A Survey into AI Text-to-Image Generation in the Era of Large Model

    Authors: Fengxiang Bie, Yibo Yang, Zhongzhu Zhou, Adam Ghanem, Minjia Zhang, Zhewei Yao, Xiaoxia Wu, Connor Holmes, Pareesa Golnari, David A. Clifton, Yuxiong He, Dacheng Tao, Shuaiwen Leon Song

    Abstract: Text-to-image generation (TTI) refers to the usage of models that could process text input and generate high fidelity images based on text descriptions. Text-to-image generation using neural networks could be traced back to the emergence of Generative Adversarial Network (GAN), followed by the autoregressive Transformer. Diffusion models are one prominent type of generative model used for the genera…

    Submitted 1 September, 2023; originally announced September 2023.

  15. arXiv:2308.12418

    cs.RO

    Certifiably Optimal Rotation and Pose Estimation Based on the Cayley Map

    Authors: Timothy D Barfoot, Connor Holmes, Frederike Dümbgen

    Abstract: We present novel, convex relaxations for rotation and pose estimation problems that can a posteriori guarantee global optimality for practical measurement noise levels. Some such relaxations exist in the literature for specific problem setups that assume the matrix von Mises-Fisher distribution (a.k.a., matrix Langevin distribution or chordal distance) for isotropic rotational uncertainty. However,…

    Submitted 31 May, 2024; v1 submitted 23 August, 2023; originally announced August 2023.

    Comments: 22 pages, 13 figures

  16. arXiv:2308.07275

    cs.RO math.OC

    On Semidefinite Relaxations for Matrix-Weighted State-Estimation Problems in Robotics

    Authors: Connor Holmes, Frederike Dümbgen, Timothy D Barfoot

    Abstract: In recent years, there has been remarkable progress in the development of so-called certifiable perception methods, which leverage semidefinite, convex relaxations to find global optima of perception problems in robotics. However, many of these relaxations rely on simplifying assumptions that facilitate the problem formulation, such as an isotropic measurement noise distribution. In this paper, we…

    Submitted 1 May, 2024; v1 submitted 14 August, 2023; originally announced August 2023.

  17. arXiv:2308.05783

    cs.RO

    Toward Globally Optimal State Estimation Using Automatically Tightened Semidefinite Relaxations

    Authors: Frederike Dümbgen, Connor Holmes, Ben Agro, Timothy D. Barfoot

    Abstract: In recent years, semidefinite relaxations of common optimization problems in robotics have attracted growing attention due to their ability to provide globally optimal solutions. In many cases, it was shown that specific handcrafted redundant constraints are required to obtain tight relaxations and thus global optimality. These constraints are formulation-dependent and typically identified through…

    Submitted 15 April, 2024; v1 submitted 10 August, 2023; originally announced August 2023.

    Comments: 20 pages, 22 figures

  18. arXiv:2308.01320

    cs.LG cs.AI cs.CL

    DeepSpeed-Chat: Easy, Fast and Affordable RLHF Training of ChatGPT-like Models at All Scales

    Authors: Zhewei Yao, Reza Yazdani Aminabadi, Olatunji Ruwase, Samyam Rajbhandari, Xiaoxia Wu, Ammar Ahmad Awan, Jeff Rasley, Minjia Zhang, Conglong Li, Connor Holmes, Zhongzhu Zhou, Michael Wyatt, Molly Smith, Lev Kurilenko, Heyang Qin, Masahiro Tanaka, Shuai Che, Shuaiwen Leon Song, Yuxiong He

    Abstract: ChatGPT-like models have revolutionized various applications in artificial intelligence, from summarization and coding to translation, matching or even surpassing human performance. However, the current landscape lacks an accessible, efficient, and cost-effective end-to-end RLHF (Reinforcement Learning with Human Feedback) training pipeline for these powerful models, particularly when training at…

    Submitted 2 August, 2023; originally announced August 2023.

    Comments: 14 pages, 7 figures

  19. arXiv:2307.05194

    stat.ML cs.AI cs.CR cs.LG math.ST

    Differentially Private Statistical Inference through β-Divergence One Posterior Sampling

    Authors: Jack Jewson, Sahra Ghalebikesabi, Chris Holmes

    Abstract: Differential privacy guarantees allow the results of a statistical analysis involving sensitive data to be released without compromising the privacy of any individual taking part. Achieving such guarantees generally requires the injection of noise, either directly into parameter estimates or into the estimation process. Instead of artificially introducing perturbations, sampling from Bayesian post…

    Submitted 27 October, 2023; v1 submitted 11 July, 2023; originally announced July 2023.

  20. arXiv:2306.14672

    stat.ML cs.LG

    PWSHAP: A Path-Wise Explanation Model for Targeted Variables

    Authors: Lucile Ter-Minassian, Oscar Clivio, Karla Diaz-Ordaz, Robin J. Evans, Chris Holmes

    Abstract: Predictive black-box models can exhibit high accuracy but their opaque nature hinders their uptake in safety-critical deployment environments. Explanation methods (XAI) can provide confidence for decision-making through increased transparency. However, existing XAI methods are not tailored towards models in sensitive domains where one predictor is of special interest, such as a treatment effect in…

    Submitted 26 June, 2023; originally announced June 2023.

    Journal ref: International Conference on Machine Learning 2023

  21. arXiv:2306.10209

    cs.DC cs.AI cs.LG cs.PF

    ZeRO++: Extremely Efficient Collective Communication for Giant Model Training

    Authors: Guanhua Wang, Heyang Qin, Sam Ade Jacobs, Connor Holmes, Samyam Rajbhandari, Olatunji Ruwase, Feng Yan, Lei Yang, Yuxiong He

    Abstract: Zero Redundancy Optimizer (ZeRO) has been used to train a wide range of large language models on massive GPU clusters due to its ease of use, efficiency, and good scalability. However, when training on low-bandwidth clusters, or at scale which forces batch size per GPU to be small, ZeRO's effective throughput is limited because of high communication volume from gathering weights in forward pass,…

    Submitted 16 June, 2023; originally announced June 2023.

    Comments: 12 pages

  22. arXiv:2305.19638

    stat.ML cs.CV cs.LG eess.IV

    A Unified Framework for U-Net Design and Analysis

    Authors: Christopher Williams, Fabian Falck, George Deligiannidis, Chris Holmes, Arnaud Doucet, Saifuddin Syed

    Abstract: U-Nets are a go-to, state-of-the-art neural architecture across numerous tasks for continuous signals on a square such as images and Partial Differential Equations (PDE), however their design and architecture is understudied. In this paper, we provide a framework for designing and analysing general U-Net architectures. We present theoretical results which characterise the role of the encoder and d…

    Submitted 10 January, 2024; v1 submitted 31 May, 2023; originally announced May 2023.

  23. arXiv:2304.01429

    stat.ML cs.LG

    Learning from data with structured missingness

    Authors: Robin Mitra, Sarah F. McGough, Tapabrata Chakraborti, Chris Holmes, Ryan Copping, Niels Hagenbuch, Stefanie Biedermann, Jack Noonan, Brieuc Lehmann, Aditi Shenvi, Xuan Vinh Doan, David Leslie, Ginestra Bianconi, Ruben Sanchez-Garcia, Alisha Davies, Maxine Mackintosh, Eleni-Rosalina Andrinopoulou, Anahid Basiri, Chris Harbron, Ben D. MacArthur

    Abstract: Missing data are an unavoidable complication in many machine learning tasks. When data are 'missing at random' there exists a range of tools and techniques to deal with the issue. However, as machine learning studies become more ambitious, and seek to learn from ever-larger volumes of heterogeneous data, an increasingly encountered problem arises in which missing values exhibit an association or st…

    Submitted 3 April, 2023; originally announced April 2023.

  24. arXiv:2301.08187

    stat.ML cs.CV cs.LG eess.SP

    A Multi-Resolution Framework for U-Nets with Applications to Hierarchical VAEs

    Authors: Fabian Falck, Christopher Williams, Dominic Danks, George Deligiannidis, Christopher Yau, Chris Holmes, Arnaud Doucet, Matthew Willetts

    Abstract: U-Net architectures are ubiquitous in state-of-the-art deep learning, however their regularisation properties and relationship to wavelets are understudied. In this paper, we formulate a multi-resolution framework which identifies U-Nets as finite-dimensional truncations of models on an infinite-dimensional function space. We provide theoretical results which prove that average pooling corresponds…

    Submitted 19 January, 2023; originally announced January 2023.

    Comments: NeurIPS 2022 (selected as oral)

  25. arXiv:2301.07210

    stat.ME cs.CE cs.LG stat.AP

    Causal Falsification of Digital Twins

    Authors: Rob Cornish, Muhammad Faaiz Taufiq, Arnaud Doucet, Chris Holmes

    Abstract: Digital twins are virtual systems designed to predict how a real-world process will evolve in response to interventions. This modelling paradigm holds substantial promise in many applications, but rigorous procedures for assessing their accuracy are essential for safety-critical settings. We consider how to assess the accuracy of a digital twin using real-world data. We formulate this as causal in…

    Submitted 2 November, 2023; v1 submitted 17 January, 2023; originally announced January 2023.

  26. arXiv:2301.04525

    eess.IV cs.CV

    Clustering disease trajectories in contrastive feature space for biomarker discovery in age-related macular degeneration

    Authors: Robbie Holland, Oliver Leingang, Christopher Holmes, Philipp Anders, Rebecca Kaye, Sophie Riedl, Johannes C. Paetzold, Ivan Ezhov, Hrvoje Bogunović, Ursula Schmidt-Erfurth, Lars Fritsche, Hendrik P. N. Scholl, Sobha Sivaprasad, Andrew J. Lotery, Daniel Rueckert, Martin J. Menten

    Abstract: Age-related macular degeneration (AMD) is the leading cause of blindness in the elderly. Current grading systems based on imaging biomarkers only coarsely group disease stages into broad categories and are unable to predict future disease progression. It is widely believed that this is due to their focus on a single point in time, disregarding the dynamic nature of the disease. In this work, we pr…

    Submitted 20 March, 2023; v1 submitted 11 January, 2023; originally announced January 2023.

    Comments: Submitted to MICCAI 2023

  27. Towards Open World NeRF-Based SLAM

    Authors: Daniil Lisus, Connor Holmes, Steven Waslander

    Abstract: Neural Radiance Fields (NeRFs) offer versatility and robustness in map representations for Simultaneous Localization and Mapping (SLAM) tasks. This paper extends NICE-SLAM, a recent state-of-the-art NeRF-based SLAM algorithm capable of producing high quality NeRF maps. However, depending on the hardware used, the required number of iterations to produce these maps often makes NICE-SLAM run at less…

    Submitted 11 September, 2023; v1 submitted 8 January, 2023; originally announced January 2023.

    Comments: Presented at Conference on Robots and Vision (CRV) 2023. 8 pages, 2 figures, 2 tables

    Journal ref: 2023 20th Conference on Robots and Vision (CRV), Montreal, QC, Canada, 2023, pp. 37-44

  28. arXiv:2212.08571

    cs.SD cs.LG eess.AS stat.AP

    Statistical Design and Analysis for Robust Machine Learning: A Case Study from COVID-19

    Authors: Davide Pigoli, Kieran Baker, Jobie Budd, Lorraine Butler, Harry Coppock, Sabrina Egglestone, Steven G. Gilmour, Chris Holmes, David Hurley, Radka Jersakova, Ivan Kiskin, Vasiliki Koutra, Jonathon Mellor, George Nicholson, Joe Packham, Selina Patel, Richard Payne, Stephen J. Roberts, Björn W. Schuller, Ana Tendero-Cañadas, Tracey Thornley, Alexander Titcomb

    Abstract: Since early in the coronavirus disease 2019 (COVID-19) pandemic, there has been interest in using artificial intelligence methods to predict COVID-19 infection status based on vocal audio signals, for example cough recordings. However, existing studies have limitations in terms of data collection and of the assessment of the performances of the proposed predictive models. This paper rigorously ass…

    Submitted 27 February, 2023; v1 submitted 15 December, 2022; originally announced December 2022.

  29. arXiv:2212.08570

    cs.SD cs.LG eess.AS

    Audio-based AI classifiers show no evidence of improved COVID-19 screening over simple symptoms checkers

    Authors: Harry Coppock, George Nicholson, Ivan Kiskin, Vasiliki Koutra, Kieran Baker, Jobie Budd, Richard Payne, Emma Karoune, David Hurley, Alexander Titcomb, Sabrina Egglestone, Ana Tendero Cañadas, Lorraine Butler, Radka Jersakova, Jonathon Mellor, Selina Patel, Tracey Thornley, Peter Diggle, Sylvia Richardson, Josef Packham, Björn W. Schuller, Davide Pigoli, Steven Gilmour, Stephen Roberts, Chris Holmes

    Abstract: Recent work has reported that AI classifiers trained on audio recordings can accurately predict severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) infection status. Here, we undertake a large scale study of audio-based deep learning classifiers, as part of the UK government's pandemic response. We collect and analyse a dataset of audio recordings from 67,842 individuals with linked metadata…

    Submitted 2 March, 2023; v1 submitted 15 December, 2022; originally announced December 2022.

  30. arXiv:2212.07738

    cs.SD cs.LG eess.AS

    A large-scale and PCR-referenced vocal audio dataset for COVID-19

    Authors: Jobie Budd, Kieran Baker, Emma Karoune, Harry Coppock, Selina Patel, Ana Tendero Cañadas, Alexander Titcomb, Richard Payne, David Hurley, Sabrina Egglestone, Lorraine Butler, Jonathon Mellor, George Nicholson, Ivan Kiskin, Vasiliki Koutra, Radka Jersakova, Rachel A. McKendry, Peter Diggle, Sylvia Richardson, Björn W. Schuller, Steven Gilmour, Davide Pigoli, Stephen Roberts, Josef Packham, Tracey Thornley, et al. (1 additional authors not shown)

    Abstract: The UK COVID-19 Vocal Audio Dataset is designed for the training and evaluation of machine learning models that classify SARS-CoV-2 infection status or associated respiratory symptoms using vocal audio. The UK Health Security Agency recruited voluntary participants through the national Test and Trace programme and the REACT-1 survey in England from March 2021 to March 2022, during dominant transmi…

    Submitted 3 November, 2023; v1 submitted 15 December, 2022; originally announced December 2022.

    Comments: 39 pages, 4 figures

  31. arXiv:2212.03597

    cs.LG cs.AI

    DeepSpeed Data Efficiency: Improving Deep Learning Model Quality and Training Efficiency via Efficient Data Sampling and Routing

    Authors: Conglong Li, Zhewei Yao, Xiaoxia Wu, Minjia Zhang, Connor Holmes, Cheng Li, Yuxiong He

    Abstract: Recent advances on deep learning models come at the price of formidable training cost. The increasing model size is one of the root causes, but another less-emphasized fact is that data scale is actually increasing at a similar speed as model scale, and the training cost is proportional to both of them. Compared to the rapidly evolving model architecture, how to efficiently use the training data (…

    Submitted 14 January, 2024; v1 submitted 7 December, 2022; originally announced December 2022.

    Comments: Published in AAAI 2024 Main Technical Track. Equal contribution by the first 3 authors. Code has been released as a part of https://github.com/microsoft/DeepSpeed. Part of this paper is from our previous arxiv report (arXiv:2211.11586)

  32. arXiv:2211.11586

    cs.CL cs.LG

    Random-LTD: Random and Layerwise Token Dropping Brings Efficient Training for Large-scale Transformers

    Authors: Zhewei Yao, Xiaoxia Wu, Conglong Li, Connor Holmes, Minjia Zhang, Cheng Li, Yuxiong He

    Abstract: Large-scale transformer models have become the de facto architectures for various machine learning applications, e.g., CV and NLP. However, those large models also introduce prohibitive training costs. To mitigate this issue, we propose a novel random and layerwise token dropping method (random-LTD), which skips the computation of a subset of the input tokens at all middle layers. Particularly, ra…

    Submitted 17 November, 2022; originally announced November 2022.

    Comments: 22 pages

  33. Safe and Smooth: Certified Continuous-Time Range-Only Localization

    Authors: Frederike Dümbgen, Connor Holmes, Timothy D. Barfoot

    Abstract: A common approach to localize a mobile robot is by measuring distances to points of known positions, called anchors. Locating a device from distance measurements is typically posed as a non-convex optimization problem, stemming from the nonlinearity of the measurement model. Non-convex optimization problems may yield suboptimal solutions when local iterative solvers such as Gauss-Newton are employ…

    Submitted 29 September, 2023; v1 submitted 9 September, 2022; originally announced September 2022.

    Comments: 10 pages, 7 figures, accepted to IEEE Robotics and Automation Letters (this arXiv version contains supplementary appendix)

  34. arXiv:2206.15014

    cs.CL cs.AI

    Compressing Pre-trained Transformers via Low-Bit NxM Sparsity for Natural Language Understanding

    Authors: Connor Holmes, Minjia Zhang, Yuxiong He, Bo Wu

    Abstract: In recent years, large pre-trained Transformer networks have demonstrated dramatic improvements in many natural language understanding tasks. However, the huge size of these models brings significant challenges to their fine-tuning and online deployment due to latency and cost constraints. New hardware supporting both N:M semi-structured sparsity and low-precision integer computation is a promisin…

    Submitted 30 June, 2022; originally announced June 2022.

  35. arXiv:2206.12961

    cs.RO

    An Efficient Global Optimality Certificate for Landmark-Based SLAM

    Authors: Connor Holmes, Timothy D. Barfoot

    Abstract: Modern state estimation is often formulated as an optimization problem and solved using efficient local search methods. These methods at best guarantee convergence to local minima, but, in some cases, global optimality can also be certified. Although such global optimality certificates have been well established for 3D pose-graph optimization, the details have yet to be worked out for the…

    Submitted 23 November, 2022; v1 submitted 26 June, 2022; originally announced June 2022.

    Comments: 10 pages, 7 figures

  36. arXiv:2206.06462

    stat.ML cs.LG stat.ME

    Quasi-Bayesian Nonparametric Density Estimation via Autoregressive Predictive Updates

    Authors: Sahra Ghalebikesabi, Chris Holmes, Edwin Fong, Brieuc Lehmann

    Abstract: Bayesian methods are a popular choice for statistical inference in small-data regimes due to the regularization effect induced by the prior. In the context of density estimation, the standard nonparametric Bayesian approach is to target the posterior predictive of the Dirichlet process mixture model. In general, direct estimation of the posterior predictive is intractable and so methods typically…

    Submitted 18 February, 2023; v1 submitted 13 June, 2022; originally announced June 2022.

  37. arXiv:2206.05082

    cs.RO math.OC

    A Fine Line: Total Least-Squares Line Fitting as QCQP Optimization

    Authors: Timothy D Barfoot, Connor Holmes, Frederike Dumbgen

    Abstract: This note uses the Total Least-Squares (TLS) line-fitting problem as a canvas to explore some modern optimization tools. The contribution is meant to be tutorial in nature. The TLS problem has a lot of mathematical similarities to important problems in robotics and computer vision but is easier to visualize and understand. We demonstrate how to turn this problem into a Quadratically Constrained Qu…

    Submitted 10 June, 2022; originally announced June 2022.

    Comments: 11 pages, 5 figures

  38. arXiv:2203.00554

    stat.ML cs.LG

    Neural Score Matching for High-Dimensional Causal Inference

    Authors: Oscar Clivio, Fabian Falck, Brieuc Lehmann, George Deligiannidis, Chris Holmes

    Abstract: Traditional methods for matching in causal inference are impractical for high-dimensional datasets. They suffer from the curse of dimensionality: exact matching and coarsened exact matching find exponentially fewer matches as the input dimension grows, and propensity score matching may match highly unrelated units together. To overcome this problem, we develop theoretical results which motivate th…

    Submitted 1 March, 2022; originally announced March 2022.

    Comments: To appear in AISTATS 2022

  39. arXiv:2202.00813

    cs.LG cs.CV eess.IV q-bio.CB q-bio.QM

    A Graph Based Neural Network Approach to Immune Profiling of Multiplexed Tissue Samples

    Authors: Natalia Garcia Martin, Stefano Malacrino, Marta Wojciechowska, Leticia Campo, Helen Jones, David C. Wedge, Chris Holmes, Korsuk Sirinukunwattana, Heba Sailem, Clare Verrill, Jens Rittscher

    Abstract: Multiplexed immunofluorescence provides an unprecedented opportunity for studying specific cell-to-cell and cell microenvironment interactions. We employ graph neural networks to combine features obtained from tissue morphology with measurements of protein expression to profile the tumour microenvironment associated with different tumour stages. Our framework presents a new approach to analysing a…

    Submitted 1 February, 2022; originally announced February 2022.

    Journal ref: 2022 44th Annual International Conference of the IEEE Engineering in Medicine & Biology Society (EMBC), 2022, pp. 3063-3067

  40. arXiv:2110.15766  [pdf, other]

    cs.CL cs.AI

    NxMTransformer: Semi-Structured Sparsification for Natural Language Understanding via ADMM

    Authors: Connor Holmes, Minjia Zhang, Yuxiong He, Bo Wu

    Abstract: Natural Language Processing (NLP) has recently achieved success by using huge pre-trained Transformer networks. However, these models often contain hundreds of millions or even billions of parameters, bringing challenges to online deployment due to latency constraints. Recently, hardware manufacturers have introduced dedicated hardware for NxM sparsity to provide the flexibility of unstructured pr…

    Submitted 28 October, 2021; originally announced October 2021.

  41. arXiv:2108.10934  [pdf, other]

    stat.ML cs.CR cs.LG

    Mitigating Statistical Bias within Differentially Private Synthetic Data

    Authors: Sahra Ghalebikesabi, Harrison Wilde, Jack Jewson, Arnaud Doucet, Sebastian Vollmer, Chris Holmes

    Abstract: Increasing interest in privacy-preserving machine learning has led to new and evolved approaches for generating private synthetic data from undisclosed real data. However, mechanisms of privacy preservation can significantly reduce the utility of synthetic data, which in turn impacts downstream tasks such as learning predictive models or inference. We propose several re-weighting strategies using…

    Submitted 19 May, 2022; v1 submitted 24 August, 2021; originally announced August 2021.

  42. arXiv:2106.14648  [pdf, other]

    cs.LG stat.CO stat.ME stat.ML

    On Locality of Local Explanation Models

    Authors: Sahra Ghalebikesabi, Lucile Ter-Minassian, Karla Diaz-Ordaz, Chris Holmes

    Abstract: Shapley values provide model agnostic feature attributions for model outcome at a particular instance by simulating feature absence under a global population distribution. The use of a global population can lead to potentially misleading results when local model behaviour is of interest. Hence we consider the formulation of neighbourhood reference distributions that improve the local interpretabil…

    Submitted 24 June, 2021; originally announced June 2021.

    Comments: Submitted to NeurIPS 2021

  43. arXiv:2106.05241  [pdf, other]

    stat.ML cs.CV cs.LG stat.ME

    Multi-Facet Clustering Variational Autoencoders

    Authors: Fabian Falck, Haoting Zhang, Matthew Willetts, George Nicholson, Christopher Yau, Chris Holmes

    Abstract: Work in deep clustering focuses on finding a single partition of data. However, high-dimensional data, such as images, typically feature multiple interesting characteristics one could cluster over. For example, images of objects against a background could be clustered over the shape of the object and separately by the colour of the background. In this paper, we introduce Multi-Facet Clustering Var…

    Submitted 29 October, 2021; v1 submitted 9 June, 2021; originally announced June 2021.

    Comments: Advances in Neural Information Processing Systems 34 (NeurIPS 2021)

  44. arXiv:2103.03532  [pdf, other]

    stat.ML cs.LG

    Deep Generative Pattern-Set Mixture Models for Nonignorable Missingness

    Authors: Sahra Ghalebikesabi, Rob Cornish, Luke J. Kelly, Chris Holmes

    Abstract: We propose a variational autoencoder architecture to model both ignorable and nonignorable missing data using pattern-set mixtures as proposed by Little (1993). Our model explicitly learns to cluster the missing data into missingness pattern sets based on the observed data and missingness masks. Underpinning our approach is the assumption that the data distribution under missingness is probabilist…

    Submitted 5 March, 2021; originally announced March 2021.

    Comments: International Conference on Artificial Intelligence and Statistics (AISTATS)

    Journal ref: International Conference on Artificial Intelligence and Statistics (AISTATS) 2021

  45. arXiv:2102.07006  [pdf, other]

    stat.ML cs.LG

    Asymmetric Heavy Tails and Implicit Bias in Gaussian Noise Injections

    Authors: Alexander Camuto, Xiaoyu Wang, Lingjiong Zhu, Chris Holmes, Mert Gürbüzbalaban, Umut Şimşekli

    Abstract: Gaussian noise injections (GNIs) are a family of simple and widely-used regularisation methods for training neural networks, where one injects additive or multiplicative Gaussian noise to the network activations at every iteration of the optimisation algorithm, which is typically chosen as stochastic gradient descent (SGD). In this paper we focus on the so-called 'implicit effect' of GNIs, which i…

    Submitted 10 June, 2021; v1 submitted 13 February, 2021; originally announced February 2021.

    Comments: Main paper of 12 pages, followed by appendix

  46. arXiv:2011.08299  [pdf, other]

    cs.LG stat.AP stat.ME stat.ML

    Foundations of Bayesian Learning from Synthetic Data

    Authors: Harrison Wilde, Jack Jewson, Sebastian Vollmer, Chris Holmes

    Abstract: There is significant growth and interest in the use of synthetic data as an enabler for machine learning in environments where the release of real data is restricted due to privacy or availability constraints. Despite a large number of methods for synthetic data generation, there are comparatively few results on the statistical properties of models learnt on synthetic data, and fewer still for sit…

    Submitted 24 November, 2020; v1 submitted 16 November, 2020; originally announced November 2020.

    Comments: 43 pages (10 main text, 33 supplement), 32 figures (4 main text, 28 supplement)

  47. arXiv:2007.08222  [pdf, other]

    cs.SE cs.PL

    Inheritance software metrics on smart contracts

    Authors: Ashish Rajendra Sai, Conor Holmes, Jim Buckley, Andrew Le Gear

    Abstract: Blockchain systems have gained substantial traction recently, partly due to the potential of decentralized immutable mediation of economic activities. Ethereum is a prominent example that has the provision for executing stateful computing scripts known as Smart Contracts. These smart contracts resemble traditional programs, but with immutability being the core differentiating factor. Given their i…

    Submitted 16 July, 2020; originally announced July 2020.

    Comments: Accepted by International Conference on Program Comprehension (ICPC 2020)

  48. arXiv:2007.07368  [pdf, other]

    stat.ML cs.LG

    Explicit Regularisation in Gaussian Noise Injections

    Authors: Alexander Camuto, Matthew Willetts, Umut Şimşekli, Stephen Roberts, Chris Holmes

    Abstract: We study the regularisation induced in neural networks by Gaussian noise injections (GNIs). Though such injections have been extensively studied when applied to data, there have been few studies on understanding the regularising effect they induce when applied to network activations. Here we derive the explicit regulariser of GNIs, obtained by marginalising out the injected noise, and show that it…

    Submitted 19 January, 2021; v1 submitted 14 July, 2020; originally announced July 2020.

    Journal ref: Advances in Neural Information Processing Systems 33 (2020)

  49. arXiv:2007.07365  [pdf, other]

    stat.ML cs.LG

    Towards a Theoretical Understanding of the Robustness of Variational Autoencoders

    Authors: Alexander Camuto, Matthew Willetts, Stephen Roberts, Chris Holmes, Tom Rainforth

    Abstract: We make inroads into understanding the robustness of Variational Autoencoders (VAEs) to adversarial attacks and other input perturbations. While previous work has developed algorithmic approaches to attacking and defending VAEs, there remains a lack of formalization for what it means for a VAE to be robust. To address this, we develop a novel criterion for robustness in probabilistic models: $r$-r…

    Submitted 29 January, 2021; v1 submitted 14 July, 2020; originally announced July 2020.

    Comments: 8 pages

    Journal ref: AISTATS 2021

  50. arXiv:2007.07307  [pdf, other]

    stat.ML cs.CV cs.LG

    Relaxed-Responsibility Hierarchical Discrete VAEs

    Authors: Matthew Willetts, Xenia Miscouridou, Stephen Roberts, Chris Holmes

    Abstract: Successfully training Variational Autoencoders (VAEs) with a hierarchy of discrete latent variables remains an area of active research. Vector-Quantised VAEs are a powerful approach to discrete VAEs, but naive hierarchical extensions can be unstable when training. Leveraging insights from classical methods of inference we introduce Relaxed-Responsibility Vector-Quantisation, a novel way…

    Submitted 4 February, 2021; v1 submitted 14 July, 2020; originally announced July 2020.

    Comments: 10 pages