-
What Makes and Breaks Safety Fine-tuning? Mechanistic Study
Authors:
Samyak Jain,
Ekdeep Singh Lubana,
Kemal Oksuz,
Tom Joy,
Philip H. S. Torr,
Amartya Sanyal,
Puneet K. Dokania
Abstract:
Safety fine-tuning helps align Large Language Models (LLMs) with human preferences for their safe deployment. To better understand the underlying factors that make models safe via safety fine-tuning, we design a synthetic data generation framework that captures salient aspects of an unsafe input by modeling the interaction between the task the model is asked to perform (e.g., ``design'') versus the specific concepts the task is asked to be performed upon (e.g., a ``cycle'' vs. a ``bomb''). Using this, we investigate three well-known safety fine-tuning methods -- supervised safety fine-tuning, direct preference optimization, and unlearning -- and provide significant evidence demonstrating that these methods minimally transform MLP weights to specifically align unsafe inputs into its weights' null space. This yields a clustering of inputs based on whether the model deems them safe or not. Correspondingly, when an adversarial input (e.g., a jailbreak) is provided, its activations are closer to safer samples, leading to the model processing such an input as if it were safe. We validate our findings, wherever possible, on real-world models -- specifically, Llama-2 7B and Llama-3 8B.
Submitted 14 July, 2024;
originally announced July 2024.
-
StyleSplat: 3D Object Style Transfer with Gaussian Splatting
Authors:
Sahil Jain,
Avik Kuthiala,
Prabhdeep Singh Sethi,
Prakanshul Saxena
Abstract:
Recent advancements in radiance fields have opened new avenues for creating high-quality 3D assets and scenes. Style transfer can enhance these 3D assets with diverse artistic styles, transforming creative expression. However, existing techniques are often slow or unable to localize style transfer to specific objects. We introduce StyleSplat, a lightweight method for stylizing 3D objects in scenes represented by 3D Gaussians from reference style images. Our approach first learns a photorealistic representation of the scene using 3D Gaussian splatting while jointly segmenting individual 3D objects. We then use a nearest-neighbor feature matching loss to finetune the Gaussians of the selected objects, aligning their spherical harmonic coefficients with the style image to ensure consistency and visual appeal. StyleSplat allows for quick, customizable style transfer and localized stylization of multiple objects within a scene, each with a different style. We demonstrate its effectiveness across various 3D scenes and styles, showcasing enhanced control and customization in 3D creation.
Submitted 12 July, 2024;
originally announced July 2024.
-
Fair Federated Data Clustering through Personalization: Bridging the Gap between Diverse Data Distributions
Authors:
Shivam Gupta,
Tarushi,
Tsering Wangzes,
Shweta Jain
Abstract:
The rapid growth of data from edge devices has catalyzed the performance of machine learning algorithms. However, the generated data resides on client devices, which poses two major challenges for traditional machine learning paradigms: first, centralizing the data for training; and second, class labels are missing for most of the generated data, and clients have very little incentive to label their data manually owing to high cost and lack of expertise. To overcome these issues, there have been initial attempts to handle unlabelled data in a privacy-preserving, distributed manner using unsupervised federated data clustering. The goal is to partition the data available on clients into $k$ partitions (called clusters) without actual exchange of data. Most existing algorithms are highly dependent on data distribution patterns across clients or are computationally expensive. Furthermore, because data is skewed across clients in most practical scenarios, existing models might leave some clients with a high clustering cost, making them reluctant to participate in the federated process. To address this, we are the first to introduce the idea of personalization in federated clustering. The goal is to balance achieving a lower clustering cost with achieving a uniform cost across clients. We propose p-FClus, which addresses these goals in a single round of communication between server and clients. We validate the efficacy of p-FClus on a variety of federated datasets, showcasing its data-independent nature and applicability to any finite $\ell$-norm, while simultaneously achieving lower cost and variance.
Submitted 12 July, 2024; v1 submitted 5 July, 2024;
originally announced July 2024.
-
Nonlinear Model Reduction to Random Spectral Submanifolds in Random Vibrations
Authors:
Zhenwei Xu,
Roshan S. Kaundinya,
Shobhit Jain,
George Haller
Abstract:
Dynamical systems in engineering and physics are often subject to irregular excitations that are best modeled as random. Monte Carlo simulations are routinely performed on such random models to obtain statistics on their long-term response. Such simulations, however, are prohibitively expensive and time consuming for high-dimensional nonlinear systems. Here we propose to decrease this numerical burden significantly by reducing the full system to very low-dimensional, attracting, random invariant manifolds in its phase space and performing the Monte Carlo simulations on that reduced dynamical system. The random spectral submanifolds (SSMs) we construct for this purpose generalize the concept of SSMs from deterministic systems under uniformly bounded random forcing. We illustrate the accuracy and speed of random SSM reduction by computing the SSM-reduced power spectral density of the randomly forced mechanical systems that range from simple oscillator chains to finite-element models of beams and plates.
Submitted 4 July, 2024;
originally announced July 2024.
-
Probing the connection between IceCube neutrinos and MOJAVE AGN
Authors:
R. Abbasi,
M. Ackermann,
J. Adams,
S. K. Agarwalla,
J. A. Aguilar,
M. Ahlers,
J. M. Alameddine,
N. M. Amin,
K. Andeen,
C. Argüelles,
Y. Ashida,
S. Athanasiadou,
L. Ausborm,
S. N. Axani,
X. Bai,
A. Balagopal V.,
M. Baricevic,
S. W. Barwick,
S. Bash,
V. Basu,
R. Bay,
J. J. Beatty,
J. Becker Tjus,
J. Beise,
C. Bellenghi
, et al. (399 additional authors not shown)
Abstract:
Active Galactic Nuclei (AGN) are prime candidate sources of the high-energy, astrophysical neutrinos detected by IceCube. This is demonstrated by the real-time multi-messenger detection of the blazar TXS 0506+056 and the recent evidence of neutrino emission from NGC 1068 from a separate time-averaged study. However, the production mechanism of the astrophysical neutrinos in AGN is not well established, a question that correlation studies with photon observations can help resolve. For neutrinos produced due to photohadronic interactions in AGN, in addition to a correlation of neutrinos with high-energy photons, there would also be a correlation of neutrinos with photons emitted at radio wavelengths. In this work, we perform an in-depth stacking study of the correlation between 15 GHz radio observations of AGN reported in the MOJAVE XV catalog and ten years of neutrino data from IceCube. We also use a time-dependent approach, which improves the statistical power of the stacking analysis. No significant correlation was found in either analysis, and upper limits are reported. When compared to the IceCube diffuse flux, at 100 TeV and for a spectral index of 2.5, the upper limits derived are $\sim3\%$ and $\sim9\%$ for the time-averaged and time-dependent case, respectively.
Submitted 1 July, 2024;
originally announced July 2024.
-
Search for a light sterile neutrino with 7.5 years of IceCube DeepCore data
Authors:
R. Abbasi,
M. Ackermann,
J. Adams,
S. K. Agarwalla,
J. A. Aguilar,
M. Ahlers,
J. M. Alameddine,
N. M. Amin,
K. Andeen,
C. Argüelles,
Y. Ashida,
S. Athanasiadou,
L. Ausborm,
S. N. Axani,
X. Bai,
A. Balagopal V.,
M. Baricevic,
S. W. Barwick,
S. Bash,
V. Basu,
R. Bay,
J. J. Beatty,
J. Becker Tjus,
J. Beise,
C. Bellenghi
, et al. (399 additional authors not shown)
Abstract:
We present a search for an eV-scale sterile neutrino using 7.5 years of data from the IceCube DeepCore detector. The analysis uses a sample of 21,914 events with energies between 5 and 150 GeV to search for sterile neutrinos through atmospheric muon neutrino disappearance. Improvements in event selection and treatment of systematic uncertainties provide greater statistical power compared to previous DeepCore sterile neutrino searches. Our results are compatible with the absence of mixing between active and sterile neutrino states, and we place constraints on the mixing matrix elements $|U_{μ4}|^2 < 0.0534$ and $|U_{τ4}|^2 < 0.0574$ at 90% CL under the assumption that $Δm^2_{41}\geq 1\;\mathrm{eV^2}$. These null results add to the growing tension between anomalous appearance results and constraints from disappearance searches in the 3+1 sterile neutrino landscape.
Submitted 1 July, 2024;
originally announced July 2024.
-
LiveBench: A Challenging, Contamination-Free LLM Benchmark
Authors:
Colin White,
Samuel Dooley,
Manley Roberts,
Arka Pal,
Ben Feuer,
Siddhartha Jain,
Ravid Shwartz-Ziv,
Neel Jain,
Khalid Saifullah,
Siddartha Naidu,
Chinmay Hegde,
Yann LeCun,
Tom Goldstein,
Willie Neiswanger,
Micah Goldblum
Abstract:
Test set contamination, wherein test data from a benchmark ends up in a newer model's training set, is a well-documented obstacle for fair LLM evaluation and can quickly render benchmarks obsolete. To mitigate this, many recent benchmarks crowdsource new prompts and evaluations from human or LLM judges; however, these can introduce significant biases, and break down when scoring hard questions. In this work, we introduce a new benchmark for LLMs designed to be immune to both test set contamination and the pitfalls of LLM judging and human crowdsourcing. We release LiveBench, the first benchmark that (1) contains frequently-updated questions from recent information sources, (2) scores answers automatically according to objective ground-truth values, and (3) contains a wide variety of challenging tasks, spanning math, coding, reasoning, language, instruction following, and data analysis. To achieve this, LiveBench contains questions that are based on recently-released math competitions, arXiv papers, news articles, and datasets, and it contains harder, contamination-free versions of tasks from previous benchmarks such as Big-Bench Hard, AMPS, and IFEval. We evaluate many prominent closed-source models, as well as dozens of open-source models ranging from 0.5B to 110B in size. LiveBench is difficult, with top models achieving below 65% accuracy. We release all questions, code, and model answers. Questions will be added and updated on a monthly basis, and we will release new tasks and harder versions of tasks over time so that LiveBench can distinguish between the capabilities of LLMs as they improve in the future. We welcome community engagement and collaboration for expanding the benchmark tasks and models.
Submitted 27 June, 2024;
originally announced June 2024.
-
Inherent Challenges of Post-Hoc Membership Inference for Large Language Models
Authors:
Matthieu Meeus,
Shubham Jain,
Marek Rei,
Yves-Alexandre de Montjoye
Abstract:
Large Language Models (LLMs) are often trained on vast amounts of undisclosed data, motivating the development of post-hoc Membership Inference Attacks (MIAs) to gain insight into their training data composition. However, in this paper, we identify inherent challenges in post-hoc MIA evaluation due to potential distribution shifts between collected member and non-member datasets. Using a simple bag-of-words classifier, we demonstrate that datasets used in recent post-hoc MIAs suffer from significant distribution shifts, in some cases achieving near-perfect distinction between members and non-members. This implies that previously reported high MIA performance may be largely attributable to these shifts rather than model memorization. We confirm that randomized, controlled setups eliminate such shifts and thus enable the development and fair evaluation of new MIAs. However, we note that such randomized setups are rarely available for the latest LLMs, making post-hoc data collection still required to infer membership for real-world LLMs. As a potential solution, we propose a Regression Discontinuity Design (RDD) approach for post-hoc data collection, which substantially mitigates distribution shifts. Evaluating various MIA methods on this RDD setup yields performance barely above random guessing, in stark contrast to previously reported results. Overall, our findings highlight the challenges in accurately measuring LLM memorization and the need for careful experimental design in (post-hoc) membership inference tasks.
Submitted 25 June, 2024;
originally announced June 2024.
-
Constraints on local processes
Authors:
Abhijit Gadde,
Shraiyance Jain,
Harshal Kulkarni
Abstract:
If we want to transform the quantum state of a system into another using local processes, what is the probability of success? It turns out that this probability can be bounded by quantifying the entanglement within both states. In this paper, we construct a family of multipartite entanglement measures that are monotonic under local operations and classical communication on average. The measures are constructed out of local unitary invariant polynomials of the state and its conjugate, and hence are easy to compute for pure states. Using these measures, we bound the success probability of transforming a given state into another state using local quantum operations and classical communication.
Submitted 25 June, 2024;
originally announced June 2024.
-
Time Non-locality in Dark Matter and LSS
Authors:
Arhum Ansari,
Arka Banerjee,
Sachin Jain,
Shaunak Padhyegurjar
Abstract:
We explore the intriguing phenomenon of time non-locality in the evolution of dark matter and Large Scale Structure (LSS). Recently in\,\cite{Donath:2023sav}, it was shown that time non-locality emerges in bias tracer fluctuations, which are $SO(3)$ scalars in real space, at fifth order in the perturbation expansion in dark matter overdensity. We demonstrate that by breaking the symmetry down to $SO(2)$, which is the case whenever line-of-sight effects become important, such as for flux fluctuations in the Lyman $α$ forest, the temporal non-locality appears at the third order in expansion. Additionally, within the framework of EFTofLSS, we demonstrate that time non-locality manifests in the effective stress tensor of dark matter, which is a second rank tensor under $SO(3)$ transformations, again at the third order in dark matter overdensity. Furthermore, we highlight the effectiveness of the standard $Π$ basis\,\cite{Mirbabayi:2014zca} in handling time non-local operators.
Submitted 24 June, 2024;
originally announced June 2024.
-
Data Debiasing with Datamodels (D3M): Improving Subgroup Robustness via Data Selection
Authors:
Saachi Jain,
Kimia Hamidieh,
Kristian Georgiev,
Andrew Ilyas,
Marzyeh Ghassemi,
Aleksander Madry
Abstract:
Machine learning models can fail on subgroups that are underrepresented during training. While techniques such as dataset balancing can improve performance on underperforming groups, they require access to training group annotations and can end up removing large portions of the dataset. In this paper, we introduce Data Debiasing with Datamodels (D3M), a debiasing approach which isolates and removes specific training examples that drive the model's failures on minority groups. Our approach enables us to efficiently train debiased classifiers while removing only a small number of examples, and does not require training group annotations or additional hyperparameter tuning.
Submitted 24 June, 2024;
originally announced June 2024.
-
Emergent Dynamics in Heterogeneous Life-Like Cellular Automata
Authors:
Aarati Shrestha,
Felix Reimers,
Sanyam Jain,
Paolo Baldini,
Michele Braccini,
Andrea Roli,
Stefano Nichele
Abstract:
The Game of Life (GoL), one well known 2D cellular automaton, does not typically ensure interesting long-term phenotypic dynamics. Therefore, while being Turing complete, GoL cannot be said to be open-ended. In this work, we extend GoL with the opportunity for local mutations, thus enabling a heterogeneous life-like cellular automaton guided by an evolutionary inner loop. Additionally, we introduce the concept of cell ageing to ensure that cell aliveness (activated by inheritance with variation, and controlled by ageing) and actual cell computation (governed by life-like rules on local neighborhoods) are kept conceptually separated. We conduct an experimental campaign to identify suitable parameters that produce long-term phenotypic dynamics and favor genotypic innovations.
Submitted 19 June, 2024;
originally announced June 2024.
-
Using graph neural networks to reconstruct charged pion showers in the CMS High Granularity Calorimeter
Authors:
M. Aamir,
B. Acar,
G. Adamov,
T. Adams,
C. Adloff,
S. Afanasiev,
C. Agrawal,
C. Agrawal,
A. Ahmad,
H. A. Ahmed,
S. Akbar,
N. Akchurin,
B. Akgul,
B. Akgun,
R. O. Akpinar,
E. Aktas,
A. AlKadhim,
V. Alexakhin,
J. Alimena,
J. Alison,
A. Alpana,
W. Alshehri,
P. Alvarez Dominguez,
M. Alyari,
C. Amendola
, et al. (550 additional authors not shown)
Abstract:
A novel method to reconstruct the energy of hadronic showers in the CMS High Granularity Calorimeter (HGCAL) is presented. The HGCAL is a sampling calorimeter with very fine transverse and longitudinal granularity. The active media are silicon sensors and scintillator tiles read out by SiPMs, and the absorbers are a combination of lead and Cu/CuW in the electromagnetic section, and steel in the hadronic section. The shower reconstruction method is based on graph neural networks and makes use of a dynamic reduction network architecture. It is shown that the algorithm is able to capture and mitigate the main effects that normally hinder the reconstruction of hadronic showers using classical reconstruction methods, by compensating for fluctuations in the multiplicity, energy, and spatial distributions of the shower's constituents. The performance of the algorithm is evaluated using test beam data collected in 2018 with a prototype of the CMS HGCAL accompanied by a section of the CALICE AHCAL prototype. The capability of the method to mitigate the impact of energy leakage from the calorimeter is also demonstrated.
Submitted 30 June, 2024; v1 submitted 17 June, 2024;
originally announced June 2024.
-
The Reasonable Effectiveness of Speaker Embeddings for Violence Detection
Authors:
Sarthak Jain,
Orchid Chetia Phukan,
Arun Balaji Buduru,
Rajesh Sharma
Abstract:
In this paper, we focus on audio violence detection (AVD). AVD is necessary for several reasons, especially in the context of maintaining safety, preventing harm, and ensuring security in various environments. This calls for accurate AVD systems. As in many related applications in audio processing, the most common approach for improving performance would be to leverage self-supervised (SSL) pre-trained models (PTMs). However, these SSL models are very large, with millions of parameters, which can hinder real-world deployment, especially in compute-constrained environments. To resolve this, we propose the use of speaker recognition models, which are much smaller than the SSL models. Experimenting with speaker recognition model embeddings and with SVM and Random Forest as classifiers, we show that speaker recognition model embeddings perform the best in comparison to state-of-the-art (SOTA) SSL models and achieve SOTA results.
Submitted 10 June, 2024;
originally announced June 2024.
-
PERSONA: An Application for Emotion Recognition, Gender Recognition and Age Estimation
Authors:
Devyani Koshal,
Orchid Chetia Phukan,
Sarthak Jain,
Arun Balaji Buduru,
Rajesh Sharma
Abstract:
Emotion Recognition (ER), Gender Recognition (GR), and Age Estimation (AE) constitute paralinguistic tasks that rely not on the spoken content but primarily on speech characteristics such as pitch and tone. While previous research has made significant strides in developing models for each task individually, there has been comparatively less emphasis on concurrently learning these tasks, despite their inherent interconnectedness. As such, in this demonstration, we present PERSONA, an application for predicting ER, GR, and AE with a single model in the backend. Notably, through a comparative study, we show that representations from a speaker recognition pre-trained model (PTM) are better suited for such a multi-task learning format than those from the state-of-the-art (SOTA) self-supervised (SSL) PTM. Our methodology obviates the need for deploying separate models for each task and can potentially conserve resources and time during the training and deployment phases.
Submitted 10 June, 2024;
originally announced June 2024.
-
ComFeAT: Combination of Neural and Spectral Features for Improved Depression Detection
Authors:
Orchid Chetia Phukan,
Sarthak Jain,
Shubham Singh,
Muskaan Singh,
Arun Balaji Buduru,
Rajesh Sharma
Abstract:
In this work, we focus on the detection of depression through speech analysis. Previous research has widely explored features extracted from pre-trained models (PTMs) primarily trained for paralinguistic tasks. Although these features have led to sufficient advances in speech-based depression detection, their performance declines in real-world settings. To address this, in this paper, we introduce ComFeAT, an application that employs a CNN model trained on a combination of features extracted from PTMs (a.k.a. neural features) and spectral features to enhance depression detection. Spectral features are robust to domain variations, but they do not perform as well as neural features. Surprisingly, combining them shows complementary behavior and improves over both neural and spectral features individually. The proposed method also improves over previous state-of-the-art (SOTA) works on the E-DAIC benchmark.
Submitted 10 June, 2024;
originally announced June 2024.
-
Search for neutrino emission from hard X-ray AGN with IceCube
Authors:
R. Abbasi,
M. Ackermann,
J. Adams,
S. K. Agarwalla,
J. A. Aguilar,
M. Ahlers,
J. M. Alameddine,
N. M. Amin,
K. Andeen,
C. Argüelles,
Y. Ashida,
S. Athanasiadou,
L. Ausborm,
S. N. Axani,
X. Bai,
A. Balagopal V.,
M. Baricevic,
S. W. Barwick,
S. Bash,
V. Basu,
R. Bay,
J. J. Beatty,
J. Becker Tjus,
J. Beise,
C. Bellenghi
, et al. (401 additional authors not shown)
Abstract:
Active Galactic Nuclei (AGN) are promising candidate sources of high-energy astrophysical neutrinos since they provide environments rich in matter and photon targets where cosmic ray interactions may lead to the production of gamma rays and neutrinos. We searched for high-energy neutrino emission from AGN using the $\textit{Swift}$-BAT Spectroscopic Survey (BASS) catalog of hard X-ray sources and 12 years of IceCube muon track data. First, upon performing a stacked search, no significant emission was found. Second, we searched for neutrinos from a list of 43 candidate sources and found an excess from the direction of two sources, Seyfert galaxies NGC 1068 and NGC 4151. We observed NGC 1068 at flux $φ_{ν_μ+\barν_μ}$ = $4.02_{-1.52}^{+1.58} \times 10^{-11}$ TeV$^{-1}$ cm$^{-2}$ s$^{-1}$ normalized at 1 TeV, with power-law spectral index, $γ$ = 3.10$^{+0.26}_{-0.22}$, consistent with previous IceCube results. The observation of a neutrino excess from the direction of NGC 4151 is at a post-trial significance of 2.9$σ$. If interpreted as an astrophysical signal, the excess observed from NGC 4151 corresponds to a flux $φ_{ν_μ+\barν_μ}$ = $1.51_{-0.81}^{+0.99} \times 10^{-11}$ TeV$^{-1}$ cm$^{-2}$ s$^{-1}$ normalized at 1 TeV and $γ$ = 2.83$^{+0.35}_{-0.28}$.
Submitted 12 June, 2024; v1 submitted 10 June, 2024;
originally announced June 2024.
-
Reasoning in Token Economies: Budget-Aware Evaluation of LLM Reasoning Strategies
Authors:
Junlin Wang,
Siddhartha Jain,
Dejiao Zhang,
Baishakhi Ray,
Varun Kumar,
Ben Athiwaratkun
Abstract:
A diverse array of reasoning strategies has been proposed to elicit the capabilities of large language models. However, in this paper, we point out that traditional evaluations which focus solely on performance metrics miss a key factor: the increased effectiveness due to additional compute. By overlooking this aspect, a skewed view of strategy efficiency is often presented. This paper introduces a framework that incorporates the compute budget into the evaluation, providing a more informative comparison that takes into account both performance metrics and computational cost. In this budget-aware perspective, we find that complex reasoning strategies often don't surpass simpler baselines purely due to algorithmic ingenuity, but rather due to the larger computational resources allocated. When we provide a simple baseline like chain-of-thought self-consistency with comparable compute resources, it frequently outperforms reasoning strategies proposed in the literature. In this scale-aware perspective, we find that unlike self-consistency, certain strategies such as multi-agent debate or Reflexion can become worse if more compute budget is utilized.
Submitted 14 June, 2024; v1 submitted 10 June, 2024;
originally announced June 2024.
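A minimal sketch of how a budget-matched self-consistency baseline, as mentioned in the abstract above, might be set up. It is not taken from the paper: the `sample` callable, the `token_budget` parameter, and the rule of discarding a sample that would overshoot the budget are all assumptions made for illustration.

```python
from collections import Counter
from typing import Callable, Optional, Tuple


def self_consistency(
    sample: Callable[[], Tuple[str, int]],
    token_budget: int,
) -> Tuple[Optional[str], int]:
    """Majority vote over sampled answers until the token budget is spent.

    `sample` is a hypothetical stand-in for one chain-of-thought model call;
    it returns (final_answer, tokens_used).
    """
    votes: Counter = Counter()
    spent = 0
    while True:
        answer, tokens = sample()
        if tokens <= 0:
            raise ValueError("each sample must consume a positive token count")
        if spent + tokens > token_budget:
            break  # assumption: a sample that overshoots the budget is discarded
        votes[answer] += 1
        spent += tokens
    # Return the most frequent answer, or None if no sample fit the budget.
    winner = votes.most_common(1)[0][0] if votes else None
    return winner, spent
```

Running two strategies through a wrapper like this with the same `token_budget` is one way to compare them at equal compute rather than at equal number of calls.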
-
Autonomous Robotic Assembly: From Part Singulation to Precise Assembly
Authors:
Kei Ota,
Devesh K. Jha,
Siddarth Jain,
Bill Yerazunis,
Radu Corcodel,
Yash Shukla,
Antonia Bronars,
Diego Romeres
Abstract:
Imagine a robot that can assemble a functional product from the individual parts presented in any configuration to the robot. Designing such a robotic system is a complex problem which presents several open challenges. To bypass these challenges, the current generation of assembly systems is built with a lot of system integration effort to provide the structure and precision necessary for assembly. These systems are mostly responsible for part singulation, part kitting, and part detection, which is accomplished by intelligent system design. In this paper, we present autonomous assembly of a gear box with minimum requirements on structure. The assembly parts are randomly placed in a two-dimensional work environment for the robot. The proposed system makes use of several different manipulation skills such as sliding for grasping, in-hand manipulation, and insertion to assemble the gear box. All these tasks are run in a closed-loop fashion using vision, tactile, and Force-Torque (F/T) sensors. We perform extensive hardware experiments to show the robustness of the proposed methods as well as the overall system. See supplementary video at https://www.youtube.com/watch?v=cZ9M1DQ23OI.
Submitted 11 June, 2024; v1 submitted 7 June, 2024;
originally announced June 2024.
-
Exploration of mass splitting and muon/tau mixing parameters for an eV-scale sterile neutrino with IceCube
Authors:
R. Abbasi,
M. Ackermann,
J. Adams,
S. K. Agarwalla,
J. A. Aguilar,
M. Ahlers,
J. M. Alameddine,
N. M. Amin,
K. Andeen,
C. Argüelles,
Y. Ashida,
S. Athanasiadou,
L. Ausborm,
S. N. Axani,
X. Bai,
A. Balagopal V.,
M. Baricevic,
S. W. Barwick,
S. Bash,
V. Basu,
R. Bay,
J. J. Beatty,
J. Becker Tjus,
J. Beise,
C. Bellenghi
, et al. (400 additional authors not shown)
Abstract:
We present the first three-parameter fit to a 3+1 sterile neutrino model using 7.634 years of data from the IceCube Neutrino Observatory on $ν_μ+\overlineν_μ$ charged-current interactions in the energy range 500-9976 GeV. Our analysis is sensitive to the mass-squared splitting between the heaviest and lightest mass states ($Δm_{41}^2$), the mixing matrix element connecting muon flavor to the fourth mass state ($|U_{\mu4}|^2$), and the element connecting tau flavor to the fourth mass state ($|U_{\tau4}|^2$). Predicted propagation effects in matter enhance the signature through a resonance as atmospheric neutrinos from the Northern Hemisphere traverse the Earth to the IceCube detector at the South Pole. The result is consistent with the no-sterile neutrino hypothesis with a probability of 4.3%. Profiling the likelihood of each parameter yields the 90% confidence levels: $2.4\,\mathrm{eV}^{2} < Δm_{41}^2 < 9.6\,\mathrm{eV}^{2}$, $0.0081 < |U_{\mu4}|^2 < 0.10$, and $|U_{\tau4}|^2 < 0.035$, which narrows the allowed parameter space for $|U_{\tau4}|^2$. However, the primary result of this analysis is the first map of the 3+1 parameter space exploring the interdependence of $Δm_{41}^2$, $|U_{\mu4}|^2$, and $|U_{\tau4}|^2$.
Submitted 2 June, 2024;
originally announced June 2024.
-
Latent Intrinsics Emerge from Training to Relight
Authors:
Xiao Zhang,
William Gao,
Seemandhar Jain,
Michael Maire,
David. A. Forsyth,
Anand Bhattad
Abstract:
Image relighting is the task of showing what a scene from a source image would look like if illuminated differently. Inverse graphics schemes recover an explicit representation of geometry and a set of chosen intrinsics, then relight with some form of renderer. However, error control for inverse graphics is difficult, and inverse graphics methods can represent only the effects of the chosen intrinsics. This paper describes a relighting method that is entirely data-driven, where intrinsics and lighting are each represented as latent variables. Our approach produces SOTA relightings of real scenes, as measured by standard metrics. We show that albedo can be recovered from our latent intrinsics without using any example albedos, and that the albedos recovered are competitive with SOTA methods.
Submitted 31 May, 2024;
originally announced May 2024.
-
Improved Convex Decomposition with Ensembling and Boolean Primitives
Authors:
Vaibhav Vavilala,
Florian Kluger,
Seemandhar Jain,
Bodo Rosenhahn,
David Forsyth
Abstract:
Describing a scene in terms of primitives -- geometrically simple shapes that offer a parsimonious but accurate abstraction of structure -- is an established vision problem. This is a good model of a difficult fitting problem: different scenes require different numbers of primitives and primitives interact strongly, but any proposed solution can be evaluated at inference time. The state-of-the-art method involves a learned regression procedure to predict a start point consisting of a fixed number of primitives, followed by a descent method to refine the geometry and remove redundant primitives. Methods are evaluated by accuracy in depth and normal prediction and in scene segmentation. This paper shows that very significant improvements in accuracy can be obtained by (a) incorporating a small number of negative primitives and (b) ensembling over a number of different regression procedures. Ensembling is by refining each predicted start point, then choosing the best by fitting loss. Extensive experiments on a standard dataset confirm that negative primitives are useful in a large fraction of images, and that our refine-then-choose strategy outperforms choose-then-refine, confirming that the fitting problem is very difficult.
Submitted 9 June, 2024; v1 submitted 29 May, 2024;
originally announced May 2024.
-
Towards Fairness in Provably Communication-Efficient Federated Recommender Systems
Authors:
Kirandeep Kaur,
Sujit Gujar,
Shweta Jain
Abstract:
To reduce the communication overhead caused by parallel training of multiple clients, various federated learning (FL) techniques use random client sampling. Nonetheless, ensuring the efficacy of random sampling and determining the optimal number of clients to sample in federated recommender systems (FRSs) remains challenging due to the isolated nature of each user as a separate client. This challenge is exacerbated in models where public and private features can be separated, and FL allows communication of only public features (item gradients). In this study, we establish sample complexity bounds that dictate the ideal number of clients required for improved communication efficiency and retained accuracy in such models. In line with our theoretical findings, we empirically demonstrate that RS-FairFRS reduces communication cost (~47%). Second, we demonstrate the presence of class imbalance among clients that raises a substantial equity concern for FRSs. Unlike centralized machine learning, clients in FRSs cannot share raw data, including sensitive attributes. For this, we introduce RS-FairFRS, the first fairness-under-unawareness FRS built upon random-sampling-based FRS. While random sampling improves communication efficiency, we propose a novel two-phase dual-fair update technique to achieve fairness without revealing protected attributes of active clients participating in training. Our results on real-world datasets and different sensitive features illustrate a significant reduction in demographic bias (~40%), offering a promising path to achieving fairness and communication efficiency in FRSs without compromising the overall accuracy of FRSs.
Submitted 2 May, 2024;
originally announced May 2024.
-
As an AI Language Model, "Yes I Would Recommend Calling the Police": Norm Inconsistency in LLM Decision-Making
Authors:
Shomik Jain,
D Calacci,
Ashia Wilson
Abstract:
We investigate the phenomenon of norm inconsistency: where LLMs apply different norms in similar situations. Specifically, we focus on the high-risk application of deciding whether to call the police in Amazon Ring home surveillance videos. We evaluate the decisions of three state-of-the-art LLMs -- GPT-4, Gemini 1.0, and Claude 3 Sonnet -- in relation to the activities portrayed in the videos, the subjects' skin-tone and gender, and the characteristics of the neighborhoods where the videos were recorded. Our analysis reveals significant norm inconsistencies: (1) a discordance between the recommendation to call the police and the actual presence of criminal activity, and (2) biases influenced by the racial demographics of the neighborhoods. These results highlight the arbitrariness of model decisions in the surveillance context and the limitations of current bias detection and mitigation strategies in normative decision-making.
Submitted 23 May, 2024;
originally announced May 2024.
-
Push and Pull: A Framework for Measuring Attentional Agency
Authors:
Zachary Wojtowicz,
Shrey Jain,
Nicholas Vincent
Abstract:
We propose a framework for measuring attentional agency - the ability to allocate one's attention according to personal desires, goals, and intentions - on digital platforms. Platforms extend people's limited powers of attention by extrapolating their preferences to large collections of previously unconsidered informational objects. However, platforms typically also allow people to influence one another's attention. We introduce a formal framework for measuring how much a given platform empowers people to both pull information into their own attentional field and push information into the attentional fields of others. We also use these definitions to shed light on the implications of generative foundation models, which enable users to bypass the implicit "attentional bargain" that underlies embedded advertising and other methods for capturing economic value from informational goods. We conclude with a set of policy strategies that can be used to understand and reshape the distribution of attentional agency online.
Submitted 23 May, 2024;
originally announced May 2024.
-
SambaNova SN40L: Scaling the AI Memory Wall with Dataflow and Composition of Experts
Authors:
Raghu Prabhakar,
Ram Sivaramakrishnan,
Darshan Gandhi,
Yun Du,
Mingran Wang,
Xiangyu Song,
Kejie Zhang,
Tianren Gao,
Angela Wang,
Karen Li,
Yongning Sheng,
Joshua Brot,
Denis Sokolov,
Apurv Vivek,
Calvin Leung,
Arjun Sabnis,
Jiayu Bai,
Tuowen Zhao,
Mark Gottscho,
David Jackson,
Mark Luttrell,
Manish K. Shah,
Edison Chen,
Kaizhao Liang,
Swayambhoo Jain
, et al. (5 additional authors not shown)
Abstract:
Monolithic large language models (LLMs) like GPT-4 have paved the way for modern generative AI applications. Training, serving, and maintaining monolithic LLMs at scale, however, remains prohibitively expensive and challenging. The disproportionate increase in the compute-to-memory ratio of modern AI accelerators has created a memory wall, necessitating new methods to deploy AI. Composition of Experts (CoE) is an alternative modular approach that lowers the cost and complexity of training and serving. However, this approach presents two key challenges when using conventional hardware: (1) without fused operations, smaller models have lower operational intensity, which makes high utilization more challenging to achieve; and (2) hosting a large number of models can be either prohibitively expensive or slow when dynamically switching between them.
In this paper, we describe how combining CoE, streaming dataflow, and a three-tier memory system scales the AI memory wall. We describe Samba-CoE, a CoE system with 150 experts and a trillion total parameters. We deploy Samba-CoE on the SambaNova SN40L Reconfigurable Dataflow Unit (RDU) - a commercial dataflow accelerator architecture that has been co-designed for enterprise inference and training applications. The chip introduces a new three-tier memory system with on-chip distributed SRAM, on-package HBM, and off-package DDR DRAM. A dedicated inter-RDU network enables scaling up and out over multiple sockets. We demonstrate speedups ranging from 2x to 13x on various benchmarks running on eight RDU sockets compared with an unfused baseline. We show that for CoE inference deployments, the 8-socket RDU Node reduces machine footprint by up to 19x, speeds up model switching time by 15x to 31x, and achieves an overall speedup of 3.7x over a DGX H100 and 6.6x over a DGX A100.
Submitted 13 May, 2024;
originally announced May 2024.
-
NurtureNet: A Multi-task Video-based Approach for Newborn Anthropometry
Authors:
Yash Khandelwal,
Mayur Arvind,
Sriram Kumar,
Ashish Gupta,
Sachin Kumar Danisetty,
Piyush Bagad,
Anish Madan,
Mayank Lunayach,
Aditya Annavajjala,
Abhishek Maiti,
Sansiddh Jain,
Aman Dalmia,
Namrata Deka,
Jerome White,
Jigar Doshi,
Angjoo Kanazawa,
Rahul Panicker,
Alpan Raval,
Srinivas Rana,
Makarand Tapaswi
Abstract:
Malnutrition among newborns is a top public health concern in developing countries. Identification and subsequent growth monitoring are key to successful interventions. However, this is challenging in rural communities where health systems tend to be inaccessible and under-equipped, with poor adherence to protocol. Our goal is to equip health workers and public health systems with a solution for contactless newborn anthropometry in the community.
We propose NurtureNet, a multi-task model that fuses visual information (a video taken with a low-cost smartphone) with tabular inputs to regress multiple anthropometry estimates including weight, length, head circumference, and chest circumference. We show that visual proxy tasks of segmentation and keypoint prediction further improve performance. We establish the efficacy of the model through several experiments and achieve a relative error of 3.9% and mean absolute error of 114.3 g for weight estimation. Model compression to 15 MB also allows offline deployment to low-cost smartphones.
Submitted 8 May, 2024;
originally announced May 2024.
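For readers unfamiliar with the two error measures quoted in the abstract above, the sketch below shows one common way to compute them. The abstract does not define "relative error"; the mean of per-sample absolute errors divided by the true value is an assumption here, and the function names and sample weights are hypothetical.

```python
from typing import Sequence


def mean_absolute_error(pred: Sequence[float], true: Sequence[float]) -> float:
    """Average of |prediction - ground truth|, in the units of the inputs."""
    if len(pred) != len(true) or not true:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(p - t) for p, t in zip(pred, true)) / len(true)


def mean_relative_error(pred: Sequence[float], true: Sequence[float]) -> float:
    """Average of |prediction - ground truth| / ground truth (assumed definition)."""
    if len(pred) != len(true) or not true:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(p - t) / t for p, t in zip(pred, true)) / len(true)


# Hypothetical weights in grams, for illustration only.
predicted = [3000.0, 2900.0]
measured = [3100.0, 2800.0]
mae_grams = mean_absolute_error(predicted, measured)
rel_error = mean_relative_error(predicted, measured)
```

Reporting both is useful because the absolute error is in grams while the relative error scales with each newborn's true weight.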
-
Collecting Consistently High Quality Object Tracks with Minimal Human Involvement by Using Self-Supervised Learning to Detect Tracker Errors
Authors:
Samreen Anjum,
Suyog Jain,
Danna Gurari
Abstract:
We propose a hybrid framework for consistently producing high-quality object tracks by combining an automated object tracker with little human input. The key idea is to tailor a module for each dataset to intelligently decide when an object tracker is failing and so humans should be brought in to re-localize an object for continued tracking. Our approach leverages self-supervised learning on unlabeled videos to learn a tailored representation for a target object that is then used to actively monitor its tracked region and decide when the tracker fails. Since labeled data is not needed, our approach can be applied to novel object categories. Experiments on three datasets demonstrate our method outperforms existing approaches, especially for small, fast moving, or occluded objects.
Submitted 6 May, 2024;
originally announced May 2024.
-
A District level Flood Severity Index for India
Authors:
Manabendra Saharia,
Sharad K Jain,
Ved Prakash,
Harshul Malik,
O P Sreejith
Abstract:
India is one of the worst affected countries in the world in terms of fatalities and economic damage due to natural disasters, particularly floods. For planning flood mitigation and relief measures, granular historical information on a pan-India basis is required, which has been missing. Through recent efforts, a few national-scale datasets have been created, but they lack the requisite information on fatalities and damages, which has limited the ability to develop a flood severity index. This paper describes the development of the India Flood Inventory with Impacts (IFI-Impacts) database, which contains death and damage statistics, and combines population and historically flooded area information sourced from a national hydrologic-hydrodynamic modeling system. We also propose a novel District Flood Severity Index (DFSI), which accounts for the historical severity of floods in India based on the number of people they have affected and the spread and duration of such floods. Since districts are the administrative units of the government, this novel index fills a major gap in currently available flood management tools. The dataset and the index are expected to significantly advance flood preparedness in the country. DFSI can be improved further by collecting and incorporating additional variables, e.g., economic losses, and by improving the reliability and robustness of the data for other variables. Based on DFSI, actions need to be taken to mitigate flood damage, beginning with the districts with high DFSI values.
Submitted 1 May, 2024;
originally announced May 2024.
-
NeMo-Aligner: Scalable Toolkit for Efficient Model Alignment
Authors:
Gerald Shen,
Zhilin Wang,
Olivier Delalleau,
Jiaqi Zeng,
Yi Dong,
Daniel Egert,
Shengyang Sun,
Jimmy Zhang,
Sahil Jain,
Ali Taghibakhshi,
Markel Sanz Ausin,
Ashwath Aithal,
Oleksii Kuchaiev
Abstract:
Aligning Large Language Models (LLMs) with human values and preferences is essential for making them helpful and safe. However, building efficient tools to perform alignment can be challenging, especially for the largest and most competent LLMs, which often contain tens or hundreds of billions of parameters. We create NeMo-Aligner, a toolkit for model alignment that can efficiently scale to using hundreds of GPUs for training. NeMo-Aligner comes with highly optimized and scalable implementations for major paradigms of model alignment such as Reinforcement Learning from Human Feedback (RLHF), Direct Preference Optimization (DPO), SteerLM, and Self-Play Fine-Tuning (SPIN). Additionally, our toolkit supports running most of the alignment techniques in a Parameter Efficient Fine-Tuning (PEFT) setting. NeMo-Aligner is designed for extensibility, allowing support for other alignment techniques with minimal effort. It is open-sourced under the Apache 2.0 License and we invite community contributions at https://github.com/NVIDIA/NeMo-Aligner
Submitted 2 May, 2024;
originally announced May 2024.
-
Hidden sectors of Chern-Simons Matter theories and Exact Holography
Authors:
Sachin Jain,
Dhruva K. S,
Evgeny Skvortsov
Abstract:
Chiral higher-spin gravity is a higher-spin extension of both self-dual Yang-Mills and self-dual gravity and is a unique local higher-spin gravity in four dimensions. Its existence implies that there are two closed subsectors in Chern-Simons matter theories. We make first steps in identifying these (anti-)chiral subsectors directly on the CFT side, which should result in a holographically dual pair where both sides are nontrivial, complete, yet exactly soluble. We also discuss closely related theories: self-dual Yang-Mills (SDYM) and self-dual gravity (SDGR) in the holographic context.
Submitted 1 May, 2024;
originally announced May 2024.
-
Spectroscopic Investigation of Nebular Gas (SING): Instrument Design, Assembly and Calibration
Authors:
Bharat Chandra P,
Binukumar G. Nair,
Shubham Jankiram Ghatul,
Shubhangi Jain,
S. Sriram,
Mahesh Babu S.,
Rekhesh Mohan,
Margarita Safonova,
Jayant Murthy,
Mikhail Sachkov
Abstract:
The Spectroscopic Investigation of Nebular Gas (SING) is a near-ultraviolet (NUV) low-resolution spectrograph payload designed to operate in the NUV range, 1400 Å -- 2700 Å, from a stable space platform. The SING telescope has a primary aperture of 298 mm, feeding the light to the long-slit UV spectrograph. SING has a field of view (FOV) of 1$^{\circ}$, achieving a spatial resolution of 1.33 arcminutes and a spectral resolution of 3.7 Å ($R\sim600$) at the central wavelength. SING employs a micro-channel plate (MCP) with a CMOS readout-based photon-counting detector. The instrument is designed to observe diffuse sources such as nebulae, supernova remnants, and the interstellar medium (ISM) to understand their chemistry. SING was selected by the United Nations Office for Outer Space Affairs to be hosted on the Chinese Space Station. The instrument will undergo qualification tests as per the launch requirements. In this paper, we describe the hardware design, optomechanical assembly, and calibration of the instrument.
Submitted 26 April, 2024;
originally announced April 2024.
-
Scarce Resource Allocations That Rely On Machine Learning Should Be Randomized
Authors:
Shomik Jain,
Kathleen Creel,
Ashia Wilson
Abstract:
Contrary to traditional deterministic notions of algorithmic fairness, this paper argues that fairly allocating scarce resources using machine learning often requires randomness. We address why, when, and how to randomize by proposing stochastic procedures that more adequately account for all of the claims that individuals have to allocations of social goods or opportunities.
Submitted 19 June, 2024; v1 submitted 12 April, 2024;
originally announced April 2024.
-
A new approach to construct minimal linear codes over $\mathbb{F}_{3}$
Authors:
Wajid M. Shaikh,
Rupali S. Jain,
B. Surendranath Reddy,
Bhagyashri S. Patil,
Sahar M. A. Maqbol
Abstract:
In this article, we present two new approaches to construct minimal linear codes of dimension $n+1$ over $\mathbb{F}_{3}$ using characteristic and ternary functions. We also obtain the weight distributions of these constructed minimal linear codes. We further show that a specific class of these codes violates the Ashikhmin-Barg condition.
Submitted 10 April, 2024;
originally announced April 2024.
-
A Lightweight Measure of Classification Difficulty from Application Dataset Characteristics
Authors:
Bryan Bo Cao,
Abhinav Sharma,
Lawrence O'Gorman,
Michael Coss,
Shubham Jain
Abstract:
Despite accuracy and computation benchmarks being widely available to help choose among neural network models, these are usually trained on datasets with many classes, and do not give a precise idea of performance for applications of few (< 10) classes. The conventional procedure to predict performance is to train and test repeatedly on the different models and dataset variations of interest. However, this is computationally expensive. We propose an efficient classification difficulty measure that is calculated from the number of classes and intra- and inter-class similarity metrics of the dataset. After a single stage of training and testing per model family, relative performance for different datasets and models of the same family can be predicted by comparing difficulty measures - without further training and testing. We show how this measure can help a practitioner select a computationally efficient model for a small dataset 6 to 29x faster than through repeated training and testing. We give an example of use of the measure for an industrial application in which options are identified to select a model 42% smaller than the baseline YOLOv5-nano model, and if class merging from 3 to 2 classes meets requirements, 85% smaller.
Submitted 8 April, 2024;
originally announced April 2024.
-
Memory Sharing with CXL: Hardware and Software Design Approaches
Authors:
Sunita Jain,
Nagaradhesh Yeleswarapu,
Hasan Al Maruf,
Rita Gupta
Abstract:
Compute Express Link (CXL) is a rapidly emerging coherent interconnect standard that provides opportunities for memory pooling and sharing. Memory sharing is a well-established software feature that improves memory utilization by avoiding unnecessary data movement. In this paper, we discuss multiple approaches to enable memory sharing with different generations of the CXL protocol (i.e., CXL 2.0 and CXL 3.0), considering the challenges with each of the architectures from the device hardware and software viewpoints.
Submitted 4 April, 2024;
originally announced April 2024.
-
NLP at UC Santa Cruz at SemEval-2024 Task 5: Legal Answer Validation using Few-Shot Multi-Choice QA
Authors:
Anish Pahilajani,
Samyak Rajesh Jain,
Devasha Trivedi
Abstract:
This paper presents our submission to SemEval 2024 Task 5: The Legal Argument Reasoning Task in Civil Procedure. We present two approaches to solving the task of legal answer validation, given an introduction to the case, a question, and an answer candidate. First, we fine-tuned pre-trained BERT-based models and found that models trained on domain knowledge perform better. Second, we performed few-shot prompting on GPT models and found that reformulating the answer validation task as a multiple-choice QA task remarkably improves the performance of the model. Our best submission is a BERT-based model that achieved 7th place out of 20.
Submitted 3 April, 2024;
originally announced April 2024.
-
Piecewise Contractions
Authors:
Sakshi Jain,
Carlangelo Liverani
Abstract:
We study piecewise injective, but not necessarily globally injective, contracting maps on a compact subset of \(\mathbb{R}^d\). We prove that generically the attractor and the set of discontinuities are disjoint, and hence the attractor consists of periodic orbits. In addition, we prove that piecewise injective contractions are generically topologically stable.
Submitted 15 April, 2024; v1 submitted 2 April, 2024;
originally announced April 2024.
-
Video Interpolation with Diffusion Models
Authors:
Siddhant Jain,
Daniel Watson,
Eric Tabellion,
Aleksander Hołyński,
Ben Poole,
Janne Kontkanen
Abstract:
We present VIDIM, a generative model for video interpolation, which creates short videos given a start and end frame. In order to achieve high fidelity and generate motions unseen in the input data, VIDIM uses cascaded diffusion models to first generate the target video at low resolution, and then generate the high-resolution video conditioned on the low-resolution generated video. We compare VIDIM to previous state-of-the-art methods on video interpolation, and demonstrate how such works fail in most settings where the underlying motion is complex, nonlinear, or ambiguous while VIDIM can easily handle such cases. We additionally demonstrate how classifier-free guidance on the start and end frame and conditioning the super-resolution model on the original high-resolution frames without additional parameters unlocks high-fidelity results. VIDIM is fast to sample from as it jointly denoises all the frames to be generated, requires less than a billion parameters per diffusion model to produce compelling results, and still enjoys scalability and improved quality at larger parameter counts.
Submitted 1 April, 2024;
originally announced April 2024.
-
Incubating Advances in Integrated Photonics with Emerging Sensing and Computational Capabilities
Authors:
Sourabh Jain,
May Hlaing,
Kang Chieh Fan,
Jason Midkiff,
Shupeng Ning,
Chenghao Feng,
Po Yu Hsiao,
Patrick Camp,
Ray Chen
Abstract:
As photonic technologies continue to grow in multidimensional aspects, integrated photonics holds a unique position and continuously presents enormous possibilities to research communities. Applications span data centers, environmental monitoring, medical diagnosis, and highly compact communication components, with further possibilities growing endlessly. Here, we provide a review of state-of-the-art integrated photonic sensors operating in the near- and mid-infrared wavelength regions on various material platforms. Among different materials, architectures, and technologies leading the way for on-chip sensors, we discuss optical sensing principles commonly applied to biochemical and gas sensing. Our focus is particularly on passive and active optical waveguides, including dispersion-engineered metamaterial-based structures, an essential approach for enhancing the interaction between light and analytes in chip-scale sensors. We harness a diverse array of cutting-edge sensing technologies, heralding a revolutionary on-chip sensing paradigm. Our arsenal includes refractive-index-based sensing, plasmonic sensing, and spectroscopy, forging an unparalleled foundation for innovation and precision. Furthermore, we include a brief discussion of recent trends and computational concepts incorporating Artificial Intelligence & Machine Learning (AI/ML) and deep learning approaches over the past few years to improve the qualitative and quantitative analysis of sensor measurements.
Submitted 28 March, 2024;
originally announced March 2024.
-
Fisher Information Approach for Masking the Sensing Plan: Applications in Multifunction Radars
Authors:
Shashwat Jain,
Vikram Krishnamurthy,
Muralidhar Rangaswamy,
Bosung Kang,
Sandeep Gogineni
Abstract:
How to design a Markov Decision Process (MDP) based radar controller that makes small sacrifices in performance to mask its sensing plan from an adversary? The radar controller purposefully minimizes the Fisher information of its emissions so that an adversary cannot identify the controller's model parameters accurately. Unlike classical open loop statistical inference, where the Fisher information serves as a lower bound for the achievable covariance, this paper employs the Fisher information as a design constraint for a closed loop radar controller to mask its sensing plan. We analytically derive a closed-form expression for the determinant of the Fisher Information Matrix (FIM) pertaining to the parameters of the MDP-based controller. Subsequently, we constrain the MDP with respect to the determinant of the FIM. Numerical results show that the introduction of minor perturbations to the MDP's transition kernel and the total operation cost can reduce the Fisher Information of the emissions. Consequently, this reduction amplifies the variability in policy and transition kernel estimation errors, thwarting the adversary's accuracy in estimating the controller's sensing plan.
Submitted 23 March, 2024;
originally announced March 2024.
-
RakutenAI-7B: Extending Large Language Models for Japanese
Authors:
Rakuten Group,
Aaron Levine,
Connie Huang,
Chenguang Wang,
Eduardo Batista,
Ewa Szymanska,
Hongyi Ding,
Hou Wei Chou,
Jean-François Pessiot,
Johanes Effendi,
Justin Chiu,
Kai Torben Ohlhus,
Karan Chopra,
Keiji Shinzato,
Koji Murakami,
Lee Xiong,
Lei Chen,
Maki Kubota,
Maksim Tkachenko,
Miroku Lee,
Naoki Takahashi,
Prathyusha Jwalapuram,
Ryutaro Tatsushima,
Saurabh Jain,
Sunil Kumar Yadav, et al. (5 additional authors not shown)
Abstract:
We introduce RakutenAI-7B, a suite of Japanese-oriented large language models that achieve the best performance on the Japanese LM Harness benchmarks among the open 7B models. Along with the foundation model, we release instruction- and chat-tuned models, RakutenAI-7B-instruct and RakutenAI-7B-chat respectively, under the Apache 2.0 license.
Submitted 21 March, 2024;
originally announced March 2024.
-
Photonic-Electronic Integrated Circuits for High-Performance Computing and AI Accelerators
Authors:
Shupeng Ning,
Hanqing Zhu,
Chenghao Feng,
Jiaqi Gu,
Zhixing Jiang,
Zhoufeng Ying,
Jason Midkiff,
Sourabh Jain,
May H. Hlaing,
David Z. Pan,
Ray T. Chen
Abstract:
In recent decades, the demand for computational power has surged, particularly with the rapid expansion of artificial intelligence (AI). As we navigate the post-Moore's law era, the limitations of traditional electrical digital computing, including process bottlenecks and power consumption issues, are propelling the search for alternative computing paradigms. Among various emerging technologies, integrated photonics stands out as a promising solution for next-generation high-performance computing, thanks to the inherent advantages of light, such as low latency, high bandwidth, and unique multiplexing techniques. Furthermore, the progress in photonic integrated circuits (PICs), which are equipped with abundant photoelectronic components, positions photonic-electronic integrated circuits as a viable solution for high-performance computing and hardware AI accelerators. In this review, we survey recent advancements in both PIC-based digital and analog computing for AI, exploring the principal benefits and obstacles of implementation. Additionally, we propose a comprehensive analysis of photonic AI from the perspectives of hardware implementation, accelerator architecture, and software-hardware co-design. In the end, acknowledging the existing challenges, we underscore potential strategies for overcoming these issues and offer insights into the future drivers for optical computing.
Submitted 11 July, 2024; v1 submitted 21 March, 2024;
originally announced March 2024.
-
HyperGALE: ASD Classification via Hypergraph Gated Attention with Learnable Hyperedges
Authors:
Mehul Arora,
Chirag Shantilal Jain,
Lalith Bharadwaj Baru,
Kamalaker Dadi,
Bapi Raju Surampudi
Abstract:
Autism Spectrum Disorder (ASD) is a neurodevelopmental condition characterized by varied social cognitive challenges and repetitive behavioral patterns. Identifying reliable brain imaging-based biomarkers for ASD has been a persistent challenge due to the spectrum's diverse symptomatology. Existing baselines in the field have made significant strides in this direction, yet there remains room for improvement in both performance and interpretability. We propose \emph{HyperGALE}, which builds upon the hypergraph by incorporating learned hyperedges and gated attention mechanisms. This approach has led to substantial improvements in the model's ability to interpret complex brain graph data, offering deeper insights into ASD biomarker characterization. Evaluated on the extensive ABIDE II dataset, \emph{HyperGALE} not only improves interpretability but also demonstrates statistically significant enhancements in key performance metrics compared to both previous baselines and the foundational hypergraph model. The advancement \emph{HyperGALE} brings to ASD research highlights the potential of sophisticated graph-based techniques in neurodevelopmental studies. The source code and implementation instructions are available at GitHub: https://github.com/mehular0ra/HyperGALE.
Submitted 21 March, 2024;
originally announced March 2024.
-
Construction of Minimal Binary Linear Codes of dimension $n+3$
Authors:
Wajid M. Shaikh,
Rupali S. Jain,
B. Surendranath Reddy,
Bhagyashri S. Patil
Abstract:
In this paper, we give a generic construction of binary linear codes of dimension $n+3$ and derive necessary and sufficient conditions for the constructed code to be minimal. Using this generic construction, we construct a new family of minimal binary linear codes from a special class of Boolean functions violating the Ashikhmin-Barg condition. We also obtain the weight distribution of the constructed minimal binary linear codes.
Submitted 20 March, 2024;
originally announced March 2024.
-
Sparsity-Constrained Community-Based Group Testing
Authors:
Sarthak Jain,
Martina Cardone,
Soheil Mohajer
Abstract:
In this work, we consider the sparsity-constrained community-based group testing problem, where the population follows a community structure. In particular, the community consists of $F$ families, each with $M$ members. A number $k_f$ out of the $F$ families are infected, and a family is said to be infected if $k_m$ out of its $M$ members are infected. Furthermore, the sparsity constraint allows at most $ρ_T$ individuals to be grouped in each test. For this sparsity-constrained community model, we propose a probabilistic group testing algorithm that can identify the infected population with a vanishing probability of error and we provide an upper-bound on the number of tests. When $k_m = Θ(M)$ and $M \gg \log(FM)$, our bound outperforms the existing sparsity-constrained group testing results trivially applied to the community model. If the sparsity constraint is relaxed, our achievable bound reduces to existing bounds for community-based group testing. Moreover, our scheme can also be applied to the classical dilution model, where it outperforms existing noise-level-independent schemes in the literature.
Submitted 19 March, 2024;
originally announced March 2024.
-
Inflationary non-Gaussianities in alpha vacua and consistency with conformal symmetries
Authors:
Arhum Ansari,
Pinak Banerjee,
Prateksh Dhivakar,
Sachin Jain,
Nilay Kundu
Abstract:
We study the conformal invariance of inflationary non-Gaussianities associated with scalar fluctuations in a non-Bunch-Davies initial state, known as the $α$-vacuum, in single-field slow-roll inflation. The $α$-vacuum is a one-parameter family of states, including the Bunch-Davies one, that preserves the conformal symmetry of inflationary dynamics in a nearly de Sitter space-time. Working within the leading slow-roll approximation, we compute the four-point scalar correlator (the trispectrum) in the $α$-vacuum using the in-in formalism. We check that the conformal Ward identities are met between the three- and four-point scalar $α$-vacua correlators. Surprisingly, this contrasts with the previously reported negative result of the Ward identities being violated between the two- and three-point correlators. We have also extended the wave-functional method, previously used for correlators with Bunch-Davies initial condition, to compute the three- and four-point scalar correlators in $α$-vacua. The results obtained from the wave-function method match the corresponding in-in results, adding further justification to our check of Ward identities with $α$-vacua correlators.
Submitted 15 March, 2024;
originally announced March 2024.
-
The algebraic structure of hyperbolic graph braid groups
Authors:
B. Appiah,
P. Dani,
W. Ge,
C. Hudson,
S. Jain,
M. Lemoine,
J. Murphy,
J. Murray,
A. Pandikkadan,
K. Schreve,
H. Vo
Abstract:
Genevois recently classified which graph braid groups on $\ge 3$ strands are word hyperbolic. In the $3$-strand case, he asked whether all such word hyperbolic groups are actually free; this reduced to checking two infinite classes of graphs: sun and pulsar graphs. We prove that $3$-strand braid groups of sun graphs are free. On the other hand, it was known to experts that $3$-strand braid groups of most pulsar graphs contain surface subgroups. We provide a simple proof of this and prove an additional structure theorem for these groups.
Submitted 21 March, 2024; v1 submitted 13 March, 2024;
originally announced March 2024.
-
Standing on FURM ground -- A framework for evaluating Fair, Useful, and Reliable AI Models in healthcare systems
Authors:
Alison Callahan,
Duncan McElfresh,
Juan M. Banda,
Gabrielle Bunney,
Danton Char,
Jonathan Chen,
Conor K. Corbin,
Debadutta Dash,
Norman L. Downing,
Sneha S. Jain,
Nikesh Kotecha,
Jonathan Masterson,
Michelle M. Mello,
Keith Morse,
Srikar Nallan,
Abby Pandya,
Anurang Revri,
Aditya Sharma,
Christopher Sharp,
Rahul Thapa,
Michael Wornow,
Alaa Youssef,
Michael A. Pfeffer,
Nigam H. Shah
Abstract:
The impact of using artificial intelligence (AI) to guide patient care or operational processes is an interplay of the AI model's output, the decision-making protocol based on that output, and the capacity of the stakeholders involved to take the necessary subsequent action. Estimating the effects of this interplay before deployment, and studying it in real time afterwards, are essential to bridge the chasm between AI model development and achievable benefit. To accomplish this, the Data Science team at Stanford Health Care has developed a Testing and Evaluation (T&E) mechanism to identify fair, useful and reliable AI models (FURM) by conducting an ethical review to identify potential value mismatches, simulations to estimate usefulness, financial projections to assess sustainability, as well as analyses to determine IT feasibility, design a deployment strategy, and recommend a prospective monitoring and evaluation plan. We report on FURM assessments done to evaluate six AI guided solutions for potential adoption, spanning clinical and operational settings, each with the potential to impact from several dozen to tens of thousands of patients each year. We describe the assessment process, summarize the six assessments, and share our framework to enable others to conduct similar assessments. Of the six solutions we assessed, two have moved into a planning and implementation phase. Our novel contributions - usefulness estimates by simulation, financial projections to quantify sustainability, and a process to do ethical assessments - as well as their underlying methods and open source tools, are available for other healthcare systems to conduct actionable evaluations of candidate AI solutions.
Submitted 14 March, 2024; v1 submitted 26 February, 2024;
originally announced March 2024.
-
IndicLLMSuite: A Blueprint for Creating Pre-training and Fine-Tuning Datasets for Indian Languages
Authors:
Mohammed Safi Ur Rahman Khan,
Priyam Mehta,
Ananth Sankar,
Umashankar Kumaravelan,
Sumanth Doddapaneni,
Suriyaprasaad G,
Varun Balan G,
Sparsh Jain,
Anoop Kunchukuttan,
Pratyush Kumar,
Raj Dabre,
Mitesh M. Khapra
Abstract:
Despite the considerable advancements in English LLMs, the progress in building comparable models for other languages has been hindered due to the scarcity of tailored resources. Our work aims to bridge this divide by introducing an expansive suite of resources specifically designed for the development of Indic LLMs, covering 22 languages, containing a total of 251B tokens and 74.8M instruction-response pairs. Recognizing the importance of both data quality and quantity, our approach combines highly curated manually verified data, unverified yet valuable data, and synthetic data. We build a clean, open-source pipeline for curating pre-training data from diverse sources, including websites, PDFs, and videos, incorporating best practices for crawling, cleaning, flagging, and deduplication. For instruction-fine tuning, we amalgamate existing Indic datasets, translate/transliterate English datasets into Indian languages, and utilize LLaMa2 and Mixtral models to create conversations grounded in articles from Indian Wikipedia and Wikihow. Additionally, we address toxicity alignment by generating toxic prompts for multiple scenarios and then generate non-toxic responses by feeding these toxic prompts to an aligned LLaMa2 model. We hope that the datasets, tools, and resources released as a part of this work will not only propel the research and development of Indic LLMs but also establish an open-source blueprint for extending such efforts to other languages. The data and other artifacts created as part of this work are released with permissive licenses.
Submitted 10 March, 2024;
originally announced March 2024.