-
Just How Flexible are Neural Networks in Practice?
Authors:
Ravid Shwartz-Ziv,
Micah Goldblum,
Arpit Bansal,
C. Bayan Bruss,
Yann LeCun,
Andrew Gordon Wilson
Abstract:
It is widely believed that a neural network can fit a training set containing at least as many samples as it has parameters, underpinning notions of overparameterized and underparameterized models. In practice, however, we only find solutions accessible via our training procedure, including the optimizer and regularizers, limiting flexibility. Moreover, the exact parameterization of the function class, built into an architecture, shapes its loss surface and impacts the minima we find. In this work, we examine the ability of neural networks to fit data in practice. Our findings indicate that: (1) standard optimizers find minima where the model can only fit training sets with significantly fewer samples than it has parameters; (2) convolutional networks are more parameter-efficient than MLPs and ViTs, even on randomly labeled data; (3) while stochastic training is thought to have a regularizing effect, SGD actually finds minima that fit more training data than full-batch gradient descent; (4) the difference in capacity to fit correctly labeled and incorrectly labeled samples can be predictive of generalization; (5) ReLU activation functions result in finding minima that fit more data despite being designed to avoid vanishing and exploding gradients in deep architectures.
Submitted 17 June, 2024;
originally announced June 2024.
-
Online learning of a panoply of quantum objects
Authors:
Akshay Bansal,
Ian George,
Soumik Ghosh,
Jamie Sikora,
Alice Zheng
Abstract:
In many quantum tasks, there is an unknown quantum object that one wishes to learn. An online strategy for this task involves adaptively refining a hypothesis to reproduce such an object or its measurement statistics. A common evaluation metric for such a strategy is its regret, or roughly the accumulated errors in hypothesis statistics. We prove a sublinear regret bound for learning over general subsets of positive semidefinite matrices via the regularized-follow-the-leader algorithm and apply it to various settings where one wishes to learn quantum objects. For concrete applications, we present a sublinear regret bound for learning quantum states, effects, channels, interactive measurements, strategies, co-strategies, and the collection of inner products of pure states. Our bound applies to many other quantum objects with compact, convex representations. In proving our regret bound, we establish various matrix analysis results useful in quantum information theory. This includes a generalization of Pinsker's inequality for arbitrary positive semidefinite operators with possibly different traces, which may be of independent interest and applicable to more general classes of divergences.
Submitted 6 June, 2024;
originally announced June 2024.
-
Multilingual Text Style Transfer: Datasets & Models for Indian Languages
Authors:
Sourabrata Mukherjee,
Atul Kr. Ojha,
Akanksha Bansal,
Deepak Alok,
John P. McCrae,
Ondřej Dušek
Abstract:
Text style transfer (TST) involves altering the linguistic style of a text while preserving its core content. This paper focuses on sentiment transfer, a vital TST subtask (Mukherjee et al., 2022a), across a spectrum of Indian languages: Hindi, Magahi, Malayalam, Marathi, Punjabi, Odia, Telugu, and Urdu, expanding upon previous work on English-Bangla sentiment transfer (Mukherjee et al., 2023). We introduce dedicated datasets of 1,000 positive and 1,000 negative style-parallel sentences for each of these eight languages. We then evaluate the performance of various benchmark models categorized into parallel, non-parallel, cross-lingual, and shared learning approaches, including the Llama2 and GPT-3.5 large language models (LLMs). Our experiments highlight the significance of parallel data in TST and demonstrate the effectiveness of the Masked Style Filling (MSF) approach (Mukherjee et al., 2023) in non-parallel techniques. Moreover, cross-lingual and joint multilingual learning methods show promise, offering insights into selecting optimal models tailored to the specific language and task requirements. To the best of our knowledge, this work represents the first comprehensive exploration of the TST task as sentiment transfer across a diverse set of languages.
Submitted 9 June, 2024; v1 submitted 31 May, 2024;
originally announced May 2024.
-
Programmer Visual Attention During Context-Aware Code Summarization
Authors:
Aakash Bansal,
Robert Wallace,
Zachary Karas,
Ningzhi Tang,
Yu Huang,
Toby Jia-Jun Li,
Collin McMillan
Abstract:
Abridged: Programmer attention represents the visual focus of programmers on parts of the source code in pursuit of programming tasks. We conducted an in-depth human study with XY Java programmers, where each programmer generated summaries for 40 methods from five large Java projects over five one-hour sessions. We used eye-tracking equipment to map the visual attention of programmers while they wrote the summaries. We also rate the quality of each summary. We found eye-gaze patterns and metrics that define common behaviors between programmer attention during context-aware code summarization. Specifically, we found that programmers need to read significantly (p<0.01) fewer words and make significantly fewer revisits to words (p<0.03) as they summarize more methods during a session, while maintaining the quality of summaries. We also found that the amount of source code a participant looks at correlates with a higher quality summary, but this trend follows a bell-shaped curve, such that after a threshold reading more source code leads to a significant decrease (p<0.01) in the quality of summaries. We also gathered insight into the type of methods in the project that provide the most contextual information for code summarization based on programmer attention. Specifically, we observed that programmers spent a majority of their time looking at methods inside the same class as the target method to be summarized. Surprisingly, we found that programmers spent significantly less time looking at methods in the call graph of the target method. We discuss how our empirical observations may aid future studies towards modeling programmer attention and improving context-aware automatic source code summarization.
Submitted 28 May, 2024;
originally announced May 2024.
-
Transformers Can Do Arithmetic with the Right Embeddings
Authors:
Sean McLeish,
Arpit Bansal,
Alex Stein,
Neel Jain,
John Kirchenbauer,
Brian R. Bartoldson,
Bhavya Kailkhura,
Abhinav Bhatele,
Jonas Geiping,
Avi Schwarzschild,
Tom Goldstein
Abstract:
The poor performance of transformers on arithmetic tasks seems to stem in large part from their inability to keep track of the exact position of each digit inside of a large span of digits. We mend this problem by adding an embedding to each digit that encodes its position relative to the start of the number. In addition to the boost these embeddings provide on their own, we show that this fix enables architectural modifications such as input injection and recurrent layers to improve performance even further.
With positions resolved, we can study the logical extrapolation ability of transformers. Can they solve arithmetic problems that are larger and more complex than those in their training data? We find that by training on only 20-digit numbers with a single GPU for one day, we can reach state-of-the-art performance, achieving up to 99% accuracy on 100-digit addition problems. Finally, we show that these gains in numeracy also unlock improvements on other multi-step reasoning tasks including sorting and multiplication.
Submitted 27 May, 2024;
originally announced May 2024.
-
A Study on Developer Behaviors for Validating and Repairing LLM-Generated Code Using Eye Tracking and IDE Actions
Authors:
Ningzhi Tang,
Meng Chen,
Zheng Ning,
Aakash Bansal,
Yu Huang,
Collin McMillan,
Toby Jia-Jun Li
Abstract:
The increasing use of large language model (LLM)-powered code generation tools, such as GitHub Copilot, is transforming software engineering practices. This paper investigates how developers validate and repair code generated by Copilot and examines the impact of code provenance awareness during these processes. We conducted a lab study with 28 participants, who were tasked with validating and repairing Copilot-generated code in three software projects. Participants were randomly divided into two groups: one informed about the provenance of LLM-generated code and the other not. We collected data on IDE interactions, eye-tracking, cognitive workload assessments, and conducted semi-structured interviews. Our results indicate that, without explicit information, developers often fail to identify the LLM origin of the code. Developers generally employ similar validation and repair strategies for LLM-generated code, but exhibit behaviors such as frequent switching between code and comments, different attentional focus, and a tendency to delete and rewrite code. Being aware of the code's provenance led to improved performance, increased search efforts, more frequent Copilot usage, and higher cognitive workload. These findings enhance our understanding of how developers interact with LLM-generated code and carry implications for designing tools that facilitate effective human-LLM collaboration in software development.
Submitted 25 May, 2024;
originally announced May 2024.
-
Generating Potent Poisons and Backdoors from Scratch with Guided Diffusion
Authors:
Hossein Souri,
Arpit Bansal,
Hamid Kazemi,
Liam Fowl,
Aniruddha Saha,
Jonas Geiping,
Andrew Gordon Wilson,
Rama Chellappa,
Tom Goldstein,
Micah Goldblum
Abstract:
Modern neural networks are often trained on massive datasets that are web scraped with minimal human inspection. As a result of this insecure curation pipeline, an adversary can poison or backdoor the resulting model by uploading malicious data to the internet and waiting for a victim to scrape and train on it. Existing approaches for creating poisons and backdoors start with randomly sampled clean data, called base samples, and then modify those samples to craft poisons. However, some base samples may be significantly more amenable to poisoning than others. As a result, we may be able to craft more potent poisons by carefully choosing the base samples. In this work, we use guided diffusion to synthesize base samples from scratch that lead to significantly more potent poisons and backdoors than previous state-of-the-art attacks. Our Guided Diffusion Poisoning (GDP) base samples can be combined with any downstream poisoning or backdoor attack to boost its effectiveness. Our implementation code is publicly available at: https://github.com/hsouri/GDP .
Submitted 24 March, 2024;
originally announced March 2024.
-
On the composable security of weak coin flipping
Authors:
Jiawei Wu,
Yanglin Hu,
Akshay Bansal,
Marco Tomamichel
Abstract:
Weak coin flipping is a cryptographic primitive in which two mutually distrustful parties generate a shared random bit to agree on a winner via remote communication. While a stand-alone secure weak coin flipping protocol can be constructed from noiseless communication channels, its composability has not been explored. In this work, we demonstrate that no weak coin flipping protocol can be abstracted into a black box resource with composable security. Despite this, we also establish the overall stand-alone security of weak coin flipping protocols under sequential composition.
Submitted 21 June, 2024; v1 submitted 23 February, 2024;
originally announced February 2024.
-
EyeTrans: Merging Human and Machine Attention for Neural Code Summarization
Authors:
Yifan Zhang,
Jiliang Li,
Zachary Karas,
Aakash Bansal,
Toby Jia-Jun Li,
Collin McMillan,
Kevin Leach,
Yu Huang
Abstract:
Neural code summarization leverages deep learning models to automatically generate brief natural language summaries of code snippets. The development of Transformer models has led to extensive use of attention during model design. While existing work has primarily and almost exclusively focused on static properties of source code and related structural representations like the Abstract Syntax Tree (AST), few studies have considered human attention, that is, where programmers focus while examining and comprehending code. In this paper, we develop a method for incorporating human attention into machine attention to enhance neural code summarization. To facilitate this incorporation and vindicate this hypothesis, we introduce EyeTrans, which consists of three steps: (1) we conduct an extensive eye-tracking human study to collect and pre-analyze data for model training, (2) we devise a data-centric approach to integrate human attention with machine attention in the Transformer architecture, and (3) we conduct comprehensive experiments on two code summarization tasks to demonstrate the effectiveness of incorporating human attention into Transformers. Integrating human attention leads to an improvement of up to 29.91% in Functional Summarization and up to 6.39% in General Code Summarization performance, demonstrating the substantial benefits of this combination. We further explore performance in terms of robustness and efficiency by creating challenging summarization scenarios in which EyeTrans exhibits interesting properties. We also visualize the attention map to depict the simplifying effect of machine attention in the Transformer by incorporating human attention. This work has the potential to propel AI research in software engineering by introducing more human-centered approaches and data.
Submitted 29 February, 2024; v1 submitted 21 February, 2024;
originally announced February 2024.
-
Text Detoxification as Style Transfer in English and Hindi
Authors:
Sourabrata Mukherjee,
Akanksha Bansal,
Atul Kr. Ojha,
John P. McCrae,
Ondřej Dušek
Abstract:
This paper focuses on text detoxification, i.e., automatically converting toxic text into non-toxic text. This task contributes to safer and more respectful online communication and can be considered a Text Style Transfer (TST) task, where the text style changes while its content is preserved. We present three approaches: knowledge transfer from a similar task; a multi-task learning approach that combines sequence-to-sequence modeling with various toxicity classification tasks; and a delete-and-reconstruct approach. To support our research, we utilize a dataset provided by Dementieva et al. (2021), which contains multiple versions of detoxified texts corresponding to toxic texts. In our experiments, we selected the best variants through expert human annotators, creating a dataset where each toxic sentence is paired with a single, appropriate detoxified version. Additionally, we introduced a small Hindi parallel dataset, aligning with a part of the English dataset, suitable for evaluation purposes. Our results demonstrate that our approach effectively balances text detoxification while preserving the actual content and maintaining fluency.
Submitted 9 June, 2024; v1 submitted 12 February, 2024;
originally announced February 2024.
-
Amazon Locker Capacity Management
Authors:
Samyukta Sethuraman,
Ankur Bansal,
Setareh Mardan,
Mauricio G. C. Resende,
Timothy L. Jacobs
Abstract:
Amazon Locker is a self-service delivery or pickup location where customers can pick up packages and drop off returns. A basic first-come-first-served policy for accepting package delivery requests to lockers results in lockers filling up with standard-shipping-speed (3-5 day shipping) packages, leaving no space for expedited packages, which are mostly Next-Day or Two-Day shipping. This paper proposes a solution to the problem of determining how much locker capacity to reserve for different ship-option packages. Yield management is a much-researched field with popular applications in the airline, car rental, and hotel industries. However, Amazon Locker poses a unique challenge in this field since the number of days a package will wait in a locker (package dwell time) is, in general, unknown. The proposed solution combines machine learning techniques to predict locker demand and package dwell time, and linear programming to maximize throughput in lockers. The decision variables from this optimization provide optimal capacity reservation values for different ship options. This resulted in a year-over-year increase of 9% in Locker throughput worldwide during the 2018 holiday season, impacting millions of customers.
Submitted 11 December, 2023;
originally announced December 2023.
-
Analysis of Linux-PRNG (Pseudo Random Number Generator)
Authors:
Ayush Bansal,
Pramod Subramanyan,
Satyadev Nandakumar
Abstract:
The Linux pseudorandom number generator (PRNG) is a PRNG with entropy inputs and is widely used in many security-related applications and protocols. This PRNG is open-source code that is subject to regular changes. It has been analysed in the works of Gutterman et al. and Lacharme et al.; in the meantime, several changes have been applied to the code to counter the attacks presented since then. Our work describes the Linux PRNG of kernel versions 5.3 and upwards. We discuss the PRNG architecture briefly and the entropy mixing function in detail.
Our goal is to study the entropy mixing function and analyse it over two properties, namely, injectivity and length of the longest chain. For this purpose, we will be using SAT solving and model counting over targeted formulas involving multiple states of the Linux entropy store.
Submitted 6 December, 2023;
originally announced December 2023.
-
Synergistic Perception and Control Simplex for Verifiable Safe Vertical Landing
Authors:
Ayoosh Bansal,
Yang Zhao,
James Zhu,
Sheng Cheng,
Yuliang Gu,
Hyung-Jin Yoon,
Hunmin Kim,
Naira Hovakimyan,
Lui Sha
Abstract:
Perception, Planning, and Control form the essential components of autonomy in advanced air mobility. This work advances the holistic integration of these components to enhance the performance and robustness of the complete cyber-physical system. We adapt Perception Simplex, a system for verifiable collision avoidance amidst obstacle detection faults, to the vertical landing maneuver for autonomous air mobility vehicles. We improve upon this system by replacing static assumptions of control capabilities with dynamic confirmation, i.e., real-time confirmation of control limitations of the system, ensuring reliable fulfillment of safety maneuvers and overrides, without dependence on overly pessimistic assumptions. Parameters defining control system capabilities and limitations, e.g., maximum deceleration, are continuously tracked within the system and used to make safety-critical decisions. We apply these techniques to propose a verifiable collision avoidance solution for autonomous aerial mobility vehicles operating in cluttered and potentially unsafe environments.
Submitted 5 December, 2023;
originally announced December 2023.
-
VR-NeRF: High-Fidelity Virtualized Walkable Spaces
Authors:
Linning Xu,
Vasu Agrawal,
William Laney,
Tony Garcia,
Aayush Bansal,
Changil Kim,
Samuel Rota Bulò,
Lorenzo Porzi,
Peter Kontschieder,
Aljaž Božič,
Dahua Lin,
Michael Zollhöfer,
Christian Richardt
Abstract:
We present an end-to-end system for the high-fidelity capture, model reconstruction, and real-time rendering of walkable spaces in virtual reality using neural radiance fields. To this end, we designed and built a custom multi-camera rig to densely capture walkable spaces in high fidelity and with multi-view high dynamic range images in unprecedented quality and density. We extend instant neural graphics primitives with a novel perceptual color space for learning accurate HDR appearance, and an efficient mip-mapping mechanism for level-of-detail rendering with anti-aliasing, while carefully optimizing the trade-off between quality and speed. Our multi-GPU renderer enables high-fidelity volume rendering of our neural radiance field model at the full VR resolution of dual 2K×2K at 36 Hz on our custom demo machine. We demonstrate the quality of our results on our challenging high-fidelity datasets, and compare our method and datasets to existing baselines. We release our dataset on our project website.
Submitted 4 November, 2023;
originally announced November 2023.
-
Revisiting File Context for Source Code Summarization
Authors:
Aakash Bansal,
Chia-Yi Su,
Collin McMillan
Abstract:
Source code summarization is the task of writing natural language descriptions of source code. A typical use case is generating short summaries of subroutines for use in API documentation. The heart of almost all current research into code summarization is the encoder-decoder neural architecture, and the encoder input is almost always a single subroutine or other short code snippet. The problem with this setup is that the information needed to describe the code is often not present in the code itself -- that information often resides in other nearby code. In this paper, we revisit the idea of "file context" for code summarization. File context is the idea of encoding select information from other subroutines in the same file. We propose a novel modification of the Transformer architecture that is purpose-built to encode file context and demonstrate its improvement over several baselines. We find that file context helps on a subset of challenging examples where traditional approaches struggle.
Submitted 5 September, 2023;
originally announced September 2023.
-
Modeling Programmer Attention as Scanpath Prediction
Authors:
Aakash Bansal,
Chia-Yi Su,
Zachary Karas,
Yifan Zhang,
Yu Huang,
Toby Jia-Jun Li,
Collin McMillan
Abstract:
This paper launches a new effort at modeling programmer attention by predicting eye movement scanpaths. Programmer attention refers to what information people intake when performing programming tasks. Models of programmer attention refer to machine prediction of what information is important to people. Models of programmer attention are important because they help researchers build better interfaces, assistive technologies, and more human-like AI. For many years, researchers in SE have built these models based on features such as mouse clicks, key logging, and IDE interactions. Yet the holy grail in this area is scanpath prediction -- the prediction of the sequence of eye fixations a person would take over a visual stimulus. A person's eye movements are considered the most concrete evidence that a person is taking in a piece of information. Scanpath prediction is a notoriously difficult problem, but we believe that the emergence of lower-cost, higher-accuracy eye tracking equipment and better large language models of source code brings a solution within grasp. We present an eye tracking experiment with 27 programmers and a prototype scanpath predictor to present preliminary results and obtain early community feedback.
Submitted 26 August, 2023;
originally announced August 2023.
-
Statement-based Memory for Neural Source Code Summarization
Authors:
Aakash Bansal,
Siyuan Jiang,
Sakib Haque,
Collin McMillan
Abstract:
Source code summarization is the task of writing natural language descriptions of source code behavior. Code summarization underpins software documentation for programmers. Short descriptions of code help programmers understand the program quickly without having to read the code itself. Lately, neural source code summarization has emerged as the frontier of research into automated code summarization techniques. By far the most popular targets for summarization are program subroutines. The idea, in a nutshell, is to train an encoder-decoder neural architecture using large sets of examples of subroutines extracted from code repositories. The encoder represents the code and the decoder represents the summary. However, most current approaches attempt to treat the subroutine as a single unit, for example by taking the entire subroutine as input to a Transformer or RNN-based encoder. But code behavior tends to depend on the flow from statement to statement. Normally dynamic analysis may shed light on this flow, but dynamic analysis on hundreds of thousands of examples in large datasets is not practical. In this paper, we present a statement-based memory encoder that learns the important elements of flow during training, leading to a statement-based subroutine representation without the need for dynamic analysis. We implement our encoder for code summarization and demonstrate a significant improvement over the state-of-the-art.
Submitted 21 July, 2023;
originally announced July 2023.
-
DocTr: Document Transformer for Structured Information Extraction in Documents
Authors:
Haofu Liao,
Aruni RoyChowdhury,
Weijian Li,
Ankan Bansal,
Yuting Zhang,
Zhuowen Tu,
Ravi Kumar Satzoda,
R. Manmatha,
Vijay Mahadevan
Abstract:
We present a new formulation for structured information extraction (SIE) from visually rich documents. It aims to address the limitations of existing IOB tagging or graph-based formulations, which are either overly reliant on the correct ordering of input text or struggle with decoding a complex graph. Instead, motivated by anchor-based object detectors in vision, we represent an entity as an anchor word and a bounding box, and represent entity linking as the association between anchor words. This is more robust to text ordering, and maintains a compact graph for entity linking. The formulation motivates us to introduce 1) a DOCument TRansformer (DocTr) that aims at detecting and associating entity bounding boxes in visually rich documents, and 2) a simple pre-training strategy that helps learn entity detection in the context of language. Evaluations on three SIE benchmarks show the effectiveness of the proposed formulation, and the overall approach outperforms existing solutions.
Submitted 15 July, 2023;
originally announced July 2023.
-
Nitsche method for Navier-Stokes equations with slip boundary conditions: Convergence analysis and VMS-LES stabilization
Authors:
Aparna Bansal,
Nicolás Alejandro Barnafi,
Dwijendra Narain Pandey
Abstract:
In this paper, we analyze Nitsche's method for the stationary Navier-Stokes equations on Lipschitz domains under minimal regularity assumptions. Our analysis provides a robust formulation for implementing slip (i.e. Navier) boundary conditions on arbitrarily complex boundaries. The well-posedness of the discrete problem is established using the Banach-Nečas-Babuška and Banach fixed point theorems under standard small data assumptions, and we also provide optimal convergence rates for the approximation error. Furthermore, we propose a VMS-LES stabilized formulation, which allows the simulation of incompressible fluids at high Reynolds numbers. We validate our theory through numerous numerical tests on well-established benchmark problems.
Submitted 18 July, 2023; v1 submitted 7 July, 2023;
originally announced July 2023.
-
EgoHumans: An Egocentric 3D Multi-Human Benchmark
Authors:
Rawal Khirodkar,
Aayush Bansal,
Lingni Ma,
Richard Newcombe,
Minh Vo,
Kris Kitani
Abstract:
We present EgoHumans, a new multi-view multi-human video benchmark to advance the state-of-the-art of egocentric human 3D pose estimation and tracking. Existing egocentric benchmarks either capture single subject or indoor-only scenarios, which limit the generalization of computer vision algorithms for real-world applications. We propose a novel 3D capture setup to construct a comprehensive egocentric multi-human benchmark in the wild with annotations to support diverse tasks such as human detection, tracking, 2D/3D pose estimation, and mesh recovery. We leverage consumer-grade wearable camera-equipped glasses for the egocentric view, which enables us to capture dynamic activities like playing tennis, fencing, volleyball, etc. Furthermore, our multi-view setup generates accurate 3D ground truth even under severe or complete occlusion. The dataset consists of more than 125k egocentric images, spanning diverse scenes with a particular focus on challenging and unchoreographed multi-human activities and fast-moving egocentric views. We rigorously evaluate existing state-of-the-art methods and highlight their limitations in the egocentric scenario, specifically on multi-human tracking. To address such limitations, we propose EgoFormer, a novel approach with a multi-stream transformer architecture and explicit 3D spatial reasoning to estimate and track the human pose. EgoFormer significantly outperforms prior art by 13.6% IDF1 on the EgoHumans dataset.
Submitted 18 August, 2023; v1 submitted 25 May, 2023;
originally announced May 2023.
-
Towards Modeling Human Attention from Eye Movements for Neural Source Code Summarization
Authors:
Aakash Bansal,
Bonita Sharif,
Collin McMillan
Abstract:
Neural source code summarization is the task of generating natural language descriptions of source code behavior using neural networks. A fundamental component of most neural models is an attention mechanism. The attention mechanism learns to connect features in source code to specific words to use when generating natural language descriptions. Humans also pay attention to some features in code more than others. This human attention reflects experience and high-level cognition well beyond the capability of any current neural model. In this paper, we use data from published eye-tracking experiments to create a model of this human attention. The model predicts which words in source code are the most important for code summarization. Next, we augment a baseline neural code summarization approach using our model of human attention. We observe an improvement in prediction performance of the augmented approach in line with other bio-inspired neural models.
Submitted 16 May, 2023;
originally announced May 2023.
-
A Language Model of Java Methods with Train/Test Deduplication
Authors:
Chia-Yi Su,
Aakash Bansal,
Vijayanta Jain,
Sepideh Ghanavati,
Collin McMillan
Abstract:
This tool demonstration presents a research toolkit for a language model of Java source code. The target audience includes researchers studying problems at the granularity level of subroutines, statements, or variables in Java. In contrast to many existing language models, we prioritize features for researchers including an open and easily-searchable training set, a held out test set with different levels of deduplication from the training set, infrastructure for deduplicating new examples, and an implementation platform suitable for execution on equipment accessible to a relatively modest budget. Our model is a GPT2-like architecture with 350m parameters. Our training set includes 52m Java methods (9b tokens) and 13m StackOverflow threads (10.5b tokens). To improve accessibility of research to more members of the community, we limit local resource requirements to GPUs with 16GB video memory. We provide a test set of held out Java methods that include descriptive comments, including the entire Java projects for those methods. We also provide deduplication tools using precomputed hash tables at various similarity thresholds to help researchers ensure that their own test examples are not in the training set. We make all our tools and data open source and available via Huggingface and Github.
Submitted 14 May, 2023;
originally announced May 2023.
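The abstract above mentions checking test examples against the training set at various similarity thresholds. The sketch below illustrates that idea only; the toolkit's actual hashing scheme is not described in the abstract, so plain Jaccard similarity over whitespace-separated tokens is used as a stand-in, and the names `token_set`, `jaccard`, and `is_near_duplicate` are hypothetical.

```python
def token_set(code: str) -> frozenset:
    """Split source text on whitespace and return its unique tokens."""
    return frozenset(code.split())


def jaccard(a: str, b: str) -> float:
    """Jaccard similarity of the token sets of two code snippets."""
    ta, tb = token_set(a), token_set(b)
    if not ta and not tb:
        return 1.0  # two empty snippets are treated as identical
    return len(ta & tb) / len(ta | tb)


def is_near_duplicate(candidate: str, training_methods, threshold: float = 0.8) -> bool:
    """True if any training method is at least `threshold` similar.

    The threshold value is an assumption chosen for illustration.
    """
    return any(jaccard(candidate, m) >= threshold for m in training_methods)


method_a = "int add(int a, int b) { return a + b; }"
method_b = "int add(int x, int y) { return x + y; }"
print(is_near_duplicate(method_a, [method_a]))
print(is_near_duplicate(method_a, [method_b], threshold=0.8))
```

A linear scan like this would not scale to tens of millions of methods, which is presumably why the toolkit relies on precomputed hash tables instead.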
-
PED-ANOVA: Efficiently Quantifying Hyperparameter Importance in Arbitrary Subspaces
Authors:
Shuhei Watanabe,
Archit Bansal,
Frank Hutter
Abstract:
The recent rise in popularity of Hyperparameter Optimization (HPO) for deep learning has highlighted the role that good hyperparameter (HP) space design can play in training strong models. In turn, designing a good HP space is critically dependent on understanding the role of different HPs. This motivates research on HP Importance (HPI), e.g., with the popular method of functional ANOVA (f-ANOVA). However, the original f-ANOVA formulation is inapplicable to the subspaces most relevant to algorithm designers, such as those defined by top performance. To overcome this issue, we derive a novel formulation of f-ANOVA for arbitrary subspaces and propose an algorithm that uses Pearson divergence (PED) to enable a closed-form calculation of HPI. We demonstrate that this new algorithm, dubbed PED-ANOVA, is able to successfully identify important HPs in different subspaces while also being extremely computationally efficient.
Submitted 26 May, 2023; v1 submitted 20 April, 2023;
originally announced April 2023.
-
Label Smoothing Improves Neural Source Code Summarization
Authors:
Sakib Haque,
Aakash Bansal,
Collin McMillan
Abstract:
Label smoothing is a regularization technique for neural networks. Normally neural models are trained to an output distribution that is a vector with a single 1 for the correct prediction, and 0 for all other elements. Label smoothing converts the correct prediction location to something slightly less than 1, then distributes the remainder to the other elements such that they are slightly greater than 0. A conceptual explanation behind label smoothing is that it helps prevent a neural model from becoming "overconfident" by forcing it to consider alternatives, even if only slightly. Label smoothing has been shown to help several areas of language generation, yet typically requires considerable tuning and testing to achieve the optimal results. This tuning and testing has not been reported for neural source code summarization - a growing research area in software engineering that seeks to generate natural language descriptions of source code behavior. In this paper, we demonstrate the effect of label smoothing on several baselines in neural code summarization, and conduct an experiment to find good parameters for label smoothing and make recommendations for its use.
Submitted 28 March, 2023;
originally announced March 2023.
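The label-smoothing transform described in the abstract above can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: the function name `smooth_labels` and the default `epsilon=0.1` are assumptions, and the remainder is spread evenly over the incorrect classes as the abstract describes (another common variant spreads it over all classes, including the correct one).

```python
def smooth_labels(num_classes: int, correct_index: int, epsilon: float = 0.1) -> list:
    """Return a smoothed target distribution.

    The correct class gets 1 - epsilon; the remaining epsilon is split
    evenly across the other num_classes - 1 classes.
    """
    if num_classes < 2:
        raise ValueError("need at least two classes to smooth")
    if not 0.0 <= epsilon < 1.0:
        raise ValueError("epsilon must be in [0, 1)")
    off_value = epsilon / (num_classes - 1)
    target = [off_value] * num_classes
    target[correct_index] = 1.0 - epsilon
    return target


# One-hot target [0, 0, 1, 0, 0] becomes a slightly softened distribution.
smoothed = smooth_labels(num_classes=5, correct_index=2, epsilon=0.1)
print(smoothed)
```

With `epsilon=0` the function returns the ordinary one-hot vector, so the smoothing strength can be tuned from a no-op baseline.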
-
Universal Guidance for Diffusion Models
Authors:
Arpit Bansal,
Hong-Min Chu,
Avi Schwarzschild,
Soumyadip Sengupta,
Micah Goldblum,
Jonas Geiping,
Tom Goldstein
Abstract:
Typical diffusion models are trained to accept a particular form of conditioning, most commonly text, and cannot be conditioned on other modalities without retraining. In this work, we propose a universal guidance algorithm that enables diffusion models to be controlled by arbitrary guidance modalities without the need to retrain any use-specific components. We show that our algorithm successfully generates quality images with guidance functions including segmentation, face recognition, object detection, and classifier signals. Code is available at https://github.com/arpitbansal297/Universal-Guided-Diffusion.
Submitted 14 February, 2023;
originally announced February 2023.
-
Diversity Analysis of Multi-Aperture UWOC System over EGG Channel with Pointing Errors
Authors:
Ziyaur Rahman,
Ankur Bansal,
S. M. Zafaruddin
Abstract:
Single aperture reception for underwater wireless optical communication (UWOC) is insufficient to deal with oceanic turbulence caused by the combined effect of temperature gradient and air bubbles. This paper analyzes the performance of multi-aperture reception for UWOC under channel irradiance fluctuations characterized by the mixture exponential generalized gamma (EGG) distribution. We analyze the system performance by employing both selection combining (SC) and maximum ratio combining (MRC) receivers. In particular, we derive the exact outage probability expression for the SC-based multi-aperture UWOC receiver and obtain an upper bound on the outage probability for the MRC-based multi-aperture UWOC receiver. With the help of the derived results, we analytically obtain the diversity order of the considered multi-aperture UWOC system.
Submitted 19 December, 2022;
originally announced December 2022.
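The two combining schemes named in the abstract above differ in how they merge per-aperture signal-to-noise ratios (SNRs): selection combining keeps the best branch, while maximum ratio combining adds the branch SNRs. The sketch below shows only these textbook definitions. It assumes independent branches and takes per-branch outage probabilities as given inputs; it does not implement the EGG distribution or the paper's derivations, and all function names are hypothetical.

```python
import math


def sc_snr(branch_snrs):
    """Selection combining: output SNR is the largest branch SNR."""
    return max(branch_snrs)


def mrc_snr(branch_snrs):
    """Maximum ratio combining: output SNR is the sum of branch SNRs."""
    return sum(branch_snrs)


def sc_outage(branch_outages):
    """Outage probability of selection combining.

    Assumes independent branches: the combiner is in outage only when
    every branch is below the threshold, so the probabilities multiply.
    """
    return math.prod(branch_outages)


snrs = [1.0, 2.0, 3.0]          # linear-scale SNRs, illustrative values
print(sc_snr(snrs), mrc_snr(snrs))
print(sc_outage([0.1, 0.1]))    # two apertures, each in outage 10% of the time
```

Because the MRC output SNR is never smaller than the SC output SNR for the same branches, MRC outage is bounded above by SC outage at any given threshold.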
-
Atrous Space Bender U-Net (ASBU-Net/LogiNet)
Authors:
Anurag Bansal,
Oleg Ostap,
Miguel Maestre Trueba,
Kristopher Perry
Abstract:
With recent advances in CNNs, exceptional improvements have been made in semantic segmentation of high resolution images in terms of accuracy and latency. However, challenges still remain in detecting objects in crowded scenes, large scale variations, partial occlusion, and distortions, while still maintaining mobility and latency. We introduce a fast and efficient convolutional neural network, ASBU-Net, for semantic segmentation of high resolution images that addresses these problems and uses no novelty layers for ease of quantization and embedded hardware support. ASBU-Net is based on a new feature extraction module, atrous space bender layer (ASBL), which is efficient in terms of computation and memory. The ASB layers form a building block that is used to make ASBNet. Since this network does not use any special layers it can be easily implemented, quantized and deployed on FPGAs and other hardware with limited memory. We present experiments on resource and accuracy trade-offs and show strong performance compared to other popular models.
Submitted 27 April, 2023; v1 submitted 16 December, 2022;
originally announced December 2022.
-
Tools for Extracting Spatio-Temporal Patterns in Meteorological Image Sequences: From Feature Engineering to Attention-Based Neural Networks
Authors:
Akansha Singh Bansal,
Yoonjin Lee,
Kyle Hilburn,
Imme Ebert-Uphoff
Abstract:
Atmospheric processes involve both space and time. This is why human analysis of atmospheric imagery can often extract more information from animated loops of image sequences than from individual images. Automating such an analysis requires the ability to identify spatio-temporal patterns in image sequences, which is a very challenging task because of the endless possibilities of patterns in both space and time. In this paper we review different concepts and techniques that are useful to extract spatio-temporal context specifically for meteorological applications. In this survey we first motivate the need for these approaches in meteorology using two applications, solar forecasting and detecting convection from satellite imagery. Then we provide an overview of many different concepts and techniques that are helpful for the interpretation of meteorological image sequences: (1) feature engineering methods to strengthen the desired signal in the input, using meteorological knowledge, classic image processing, harmonic analysis and topological data analysis; (2) different convolution filters (2D/3D/LSTM-convolution), which can be utilized strategically in convolutional neural network architectures to find patterns in both space and time; (3) the powerful new concept of 'attention' in neural networks and the abilities it brings to the interpretation of image sequences; and (4) a brief survey of strategies from unsupervised, self-supervised and transfer learning to reduce the need for large labeled datasets. We hope that presenting an overview of these tools - many of which are underutilized - will help accelerate progress in this area.
Submitted 24 October, 2022; v1 submitted 21 October, 2022;
originally announced October 2022.
-
Canary in a Coalmine: Better Membership Inference with Ensembled Adversarial Queries
Authors:
Yuxin Wen,
Arpit Bansal,
Hamid Kazemi,
Eitan Borgnia,
Micah Goldblum,
Jonas Geiping,
Tom Goldstein
Abstract:
As industrial applications are increasingly automated by machine learning models, enforcing personal data ownership and intellectual property rights requires tracing training data back to their rightful owners. Membership inference algorithms approach this problem by using statistical techniques to discern whether a target sample was included in a model's training set. However, existing methods only utilize the unaltered target sample or simple augmentations of the target to compute statistics. Such a sparse sampling of the model's behavior carries little information, leading to poor inference capabilities. In this work, we use adversarial tools to directly optimize for queries that are discriminative and diverse. Our improvements achieve significantly more accurate membership inference than existing methods, especially in offline scenarios and in the low false-positive regime which is critical in legal settings. Code is available at https://github.com/YuxinWenRick/canary-in-a-coalmine.
Submitted 1 June, 2023; v1 submitted 19 October, 2022;
originally announced October 2022.
-
Adaptation of domain-specific transformer models with text oversampling for sentiment analysis of social media posts on Covid-19 vaccines
Authors:
Anmol Bansal,
Arjun Choudhry,
Anubhav Sharma,
Seba Susan
Abstract:
Covid-19 has spread across the world and several vaccines have been developed to counter its surge. To identify the correct sentiments associated with the vaccines from social media posts, we fine-tune various state-of-the-art pre-trained transformer models on tweets associated with Covid-19 vaccines. Specifically, we use the recently introduced state-of-the-art pre-trained transformer models RoBERTa, XLNet and BERT, and the domain-specific transformer models CT-BERT and BERTweet that are pre-trained on Covid-19 tweets. We further explore the option of text augmentation by oversampling using Language Model based Oversampling Technique (LMOTE) to improve the accuracies of these models, specifically, for small sample datasets where there is an imbalanced class distribution among the positive, negative and neutral sentiment classes. Our results summarize our findings on the suitability of text oversampling for imbalanced small sample datasets that are used to fine-tune state-of-the-art pre-trained transformer models, and the utility of domain-specific transformer models for the classification task.
Submitted 13 January, 2023; v1 submitted 22 September, 2022;
originally announced September 2022.
-
Perception Simplex: Verifiable Collision Avoidance in Autonomous Vehicles Amidst Obstacle Detection Faults
Authors:
Ayoosh Bansal,
Hunmin Kim,
Simon Yu,
Bo Li,
Naira Hovakimyan,
Marco Caccamo,
Lui Sha
Abstract:
Advances in deep learning have revolutionized cyber-physical applications, including the development of Autonomous Vehicles. However, real-world collisions involving autonomous control of vehicles have raised significant safety concerns regarding the use of Deep Neural Networks (DNN) in safety-critical tasks, particularly Perception. The inherent unverifiability of DNNs poses a key challenge in ensuring their safe and reliable operation.
In this work, we propose Perception Simplex (PS), a fault-tolerant application architecture designed for obstacle detection and collision avoidance. We analyze an existing LiDAR-based classical obstacle detection algorithm to establish strict bounds on its capabilities and limitations. Such analysis and verification have not been possible for deep learning-based perception systems yet. By employing verifiable obstacle detection algorithms, PS identifies obstacle existence detection faults in the output of unverifiable DNN-based object detectors. When faults with potential collision risks are detected, appropriate corrective actions are initiated. Through extensive analysis and software-in-the-loop simulations, we demonstrate that PS provides predictable and deterministic fault tolerance against obstacle existence detection faults, establishing a robust safety guarantee.
Submitted 28 November, 2023; v1 submitted 4 September, 2022;
originally announced September 2022.
-
Verifiable Obstacle Detection
Authors:
Ayoosh Bansal,
Hunmin Kim,
Simon Yu,
Bo Li,
Naira Hovakimyan,
Marco Caccamo,
Lui Sha
Abstract:
Perception of obstacles remains a critical safety concern for autonomous vehicles. Real-world collisions have shown that the autonomy faults leading to fatal collisions originate from obstacle existence detection. Open source autonomous driving implementations show a perception pipeline with complex interdependent Deep Neural Networks. These networks are not fully verifiable, making them unsuitable for safety-critical tasks.
In this work, we present a safety verification of an existing LiDAR based classical obstacle detection algorithm. We establish strict bounds on the capabilities of this obstacle detection algorithm. Given safety standards, such bounds allow for determining LiDAR sensor properties that would reliably satisfy the standards. Such analysis has as yet been unattainable for neural network based perception systems. We provide a rigorous analysis of the obstacle detection system with empirical results based on real-world sensor data.
Submitted 30 August, 2022;
originally announced August 2022.
-
Cold Diffusion: Inverting Arbitrary Image Transforms Without Noise
Authors:
Arpit Bansal,
Eitan Borgnia,
Hong-Min Chu,
Jie S. Li,
Hamid Kazemi,
Furong Huang,
Micah Goldblum,
Jonas Geiping,
Tom Goldstein
Abstract:
Standard diffusion models involve an image transform -- adding Gaussian noise -- and an image restoration operator that inverts this degradation. We observe that the generative behavior of diffusion models is not strongly dependent on the choice of image degradation, and in fact an entire family of generative models can be constructed by varying this choice. Even when using completely deterministic degradations (e.g., blur, masking, and more), the training and test-time update rules that underlie diffusion models can be easily generalized to create generative models. The success of these fully deterministic models calls into question the community's understanding of diffusion models, which relies on noise in either gradient Langevin dynamics or variational inference, and paves the way for generalized diffusion models that invert arbitrary processes. Our code is available at https://github.com/arpitbansal297/Cold-Diffusion-Models
Submitted 19 August, 2022;
originally announced August 2022.
-
Ellipsis: Towards Efficient System Auditing for Real-Time Systems
Authors:
Ayoosh Bansal,
Anant Kandikuppa,
Chien-Ying Chen,
Monowar Hasan,
Adam Bates,
Sibin Mohan
Abstract:
System auditing is a powerful tool that provides insight into the nature of suspicious events in computing systems, allowing machine operators to detect and subsequently investigate security incidents. While auditing has proven invaluable to the security of traditional computers, existing audit frameworks are rarely designed with consideration for Real-Time Systems (RTS). The transparency provided by system auditing would be of tremendous benefit in a variety of security-critical RTS domains (e.g., autonomous vehicles); however, if audit mechanisms are not carefully integrated into RTS, auditing can be rendered ineffectual and violate the real-world temporal requirements of the RTS.
In this paper, we demonstrate how to adapt commodity audit frameworks to RTS. Using Linux Audit as a case study, we first demonstrate that the volume of audit events generated by commodity frameworks is unsustainable within the temporal and resource constraints of real-time (RT) applications. To address this, we present Ellipsis, a set of kernel-based reduction techniques that leverage the periodic repetitive nature of RT applications to aggressively reduce the costs of system-level auditing. Ellipsis generates succinct descriptions of RT applications' expected activity while retaining a detailed record of unexpected activities, enabling analysis of suspicious activity while meeting temporal constraints. Our evaluation of Ellipsis, using ArduPilot (an open-source autopilot application suite) demonstrates up to 93% reduction in audit log generation.
Submitted 4 August, 2022;
originally announced August 2022.
-
Neural Pixel Composition: 3D-4D View Synthesis from Multi-Views
Authors:
Aayush Bansal,
Michael Zollhoefer
Abstract:
We present Neural Pixel Composition (NPC), a novel approach for continuous 3D-4D view synthesis given only a discrete set of multi-view observations as input. Existing state-of-the-art approaches require dense multi-view supervision and an extensive computational budget. The proposed formulation reliably operates on sparse and wide-baseline multi-view imagery and can be trained efficiently within a few seconds to 10 minutes for hi-res (12MP) content, i.e., 200-400X faster convergence than existing methods. Crucial to our approach are two core novelties: 1) a representation of a pixel that contains color and depth information accumulated from multi-views for a particular location and time along a line of sight, and 2) a multi-layer perceptron (MLP) that enables the composition of this rich information provided for a pixel location to obtain the final color output. We experiment with a large variety of multi-view sequences, compare to existing approaches, and achieve better results in diverse and challenging settings. Finally, our approach enables dense 3D reconstruction from sparse multi-views, where COLMAP, a state-of-the-art 3D reconstruction approach, struggles.
Submitted 21 July, 2022;
originally announced July 2022.
-
Certified Neural Network Watermarks with Randomized Smoothing
Authors:
Arpit Bansal,
Ping-yeh Chiang,
Michael Curry,
Rajiv Jain,
Curtis Wigington,
Varun Manjunatha,
John P Dickerson,
Tom Goldstein
Abstract:
Watermarking is a commonly used strategy to protect creators' rights to digital images, videos and audio. Recently, watermarking methods have been extended to deep learning models -- in principle, the watermark should be preserved when an adversary tries to copy the model. However, in practice, watermarks can often be removed by an intelligent adversary. Several papers have proposed watermarking methods that claim to be empirically resistant to different types of removal attacks, but these new techniques often fail in the face of new or better-tuned adversaries. In this paper, we propose a certifiable watermarking method. Using the randomized smoothing technique proposed in Chiang et al., we show that our watermark is guaranteed to be unremovable unless the model parameters are changed by more than a certain l2 threshold. In addition to being certifiable, our watermark is also empirically more robust compared to previous watermarking methods. Our experiments can be reproduced with code at https://github.com/arpitbansal297/Certified_Watermarks
Submitted 16 July, 2022;
originally announced July 2022.
-
Transfer Learning with Deep Tabular Models
Authors:
Roman Levin,
Valeriia Cherepanova,
Avi Schwarzschild,
Arpit Bansal,
C. Bayan Bruss,
Tom Goldstein,
Andrew Gordon Wilson,
Micah Goldblum
Abstract:
Recent work on deep learning for tabular data demonstrates the strong performance of deep tabular models, often bridging the gap between gradient boosted decision trees and neural networks. Accuracy aside, a major advantage of neural models is that they learn reusable features and are easily fine-tuned in new domains. This property is often exploited in computer vision and natural language applications, where transfer learning is indispensable when task-specific training data is scarce. In this work, we demonstrate that upstream data gives tabular neural networks a decisive advantage over widely used GBDT models. We propose a realistic medical diagnosis benchmark for tabular transfer learning, and we present a how-to guide for using upstream data to boost performance with a variety of tabular neural network architectures. Finally, we propose a pseudo-feature method for cases where the upstream and downstream feature sets differ, a tabular-specific problem widespread in real-world applications. Our code is available at https://github.com/LevinRoman/tabular-transfer-learning .
Submitted 7 August, 2023; v1 submitted 30 June, 2022;
originally announced June 2022.
-
KeypointNeRF: Generalizing Image-based Volumetric Avatars using Relative Spatial Encoding of Keypoints
Authors:
Marko Mihajlovic,
Aayush Bansal,
Michael Zollhoefer,
Siyu Tang,
Shunsuke Saito
Abstract:
Image-based volumetric humans using pixel-aligned features promise generalization to unseen poses and identities. Prior work leverages global spatial encodings and multi-view geometric consistency to reduce spatial ambiguity. However, global encodings often suffer from overfitting to the distribution of the training data, and it is difficult to learn multi-view consistent reconstruction from sparse views. In this work, we investigate common issues with existing spatial encodings and propose a simple yet highly effective approach to modeling high-fidelity volumetric humans from sparse views. One of the key ideas is to encode relative spatial 3D information via sparse 3D keypoints. This approach is robust to the sparsity of viewpoints and cross-dataset domain gap. Our approach outperforms state-of-the-art methods for head reconstruction. On human body reconstruction for unseen subjects, we also achieve performance comparable to prior work that uses a parametric human body model and temporal feature aggregation. Our experiments show that a majority of errors in prior work stem from an inappropriate choice of spatial encoding and thus we suggest a new direction for high-fidelity image-based human modeling. https://markomih.github.io/KeypointNeRF
Submitted 21 July, 2022; v1 submitted 10 May, 2022;
originally announced May 2022.
-
These Deals Won't Last! Longevity, Uniformity and Bias in Product Badge Assignment in E-Commerce Platforms
Authors:
Archit Bansal,
Kunal Banerjee,
Abhijnan Chakraborty
Abstract:
Product badges are ubiquitous in e-commerce platforms, acting as effective psychological triggers to nudge customers to buy specific products, boosting revenues. However, to the best of our knowledge, there has been no attempt to systematically study these badges and their several idiosyncrasies; we intend to close this gap in our current work. Specifically, we try to answer questions such as: How long does a product retain a badge on a given platform? If a product is sold on different platforms, does it receive similar badges? How do the products that receive badges differ from those that do not in terms of price, customer rating, etc.? We collect longitudinal data from several e-commerce platforms over 45 days, and find that although most of the badges are short-lived, there are several permanent badge assignments, including for badges meant to denote urgency or scarcity. Furthermore, it is unclear how the badge assignments are done, and we find evidence that highly-rated products are missing out on badges compared to lower quality ones. Our work calls for greater transparency in the badge assignment process to inform customers, as well as to reduce dissatisfaction among the sellers dependent on the platforms for their revenues.
Submitted 26 April, 2022;
originally announced April 2022.
-
COAP: Compositional Articulated Occupancy of People
Authors:
Marko Mihajlovic,
Shunsuke Saito,
Aayush Bansal,
Michael Zollhoefer,
Siyu Tang
Abstract:
We present a novel neural implicit representation for articulated human bodies. Compared to explicit template meshes, neural implicit body representations provide an efficient mechanism for modeling interactions with the environment, which is essential for human motion reconstruction and synthesis in 3D scenes. However, existing neural implicit bodies suffer from either poor generalization on highly articulated poses or slow inference time. In this work, we observe that prior knowledge about the human body's shape and kinematic structure can be leveraged to improve generalization and efficiency. We decompose the full-body geometry into local body parts and employ a part-aware encoder-decoder architecture to learn neural articulated occupancy that models complex deformations locally. Our local shape encoder represents the body deformation of not only the corresponding body part but also the neighboring body parts. The decoder incorporates the geometric constraints of local body shape which significantly improves pose generalization. We demonstrate that our model is suitable for resolving self-intersections and collisions with 3D environments. Quantitative and qualitative experiments show that our method largely outperforms existing solutions in terms of both efficiency and accuracy. The code and models are available at https://neuralbodies.github.io/COAP/index.html
Submitted 13 April, 2022;
originally announced April 2022.
-
Semantic Similarity Metrics for Evaluating Source Code Summarization
Authors:
Sakib Haque,
Zachary Eberhart,
Aakash Bansal,
Collin McMillan
Abstract:
Source code summarization involves creating brief descriptions of source code in natural language. These descriptions are a key component of software documentation such as JavaDocs. Automatic code summarization is a prized target of software engineering research, due to the high value summaries have to programmers and the simultaneously high cost of writing and maintaining documentation by hand. Current work is almost all based on machine models trained via big data input. Large datasets of examples of code and summaries of that code are used to train a neural model, e.g. an encoder-decoder. Then the output predictions of the model are evaluated against a set of reference summaries. The input is code not seen by the model, and the prediction is compared to a reference. The means by which a prediction is compared to a reference is essentially word overlap, calculated via a metric such as BLEU or ROUGE. The problem with using word overlap is that not all words in a sentence have the same importance, and many words have synonyms. The result is that calculated similarity may not match the similarity perceived by human readers. In this paper, we conduct an experiment to measure the degree to which various word overlap metrics correlate with human-rated similarity of predicted and reference summaries. We evaluate alternatives based on current work in semantic similarity metrics and propose recommendations for evaluation of source code summarization.
Submitted 4 April, 2022;
originally announced April 2022.
-
Can Neural Nets Learn the Same Model Twice? Investigating Reproducibility and Double Descent from the Decision Boundary Perspective
Authors:
Gowthami Somepalli,
Liam Fowl,
Arpit Bansal,
Ping Yeh-Chiang,
Yehuda Dar,
Richard Baraniuk,
Micah Goldblum,
Tom Goldstein
Abstract:
We discuss methods for visualizing neural network decision boundaries and decision regions. We use these visualizations to investigate issues related to reproducibility and generalization in neural network training. We observe that changes in model architecture (and its associated inductive bias) cause visible changes in decision boundaries, while multiple runs with the same architecture yield results with strong similarities, especially in the case of wide architectures. We also use decision boundary methods to visualize double descent phenomena. We see that decision boundary reproducibility depends strongly on model width. Near the threshold of interpolation, neural network decision boundaries become fragmented into many small decision regions, and these regions are non-reproducible. Meanwhile, very narrow and very wide networks have high levels of reproducibility in their decision boundaries with relatively few decision regions. We discuss how our observations relate to the theory of double descent phenomena in convex models. Code is available at https://github.com/somepago/dbViz
Submitted 15 March, 2022;
originally announced March 2022.
-
A framework for spatial heat risk assessment using a generalized similarity measure
Authors:
Akshay Bansal,
Ayda Kianmehr
Abstract:
In this study, we develop a novel framework to assess health risks due to heat hazards in various localities (zip codes) across the state of Maryland with the help of two commonly used indicators, namely exposure and vulnerability. Our approach quantifies each of these two indicators by developing their corresponding feature vectors and subsequently computes indicator-specific reference vectors that signify a high-risk environment by clustering the data points at the tail end of an empirical risk spectrum. The proposed framework circumvents information-theoretic, entropy-based aggregation methods, whose usage varies with different and subjective views of entropy, and, more importantly, generalizes the notion of risk valuation using cosine similarity with unknown reference points.
Submitted 20 October, 2023; v1 submitted 20 February, 2022;
originally announced February 2022.
-
End-to-end Algorithm Synthesis with Recurrent Networks: Logical Extrapolation Without Overthinking
Authors:
Arpit Bansal,
Avi Schwarzschild,
Eitan Borgnia,
Zeyad Emam,
Furong Huang,
Micah Goldblum,
Tom Goldstein
Abstract:
Machine learning systems perform well on pattern matching tasks, but their ability to perform algorithmic or logical reasoning is not well understood. One important reasoning capability is algorithmic extrapolation, in which models trained only on small/simple reasoning problems can synthesize complex strategies for large/complex problems at test time. Algorithmic extrapolation can be achieved through recurrent systems, which can be iterated many times to solve difficult reasoning problems. We observe that this approach fails to scale to highly complex problems because behavior degenerates when many iterations are applied -- an issue we refer to as "overthinking." We propose a recall architecture that keeps an explicit copy of the problem instance in memory so that it cannot be forgotten. We also employ a progressive training routine that prevents the model from learning behaviors that are specific to iteration number and instead pushes it to learn behaviors that can be repeated indefinitely. These innovations prevent the overthinking problem, and enable recurrent systems to solve extremely hard extrapolation tasks.
Submitted 14 October, 2022; v1 submitted 11 February, 2022;
originally announced February 2022.
-
A Moment in the Sun: Solar Nowcasting from Multispectral Satellite Data using Self-Supervised Learning
Authors:
Akansha Singh Bansal,
Trapit Bansal,
David Irwin
Abstract:
Solar energy is now the cheapest form of electricity in history. Unfortunately, significantly increasing the grid's fraction of solar energy remains challenging due to its variability, which makes balancing electricity's supply and demand more difficult. While thermal generators' ramp rate -- the maximum rate that they can change their output -- is finite, solar's ramp rate is essentially infinite. Thus, accurate near-term solar forecasting, or nowcasting, is important to provide advance warning to adjust thermal generator output in response to solar variations to ensure a balanced supply and demand. To address the problem, this paper develops a general model for solar nowcasting from abundant and readily available multispectral satellite data using self-supervised learning. Specifically, we develop deep auto-regressive models using convolutional neural networks (CNN) and long short-term memory networks (LSTM) that are globally trained across multiple locations to predict raw future observations of the spatio-temporal data collected by the recently launched GOES-R series of satellites. Our model estimates a location's future solar irradiance based on satellite observations, which we feed to a regression model trained on smaller site-specific solar data to provide near-term solar photovoltaic (PV) forecasts that account for site-specific characteristics. We evaluate our approach for different coverage areas and forecast horizons across 25 solar sites and show that our approach yields errors close to that of a model using ground-truth observations.
Submitted 27 December, 2021;
originally announced December 2021.
-
Object-Aware Cropping for Self-Supervised Learning
Authors:
Shlok Mishra,
Anshul Shah,
Ankan Bansal,
Abhyuday Jagannatha,
Janit Anjaria,
Abhishek Sharma,
David Jacobs,
Dilip Krishnan
Abstract:
A core component of the recent success of self-supervised learning is cropping data augmentation, which selects sub-regions of an image to be used as positive views in the self-supervised loss. The underlying assumption is that randomly cropped and resized regions of a given image share information about the objects of interest, which the learned representation will capture. This assumption is mostly satisfied in datasets such as ImageNet where there is a large, centered object, which is highly likely to be present in random crops of the full image. However, in other datasets such as OpenImages or COCO, which are more representative of real world uncurated data, there are typically multiple small objects in an image. In this work, we show that self-supervised learning based on the usual random cropping performs poorly on such datasets. We propose replacing one or both of the random crops with crops obtained from an object proposal algorithm. This encourages the model to learn both object and scene level semantic representations. Using this approach, which we call object-aware cropping, results in significant improvements over scene cropping on classification and object detection benchmarks. For example, on OpenImages, our approach achieves an improvement of 8.8% mAP over random scene-level cropping using MoCo-v2 based pre-training. We also show significant improvements on COCO and PASCAL-VOC object detection and segmentation tasks over the state-of-the-art self-supervised learning approaches. Our approach is efficient, simple and general, and can be used in most existing contrastive and non-contrastive self-supervised learning frameworks.
Submitted 6 April, 2023; v1 submitted 1 December, 2021;
originally announced December 2021.
-
LiDAR Cluster First and Camera Inference Later: A New Perspective Towards Autonomous Driving
Authors:
Jiyang Chen,
Simon Yu,
Rohan Tabish,
Ayoosh Bansal,
Shengzhong Liu,
Tarek Abdelzaher,
Lui Sha
Abstract:
Object detection in state-of-the-art Autonomous Vehicle (AV) frameworks relies heavily on deep neural networks. Typically, these networks perform object detection uniformly on the entire camera LiDAR frames. However, this uniformity jeopardizes the safety of the AV by giving the same priority to all objects in the scenes regardless of their risk of collision with the AV. In this paper, we present a new end-to-end pipeline for AVs that introduces the concept of LiDAR cluster first and camera inference later to detect and classify objects. The benefits of our proposed framework are twofold. First, our pipeline prioritizes detecting objects that pose a higher risk of collision with the AV, giving more time for the AV to react to unsafe conditions. Second, it also provides, on average, faster inference speeds compared to popular deep neural network pipelines. We design our framework using a real-world dataset, the Waymo Open Dataset, solving challenges arising from the limitations of LiDAR sensors and object detection algorithms. We show that our novel object detection pipeline prioritizes the detection of higher risk objects while simultaneously achieving comparable accuracy and a 25% higher average speed compared to camera inference only.
Submitted 19 November, 2021; v1 submitted 18 November, 2021;
originally announced November 2021.
-
A practical analysis of ROP attacks
Authors:
Ayush Bansal,
Debadatta Mishra
Abstract:
Control Flow Hijacking attacks have long posed a serious threat to the security of applications: an attacker can damage the Control Flow Integrity of the program and execute arbitrary code. These attacks can be performed by injecting code in the program's memory or reusing already existing code in the program (also known as Code-Reuse Attacks). Code-Reuse Attacks in the form of Return-into-libc Attacks or Return-Oriented Programming Attacks are said to be Turing Complete, providing a guarantee that there will always exist code segments (also called ROP gadgets) within a binary allowing an attacker to perform any kind of function by building a suitable ROP chain (chain of ROP gadgets). Our goal is to study different techniques of performing ROP Attacks and find the difficulties encountered in performing such attacks. For this purpose, we have designed an automated tool which works on 64-bit systems and generates a ROP chain from ROP gadgets to execute arbitrary system calls.
Submitted 5 November, 2021;
originally announced November 2021.
-
SmartSplit: Latency-Energy-Memory Optimisation for CNN Splitting on Smartphone Environment
Authors:
Ishan Prakash,
Aniruddh Bansal,
Rohit Verma,
Rajeev Shorey
Abstract:
Artificial Intelligence has now taken centre stage in the smartphone industry owing to the need to bring all processing close to the user and address privacy concerns. Convolutional Neural Networks (CNNs), which are used by several AI applications, are highly resource and computation intensive. Although new generation smartphones come with AI-enabled chips, minimal memory and energy utilisation is essential as many applications are run concurrently on a smartphone. In light of this, optimising the workload on the smartphone by offloading a part of the processing to a cloud server is an important direction of research. In this paper, we analyse the feasibility of splitting CNNs between smartphones and a cloud server by formulating a multi-objective optimisation problem that optimises the end-to-end latency, memory utilisation, and energy consumption. We design SmartSplit, a Genetic Algorithm with decision analysis based approach to solve the optimisation problem. Our experiments run with multiple CNN models show that splitting a CNN between a smartphone and a cloud server is feasible. The proposed approach, SmartSplit, fares better when compared to other state-of-the-art approaches.
Submitted 1 November, 2021;
originally announced November 2021.
-
Flexible Accuracy for Differential Privacy
Authors:
Aman Bansal,
Rahul Chunduru,
Deepesh Data,
Manoj Prabhakaran
Abstract:
Differential Privacy (DP) has become a gold standard in privacy-preserving data analysis. While it provides one of the most rigorous notions of privacy, there are many settings where its applicability is limited.
Our main contribution is in augmenting differential privacy with Flexible Accuracy, which allows small distortions in the input (e.g., dropping outliers) before measuring accuracy of the output, allowing one to extend DP mechanisms to high-sensitivity functions. We present mechanisms that can help in achieving this notion for functions that had no meaningful differentially private mechanisms previously. In particular, we illustrate an application to differentially private histograms, which in turn yields mechanisms for revealing the support of a dataset or the extremal values in the data. Analyses of our constructions exploit new versatile composition theorems that facilitate modular design.
All the above extensions use our new definitional framework, which is in terms of "lossy Wasserstein distance" -- a 2-parameter error measure for distributions. This may be of independent interest.
Submitted 18 October, 2021;
originally announced October 2021.