-
iMIV: in-Memory Integrity Verification for NVM
Authors:
Rajat Jain,
Aravinda Prasad,
Sreenivas Subramoney,
Arkaprava Basu
Abstract:
Non-volatile Memory (NVM) could bridge the gap between memory and storage. However, NVMs are susceptible to data remanence attacks. Thus, multiple security metadata must persist along with the data to protect the confidentiality and integrity of NVM-resident data. Persisting Bonsai Merkel Tree (BMT) nodes, critical for data integrity, can add significant overheads due to need to write large amount…
▽ More
Non-volatile Memory (NVM) could bridge the gap between memory and storage. However, NVMs are susceptible to data remanence attacks. Thus, multiple security metadata must persist along with the data to protect the confidentiality and integrity of NVM-resident data. Persisting Bonsai Merkel Tree (BMT) nodes, critical for data integrity, can add significant overheads due to need to write large amounts of metadata off-chip to the bandwidth-constrained NVMs.
We propose iMIV for low-overhead, fine-grained integrity verification through in-memory computing. We argue that memory-intensive integrity verification operations (BMT updates and verification) should be employed close to the NVM to limit off-chip data movement. We design iMIV based on typical NVDIMM designs that have an onboard logic chip with a trusted encryption engine, separate from the untrusted storage media. iMIV reduces the performance overheads from 205% to 55% when integrity verification operations are offloaded to NVM compared to when all the security operations are employed at the memory controller.
△ Less
Submitted 12 July, 2024;
originally announced July 2024.
-
The infrastructure powering IBM's Gen AI model development
Authors:
Talia Gershon,
Seetharami Seelam,
Brian Belgodere,
Milton Bonilla,
Lan Hoang,
Danny Barnett,
I-Hsin Chung,
Apoorve Mohan,
Ming-Hung Chen,
Lixiang Luo,
Robert Walkup,
Constantinos Evangelinos,
Shweta Salaria,
Marc Dombrowa,
Yoonho Park,
Apo Kayi,
Liran Schour,
Alim Alim,
Ali Sydney,
Pavlos Maniotis,
Laurent Schares,
Bernard Metzler,
Bengi Karacali-Akyamac,
Sophia Wen,
Tatsuhiro Chiba
, et al. (121 additional authors not shown)
Abstract:
AI Infrastructure plays a key role in the speed and cost-competitiveness of developing and deploying advanced AI models. The current demand for powerful AI infrastructure for model training is driven by the emergence of generative AI and foundational models, where on occasion thousands of GPUs must cooperate on a single training job for the model to be trained in a reasonable time. Delivering effi…
▽ More
AI Infrastructure plays a key role in the speed and cost-competitiveness of developing and deploying advanced AI models. The current demand for powerful AI infrastructure for model training is driven by the emergence of generative AI and foundational models, where on occasion thousands of GPUs must cooperate on a single training job for the model to be trained in a reasonable time. Delivering efficient and high-performing AI training requires an end-to-end solution that combines hardware, software and holistic telemetry to cater for multiple types of AI workloads. In this report, we describe IBM's hybrid cloud infrastructure that powers our generative AI model development. This infrastructure includes (1) Vela: an AI-optimized supercomputing capability directly integrated into the IBM Cloud, delivering scalable, dynamic, multi-tenant and geographically distributed infrastructure for large-scale model training and other AI workflow steps and (2) Blue Vela: a large-scale, purpose-built, on-premises hosting environment that is optimized to support our largest and most ambitious AI model training tasks. Vela provides IBM with the dual benefit of high performance for internal use along with the flexibility to adapt to an evolving commercial landscape. Blue Vela provides us with the benefits of rapid development of our largest and most ambitious models, as well as future-proofing against the evolving model landscape in the industry. Taken together, they provide IBM with the ability to rapidly innovate in the development of both AI models and commercial offerings.
△ Less
Submitted 7 July, 2024;
originally announced July 2024.
-
Gradient Boosting Mapping for Dimensionality Reduction and Feature Extraction
Authors:
Anri Patron,
Ayush Prasad,
Hoang Phuc Hau Luu,
Kai Puolamäki
Abstract:
A fundamental problem in supervised learning is to find a good set of features or distance measures. If the new set of features is of lower dimensionality and can be obtained by a simple transformation of the original data, they can make the model understandable, reduce overfitting, and even help to detect distribution drift. We propose a supervised dimensionality reduction method Gradient Boostin…
▽ More
A fundamental problem in supervised learning is to find a good set of features or distance measures. If the new set of features is of lower dimensionality and can be obtained by a simple transformation of the original data, they can make the model understandable, reduce overfitting, and even help to detect distribution drift. We propose a supervised dimensionality reduction method Gradient Boosting Mapping (GBMAP), where the outputs of weak learners -- defined as one-layer perceptrons -- define the embedding. We show that the embedding coordinates provide better features for the supervised learning task, making simple linear models competitive with the state-of-the-art regressors and classifiers. We also use the embedding to find a principled distance measure between points. The features and distance measures automatically ignore directions irrelevant to the supervised learning task. We also show that we can reliably detect out-of-distribution data points with potentially large regression or classification errors. GBMAP is fast and works in seconds for dataset of million data points or hundreds of features. As a bonus, GBMAP provides a regression and classification performance comparable to the state-of-the-art supervised learning methods.
△ Less
Submitted 14 May, 2024;
originally announced May 2024.
-
Consistency Policy: Accelerated Visuomotor Policies via Consistency Distillation
Authors:
Aaditya Prasad,
Kevin Lin,
Jimmy Wu,
Linqi Zhou,
Jeannette Bohg
Abstract:
Many robotic systems, such as mobile manipulators or quadrotors, cannot be equipped with high-end GPUs due to space, weight, and power constraints. These constraints prevent these systems from leveraging recent developments in visuomotor policy architectures that require high-end GPUs to achieve fast policy inference. In this paper, we propose Consistency Policy, a faster and similarly powerful al…
▽ More
Many robotic systems, such as mobile manipulators or quadrotors, cannot be equipped with high-end GPUs due to space, weight, and power constraints. These constraints prevent these systems from leveraging recent developments in visuomotor policy architectures that require high-end GPUs to achieve fast policy inference. In this paper, we propose Consistency Policy, a faster and similarly powerful alternative to Diffusion Policy for learning visuomotor robot control. By virtue of its fast inference speed, Consistency Policy can enable low latency decision making in resource-constrained robotic setups. A Consistency Policy is distilled from a pretrained Diffusion Policy by enforcing self-consistency along the Diffusion Policy's learned trajectories. We compare Consistency Policy with Diffusion Policy and other related speed-up methods across 6 simulation tasks as well as three real-world tasks where we demonstrate inference on a laptop GPU. For all these tasks, Consistency Policy speeds up inference by an order of magnitude compared to the fastest alternative method and maintains competitive success rates. We also show that the Conistency Policy training procedure is robust to the pretrained Diffusion Policy's quality, a useful result that helps practioners avoid extensive testing of the pretrained model. Key design decisions that enabled this performance are the choice of consistency objective, reduced initial sample variance, and the choice of preset chaining steps.
△ Less
Submitted 28 June, 2024; v1 submitted 13 May, 2024;
originally announced May 2024.
-
Granite Code Models: A Family of Open Foundation Models for Code Intelligence
Authors:
Mayank Mishra,
Matt Stallone,
Gaoyuan Zhang,
Yikang Shen,
Aditya Prasad,
Adriana Meza Soria,
Michele Merler,
Parameswaran Selvam,
Saptha Surendran,
Shivdeep Singh,
Manish Sethi,
Xuan-Hong Dang,
Pengyuan Li,
Kun-Lung Wu,
Syed Zawad,
Andrew Coleman,
Matthew White,
Mark Lewis,
Raju Pavuluri,
Yan Koyfman,
Boris Lublinsky,
Maximilien de Bayser,
Ibrahim Abdelaziz,
Kinjal Basu,
Mayank Agarwal
, et al. (21 additional authors not shown)
Abstract:
Large Language Models (LLMs) trained on code are revolutionizing the software development process. Increasingly, code LLMs are being integrated into software development environments to improve the productivity of human programmers, and LLM-based agents are beginning to show promise for handling complex tasks autonomously. Realizing the full potential of code LLMs requires a wide range of capabili…
▽ More
Large Language Models (LLMs) trained on code are revolutionizing the software development process. Increasingly, code LLMs are being integrated into software development environments to improve the productivity of human programmers, and LLM-based agents are beginning to show promise for handling complex tasks autonomously. Realizing the full potential of code LLMs requires a wide range of capabilities, including code generation, fixing bugs, explaining and documenting code, maintaining repositories, and more. In this work, we introduce the Granite series of decoder-only code models for code generative tasks, trained with code written in 116 programming languages. The Granite Code models family consists of models ranging in size from 3 to 34 billion parameters, suitable for applications ranging from complex application modernization tasks to on-device memory-constrained use cases. Evaluation on a comprehensive set of tasks demonstrates that Granite Code models consistently reaches state-of-the-art performance among available open-source code LLMs. The Granite Code model family was optimized for enterprise software development workflows and performs well across a range of coding tasks (e.g. code generation, fixing and explanation), making it a versatile all around code model. We release all our Granite Code models under an Apache 2.0 license for both research and commercial use.
△ Less
Submitted 7 May, 2024;
originally announced May 2024.
-
Taming Server Memory TCO with Multiple Software-Defined Compressed Tiers
Authors:
Sandeep Kumar,
Aravinda Prasad,
Sreenivas Subramoney
Abstract:
Memory accounts for 33 - 50% of the total cost of ownership (TCO) in modern data centers. We propose a novel solution to tame memory TCO through the novel creation and judicious management of multiple software-defined compressed memory tiers.
As opposed to the state-of-the-art solutions that employ a 2-Tier solution, a single compressed tier along with DRAM, we define multiple compressed tiers i…
▽ More
Memory accounts for 33 - 50% of the total cost of ownership (TCO) in modern data centers. We propose a novel solution to tame memory TCO through the novel creation and judicious management of multiple software-defined compressed memory tiers.
As opposed to the state-of-the-art solutions that employ a 2-Tier solution, a single compressed tier along with DRAM, we define multiple compressed tiers implemented through a combination of different compression algorithms, memory allocators for compressed objects, and backing media to store compressed objects. These compressed memory tiers represent distinct points in the access latency, data compressibility, and unit memory usage cost spectrum, allowing rich and flexible trade-offs between memory TCO savings and application performance impact. A key advantage with ntier is that it enables aggressive memory TCO saving opportunities by placing warm data in low latency compressed tiers with a reasonable performance impact while simultaneously placing cold data in the best memory TCO saving tiers. We believe our work represents an important server system configuration and optimization capability to achieve the best SLA-aware performance per dollar for applications hosted in production data center environments.
We present a comprehensive and rigorous analytical cost model for performance and TCO trade-off based on continuous monitoring of the application's data access profile. Guided by this model, our placement model takes informed actions to dynamically manage the placement and migration of application data across multiple software-defined compressed tiers. On real-world benchmarks, our solution increases memory TCO savings by 22% - 40% percentage points while maintaining performance parity or improves performance by 2% - 10% percentage points while maintaining memory TCO parity compared to state-of-the-art 2-Tier solutions.
△ Less
Submitted 22 April, 2024;
originally announced April 2024.
-
Soft Self-Consistency Improves Language Model Agents
Authors:
Han Wang,
Archiki Prasad,
Elias Stengel-Eskin,
Mohit Bansal
Abstract:
Generations from large language models (LLMs) can be improved by sampling and scoring multiple solutions to select a final answer. Current "sample and select" methods such as self-consistency (SC) rely on majority voting to score answers. However, when tasks have many distinct and valid answers, selection by voting requires a large number of samples. This makes SC prohibitively expensive for inter…
▽ More
Generations from large language models (LLMs) can be improved by sampling and scoring multiple solutions to select a final answer. Current "sample and select" methods such as self-consistency (SC) rely on majority voting to score answers. However, when tasks have many distinct and valid answers, selection by voting requires a large number of samples. This makes SC prohibitively expensive for interactive tasks that involve generating multiple actions (answers) sequentially. After establishing that majority voting fails to provide consistent gains on such tasks, we demonstrate how to increase success rates by softening the scoring criterion. We introduce Soft Self-Consistency (SOFT-SC), which replaces SC's discontinuous scoring with a continuous score computed from model likelihoods, allowing for selection even when actions are sparsely distributed. SOFT-SC improves both performance and efficiency on long-horizon interactive tasks, requiring half as many samples as SC for comparable or better performance. For a fixed number of samples, SOFT-SC leads to a 1.3% increase over SC in absolute success rate on writing bash programs, a 6.6% increase on online shopping (WebShop), and a 4.7% increase for an interactive household game (ALFWorld). Finally, we show that SOFT-SC can be applied to both open-source and black-box models.
△ Less
Submitted 5 June, 2024; v1 submitted 20 February, 2024;
originally announced February 2024.
-
ReGAL: Refactoring Programs to Discover Generalizable Abstractions
Authors:
Elias Stengel-Eskin,
Archiki Prasad,
Mohit Bansal
Abstract:
While large language models (LLMs) are increasingly being used for program synthesis, they lack the global view needed to develop useful abstractions; they generally predict programs one at a time, often repeating the same functionality. Generating redundant code from scratch is both inefficient and error-prone. To address this, we propose Refactoring for Generalizable Abstraction Learning (ReGAL)…
▽ More
While large language models (LLMs) are increasingly being used for program synthesis, they lack the global view needed to develop useful abstractions; they generally predict programs one at a time, often repeating the same functionality. Generating redundant code from scratch is both inefficient and error-prone. To address this, we propose Refactoring for Generalizable Abstraction Learning (ReGAL), a gradient-free method for learning a library of reusable functions via code refactorization, i.e., restructuring code without changing its execution output. ReGAL learns from a small set of existing programs, iteratively verifying and refining its abstractions via execution. We find that the shared function libraries discovered by ReGAL make programs easier to predict across diverse domains. On five datasets -- LOGO graphics generation, Date reasoning, TextCraft (a Minecraft-based text-game) MATH, and TabMWP -- both open-source and proprietary LLMs improve in accuracy when predicting programs with ReGAL functions. For CodeLlama-13B, ReGAL results in absolute accuracy increases of 11.5% on LOGO, 26.1% on date understanding, and 8.1% on TextCraft, outperforming GPT-3.5 in two of three domains. Our analysis reveals ReGAL's abstractions encapsulate frequently-used subroutines as well as environment dynamics.
△ Less
Submitted 6 June, 2024; v1 submitted 29 January, 2024;
originally announced January 2024.
-
Assessing the Impact of Prompting Methods on ChatGPT's Mathematical Capabilities
Authors:
Yuhao Chen,
Chloe Wong,
Hanwen Yang,
Juan Aguenza,
Sai Bhujangari,
Benthan Vu,
Xun Lei,
Amisha Prasad,
Manny Fluss,
Eric Phuong,
Minghao Liu,
Raja Kumar,
Vanshika Vats,
James Davis
Abstract:
This study critically evaluates the efficacy of prompting methods in enhancing the mathematical reasoning capability of large language models (LLMs). The investigation uses three prescriptive prompting methods - simple, persona, and conversational prompting - known for their effectiveness in enhancing the linguistic tasks of LLMs. We conduct this analysis on OpenAI's LLM chatbot, ChatGPT-3.5, on e…
▽ More
This study critically evaluates the efficacy of prompting methods in enhancing the mathematical reasoning capability of large language models (LLMs). The investigation uses three prescriptive prompting methods - simple, persona, and conversational prompting - known for their effectiveness in enhancing the linguistic tasks of LLMs. We conduct this analysis on OpenAI's LLM chatbot, ChatGPT-3.5, on extensive problem sets from the MATH, GSM8K, and MMLU datasets, encompassing a broad spectrum of mathematical challenges. A grading script adapted to each dataset is used to determine the effectiveness of these prompting interventions in enhancing the model's mathematical analysis power. Contrary to expectations, our empirical analysis reveals that none of the investigated methods consistently improves over ChatGPT-3.5's baseline performance, with some causing significant degradation. Our findings suggest that prompting strategies do not necessarily generalize to new domains, in this study failing to enhance mathematical performance.
△ Less
Submitted 20 February, 2024; v1 submitted 22 December, 2023;
originally announced December 2023.
-
Siracusa: A 16 nm Heterogenous RISC-V SoC for Extended Reality with At-MRAM Neural Engine
Authors:
Arpan Suravi Prasad,
Moritz Scherer,
Francesco Conti,
Davide Rossi,
Alfio Di Mauro,
Manuel Eggimann,
Jorge Tómas Gómez,
Ziyun Li,
Syed Shakib Sarwar,
Zhao Wang,
Barbara De Salvo,
Luca Benini
Abstract:
Extended reality (XR) applications are Machine Learning (ML)-intensive, featuring deep neural networks (DNNs) with millions of weights, tightly latency-bound (10-20 ms end-to-end), and power-constrained (low tens of mW average power). While ML performance and efficiency can be achieved by introducing neural engines within low-power systems-on-chip (SoCs), system-level power for nontrivial DNNs dep…
▽ More
Extended reality (XR) applications are Machine Learning (ML)-intensive, featuring deep neural networks (DNNs) with millions of weights, tightly latency-bound (10-20 ms end-to-end), and power-constrained (low tens of mW average power). While ML performance and efficiency can be achieved by introducing neural engines within low-power systems-on-chip (SoCs), system-level power for nontrivial DNNs depends strongly on the energy of non-volatile memory (NVM) access for network weights. This work introduces Siracusa, a near-sensor heterogeneous SoC for next-generation XR devices manufactured in 16 nm CMOS. Siracusa couples an octa-core cluster of RISC-V digital signal processing cores with a novel tightly-coupled "At-Memory" integration between a state-of-the-art digital neural engine called N-EUREKA and an on-chip NVM based on magnetoresistive memory(MRAM), achieving 1.7x higher throughput and 3x better energy efficiency than XR SoCs using NVM as background memory. The fabricated SoC prototype achieves an area efficiency of 65.2 GOp/s/mm2 and a peak energy efficiency of 8.84 TOp/J for DNN inference while supporting complex heterogeneous application workloads, which combine ML with conventional signal processing and control.
△ Less
Submitted 14 April, 2024; v1 submitted 22 December, 2023;
originally announced December 2023.
-
HypUC: Hyperfine Uncertainty Calibration with Gradient-boosted Corrections for Reliable Regression on Imbalanced Electrocardiograms
Authors:
Uddeshya Upadhyay,
Sairam Bade,
Arjun Puranik,
Shahir Asfahan,
Melwin Babu,
Francisco Lopez-Jimenez,
Samuel J. Asirvatham,
Ashim Prasad,
Ajit Rajasekharan,
Samir Awasthi,
Rakesh Barve
Abstract:
The automated analysis of medical time series, such as the electrocardiogram (ECG), electroencephalogram (EEG), pulse oximetry, etc, has the potential to serve as a valuable tool for diagnostic decisions, allowing for remote monitoring of patients and more efficient use of expensive and time-consuming medical procedures. Deep neural networks (DNNs) have been demonstrated to process such signals ef…
▽ More
The automated analysis of medical time series, such as the electrocardiogram (ECG), electroencephalogram (EEG), pulse oximetry, etc, has the potential to serve as a valuable tool for diagnostic decisions, allowing for remote monitoring of patients and more efficient use of expensive and time-consuming medical procedures. Deep neural networks (DNNs) have been demonstrated to process such signals effectively. However, previous research has primarily focused on classifying medical time series rather than attempting to regress the continuous-valued physiological parameters central to diagnosis. One significant challenge in this regard is the imbalanced nature of the dataset, as a low prevalence of abnormal conditions can lead to heavily skewed data that results in inaccurate predictions and a lack of certainty in such predictions when deployed. To address these challenges, we propose HypUC, a framework for imbalanced probabilistic regression in medical time series, making several contributions. (i) We introduce a simple kernel density-based technique to tackle the imbalanced regression problem with medical time series. (ii) Moreover, we employ a probabilistic regression framework that allows uncertainty estimation for the predicted continuous values. (iii) We also present a new approach to calibrate the predicted uncertainty further. (iv) Finally, we demonstrate a technique to use calibrated uncertainty estimates to improve the predicted continuous value and show the efficacy of the calibrated uncertainty estimates to flag unreliable predictions. HypUC is evaluated on a large, diverse, real-world dataset of ECGs collected from millions of patients, outperforming several conventional baselines on various diagnostic tasks, suggesting a potential use-case for the reliable clinical deployment of deep learning models.
△ Less
Submitted 23 November, 2023;
originally announced November 2023.
-
Telescope: Telemetry at Terabyte Scale
Authors:
Alan Nair,
Sandeep Kumar,
Aravinda Prasad,
Andy Rudoff,
Sreenivas Subramoney
Abstract:
Data-hungry applications that require terabytes of memory have become widespread in recent years. To meet the memory needs of these applications, data centers are embracing tiered memory architectures with near and far memory tiers. Precise, efficient, and timely identification of hot and cold data and their placement in appropriate tiers is critical for performance in such systems. Unfortunately,…
▽ More
Data-hungry applications that require terabytes of memory have become widespread in recent years. To meet the memory needs of these applications, data centers are embracing tiered memory architectures with near and far memory tiers. Precise, efficient, and timely identification of hot and cold data and their placement in appropriate tiers is critical for performance in such systems. Unfortunately, the existing state-of-the-art telemetry techniques for hot and cold data detection are ineffective at the terabyte scale.
We propose Telescope, a novel technique that profiles different levels of the application's page table tree for fast and efficient identification of hot and cold data. Telescope is based on the observation that, for a memory- and TLB-intensive workload, higher levels of a page table tree are also frequently accessed during a hardware page table walk. Hence, the hotness of the higher levels of the page table tree essentially captures the hotness of its subtrees or address space sub-regions at a coarser granularity. We exploit this insight to quickly converge on even a few megabytes of hot data and efficiently identify several gigabytes of cold data in terabyte-scale applications. Importantly, such a technique can seamlessly scale to petabyte-scale applications.
Telescope's telemetry achieves 90%+ precision and recall at just 0.009% single CPU utilization for microbenchmarks with a 5 TB memory footprint. Memory tiering based on Telescope results in 5.6% to 34% throughput improvement for real-world benchmarks with a 1-2 TB memory footprint compared to other state-of-the-art telemetry techniques.
△ Less
Submitted 29 November, 2023; v1 submitted 16 November, 2023;
originally announced November 2023.
-
ADaPT: As-Needed Decomposition and Planning with Language Models
Authors:
Archiki Prasad,
Alexander Koller,
Mareike Hartmann,
Peter Clark,
Ashish Sabharwal,
Mohit Bansal,
Tushar Khot
Abstract:
Large Language Models (LLMs) are increasingly being used for interactive decision-making tasks requiring planning and adapting to the environment. Recent works employ LLMs-as-agents in broadly two ways: iteratively determining the next action (iterative executors) or generating plans and executing sub-tasks using LLMs (plan-and-execute). However, these methods struggle with task complexity, as the…
▽ More
Large Language Models (LLMs) are increasingly being used for interactive decision-making tasks requiring planning and adapting to the environment. Recent works employ LLMs-as-agents in broadly two ways: iteratively determining the next action (iterative executors) or generating plans and executing sub-tasks using LLMs (plan-and-execute). However, these methods struggle with task complexity, as the inability to execute any sub-task may lead to task failure. To address these shortcomings, we introduce As-Needed Decomposition and Planning for complex Tasks (ADaPT), an approach that explicitly plans and decomposes complex sub-tasks as-needed, i.e., when the LLM is unable to execute them. ADaPT recursively decomposes sub-tasks to adapt to both task complexity and LLM capability. Our results demonstrate that ADaPT substantially outperforms established strong baselines, achieving success rates up to 28.3% higher in ALFWorld, 27% in WebShop, and 33% in TextCraft -- a novel compositional dataset that we introduce. Through extensive analysis, we illustrate the importance of multilevel decomposition and establish that ADaPT dynamically adjusts to the capabilities of the executor LLM as well as to task complexity.
△ Less
Submitted 8 April, 2024; v1 submitted 8 November, 2023;
originally announced November 2023.
-
Rephrase, Augment, Reason: Visual Grounding of Questions for Vision-Language Models
Authors:
Archiki Prasad,
Elias Stengel-Eskin,
Mohit Bansal
Abstract:
An increasing number of vision-language tasks can be handled with little to no training, i.e., in a zero and few-shot manner, by marrying large language models (LLMs) to vision encoders, resulting in large vision-language models (LVLMs). While this has huge upsides, such as not requiring training data or custom architectures, how an input is presented to an LVLM can have a major impact on zero-sho…
▽ More
An increasing number of vision-language tasks can be handled with little to no training, i.e., in a zero and few-shot manner, by marrying large language models (LLMs) to vision encoders, resulting in large vision-language models (LVLMs). While this has huge upsides, such as not requiring training data or custom architectures, how an input is presented to an LVLM can have a major impact on zero-shot model performance. In particular, inputs phrased in an underspecified way can result in incorrect answers due to factors like missing visual information, complex implicit reasoning, or linguistic ambiguity. Therefore, adding visually-grounded information to the input as a preemptive clarification should improve model performance by reducing underspecification, e.g., by localizing objects and disambiguating references. Similarly, in the VQA setting, changing the way questions are framed can make them easier for models to answer. To this end, we present Rephrase, Augment and Reason (RepARe), a gradient-free framework that extracts salient details about the image using the underlying LVLM as a captioner and reasoner, in order to propose modifications to the original question. We then use the LVLM's confidence over a generated answer as an unsupervised scoring function to select the rephrased question most likely to improve zero-shot performance. Focusing on three visual question answering tasks, we show that RepARe can result in a 3.85% (absolute) increase in zero-shot accuracy on VQAv2, 6.41%, and 7.94% points increase on A-OKVQA, and VizWiz respectively. Additionally, we find that using gold answers for oracle question candidate selection achieves a substantial gain in VQA accuracy by up to 14.41%. Through extensive analysis, we demonstrate that outputs from RepARe increase syntactic complexity, and effectively utilize vision-language interaction and the frozen LLM.
△ Less
Submitted 2 April, 2024; v1 submitted 9 October, 2023;
originally announced October 2023.
-
Motivating Next-Generation OS Physical Memory Management for Terabyte-Scale NVMMs
Authors:
Shivank Garg,
Aravinda Prasad,
Debadatta Mishra,
Sreenivas Subramoney
Abstract:
Software managed byte-addressable hybrid memory systems consisting of DRAMs and NVMMs offer a lot of flexibility to design efficient large scale data processing applications. Operating systems (OS) play an important role in enabling the applications to realize the integrated benefits of DRAMs' low access latency and NVMMs' large capacity along with its persistent characteristics. In this paper, we…
▽ More
Software managed byte-addressable hybrid memory systems consisting of DRAMs and NVMMs offer a lot of flexibility to design efficient large scale data processing applications. Operating systems (OS) play an important role in enabling the applications to realize the integrated benefits of DRAMs' low access latency and NVMMs' large capacity along with its persistent characteristics. In this paper, we comprehensively analyze the performance of conventional OS physical memory management subsystems that were designed only based on the DRAM memory characteristics in the context of modern hybrid byte-addressable memory systems.
To study the impact of high access latency and large capacity of NVMMs on physical memory management, we perform an extensive evaluation on Linux with Intel's Optane NVMM. We observe that the core memory management functionalities such as page allocation are negatively impacted by high NVMM media latency, while functionalities such as conventional fragmentation management are rendered inadequate. We also demonstrate that certain traditional memory management functionalities are affected by neither aspects of modern NVMMs. We conclusively motivate the need to overhaul fundamental aspects of traditional OS physical memory management in order to fully exploit terabyte-scale NVMMs.
△ Less
Submitted 5 October, 2023;
originally announced October 2023.
-
On Supermodular Contracts and Dense Subgraphs
Authors:
Ramiro Deo-Campo Vuong,
Shaddin Dughmi,
Neel Patel,
Aditya Prasad
Abstract:
We study the combinatorial contract design problem, introduced and studied by Dutting et. al. (2021, 2022), in both the single and multi-agent settings. Prior work has examined the problem when the principal's utility function is submodular in the actions chosen by the agent(s).
We complement this emerging literature with an examination of the problem when the principal's utility is supermodular…
▽ More
We study the combinatorial contract design problem, introduced and studied by Dutting et. al. (2021, 2022), in both the single and multi-agent settings. Prior work has examined the problem when the principal's utility function is submodular in the actions chosen by the agent(s).
We complement this emerging literature with an examination of the problem when the principal's utility is supermodular.
In the single-agent setting, we obtain a strongly polynomial time algorithm for the optimal contract.
This stands in contrast to the NP-hardness of the problem with submodular principal utility due to Dutting et. al. (2021).
This result has two technical components, the first of which applies beyond supermodular or submodular utilities.
This result strengthens and simplifies analogous enumeration algorithms from Dutting et. al. (2021), and applies to any nondecreasing valuation function for the principal.
Second, we show that supermodular valuations lead to a polynomial number of breakpoints, analogous to a similar result by Dutting et. al. (2021) for gross substitutes valuations.
In the multi-agent setting, we obtain a mixed bag of positive and negative results.
First, we show that it is NP-hard to obtain any finite multiplicative approximation, or an additive FPTAS.
This stands in contrast to the submodular case, where efficient computation of approximately optimal contracts was shown by Dutting et. al. (2022).
Second, we derive an additive PTAS for the problem in the instructive special case of graph-based supermodular valuations, and equal costs.
En-route to this result, we discover an intimate connection between the multi-agent contract problem and the notorious k-densest subgraph problem.
We build on and combine techniques from the literature on dense subgraph problems to obtain our additive PTAS.
△ Less
Submitted 14 August, 2023;
originally announced August 2023.
-
Grain and Grain Boundary Segmentation using Machine Learning with Real and Generated Datasets
Authors:
Peter Warren,
Nandhini Raju,
Abhilash Prasad,
Shajahan Hossain,
Ramesh Subramanian,
Jayanta Kapat,
Navin Manjooran,
Ranajay Ghosh
Abstract:
We report significantly improved accuracy of grain boundary segmentation using Convolutional Neural Networks (CNN) trained on a combination of real and generated data. Manual segmentation is accurate but time-consuming, and existing computational methods are faster but often inaccurate. To combat this dilemma, machine learning models can be used to achieve the accuracy of manual segmentation and h…
▽ More
We report significantly improved accuracy of grain boundary segmentation using Convolutional Neural Networks (CNN) trained on a combination of real and generated data. Manual segmentation is accurate but time-consuming, and existing computational methods are faster but often inaccurate. To combat this dilemma, machine learning models can be used to achieve the accuracy of manual segmentation and have the efficiency of a computational method. An extensive dataset of from 316L stainless steel samples is additively manufactured, prepared, polished, etched, and then microstructure grain images were systematically collected. Grain segmentation via existing computational methods and manual (by-hand) were conducted, to create "real" training data. A Voronoi tessellation pattern combined with random synthetic noise and simulated defects, is developed to create a novel artificial grain image fabrication method. This provided training data supplementation for data-intensive machine learning methods. The accuracy of the grain measurements from microstructure images segmented via computational methods and machine learning methods proposed in this work are calculated and compared to provide much benchmarks in grain segmentation. Over 400 images of the microstructure of stainless steel samples were manually segmented for machine learning training applications. This data and the artificial data is available on Kaggle.
△ Less
Submitted 12 July, 2023;
originally announced July 2023.
-
Schema-Guided User Satisfaction Modeling for Task-Oriented Dialogues
Authors:
Yue Feng,
Yunlong Jiao,
Animesh Prasad,
Nikolaos Aletras,
Emine Yilmaz,
Gabriella Kazai
Abstract:
User Satisfaction Modeling (USM) is one of the popular choices for task-oriented dialogue systems evaluation, where user satisfaction typically depends on whether the user's task goals were fulfilled by the system. Task-oriented dialogue systems use task schema, which is a set of task attributes, to encode the user's task goals. Existing studies on USM neglect explicitly modeling the user's task g…
▽ More
User Satisfaction Modeling (USM) is one of the popular choices for task-oriented dialogue systems evaluation, where user satisfaction typically depends on whether the user's task goals were fulfilled by the system. Task-oriented dialogue systems use task schema, which is a set of task attributes, to encode the user's task goals. Existing studies on USM neglect explicitly modeling the user's task goals fulfillment using the task schema. In this paper, we propose SG-USM, a novel schema-guided user satisfaction modeling framework. It explicitly models the degree to which the user's preferences regarding the task attributes are fulfilled by the system for predicting the user's satisfaction level. SG-USM employs a pre-trained language model for encoding dialogue context and task attributes. Further, it employs a fulfillment representation layer for learning how many task attributes have been fulfilled in the dialogue, an importance predictor component for calculating the importance of task attributes. Finally, it predicts the user satisfaction based on task attribute fulfillment and task attribute importance. Experimental results on benchmark datasets (i.e. MWOZ, SGD, ReDial, and JDDC) show that SG-USM consistently outperforms competitive existing methods. Our extensive analysis demonstrates that SG-USM can improve the interpretability of user satisfaction modeling, has good scalability as it can effectively deal with unseen tasks and can also effectively work in low-resource settings by leveraging unlabeled data.
△ Less
Submitted 26 May, 2023;
originally announced May 2023.
-
Lessons Learned in ATCO2: 5000 hours of Air Traffic Control Communications for Robust Automatic Speech Recognition and Understanding
Authors:
Juan Zuluaga-Gomez,
Iuliia Nigmatulina,
Amrutha Prasad,
Petr Motlicek,
Driss Khalil,
Srikanth Madikeri,
Allan Tart,
Igor Szoke,
Vincent Lenders,
Mickael Rigault,
Khalid Choukri
Abstract:
Voice communication between air traffic controllers (ATCos) and pilots is critical for ensuring safe and efficient air traffic control (ATC). This task requires high levels of awareness from ATCos and can be tedious and error-prone. Recent attempts have been made to integrate artificial intelligence (AI) into ATC in order to reduce the workload of ATCos. However, the development of data-driven AI…
▽ More
Voice communication between air traffic controllers (ATCos) and pilots is critical for ensuring safe and efficient air traffic control (ATC). This task requires high levels of awareness from ATCos and can be tedious and error-prone. Recent attempts have been made to integrate artificial intelligence (AI) into ATC in order to reduce the workload of ATCos. However, the development of data-driven AI systems for ATC demands large-scale annotated datasets, which are currently lacking in the field. This paper explores the lessons learned from the ATCO2 project, a project that aimed to develop a unique platform to collect and preprocess large amounts of ATC data from airspace in real time. Audio and surveillance data were collected from publicly accessible radio frequency channels with VHF receivers owned by a community of volunteers and later uploaded to Opensky Network servers, which can be considered an "unlimited source" of data. In addition, this paper reviews previous work from ATCO2 partners, including (i) robust automatic speech recognition, (ii) natural language processing, (iii) English language identification of ATC communications, and (iv) the integration of surveillance data such as ADS-B. We believe that the pipeline developed during the ATCO2 project, along with the open-sourcing of its data, will encourage research in the ATC field. A sample of the ATCO2 corpus is available on the following website: https://www.atco2.org/data, while the full corpus can be purchased through ELDA at http://catalog.elra.info/en-us/repository/browse/ELRA-S0484. We demonstrated that ATCO2 is an appropriate dataset to develop ASR engines when little or near to no ATC in-domain data is available. For instance, with the CNN-TDNNf kaldi model, we reached the performance of as low as 17.9% and 24.9% WER on public ATC datasets which is 6.6/7.6% better than "out-of-domain" but supervised CNN-TDNNf model.
△ Less
Submitted 1 May, 2023;
originally announced May 2023.
-
ReCEval: Evaluating Reasoning Chains via Correctness and Informativeness
Authors:
Archiki Prasad,
Swarnadeep Saha,
Xiang Zhou,
Mohit Bansal
Abstract:
Multi-step reasoning ability is fundamental to many natural language tasks, yet it is unclear what constitutes a good reasoning chain and how to evaluate them. Most existing methods focus solely on whether the reasoning chain leads to the correct conclusion, but this answer-oriented view may confound reasoning quality with other spurious shortcuts to predict the answer. To bridge this gap, we eval…
▽ More
Multi-step reasoning ability is fundamental to many natural language tasks, yet it is unclear what constitutes a good reasoning chain and how to evaluate them. Most existing methods focus solely on whether the reasoning chain leads to the correct conclusion, but this answer-oriented view may confound reasoning quality with other spurious shortcuts to predict the answer. To bridge this gap, we evaluate reasoning chains by viewing them as informal proofs that derive the final answer. Specifically, we propose ReCEval (Reasoning Chain Evaluation), a framework that evaluates reasoning chains via two key properties: (1) correctness, i.e., each step makes a valid inference based on information contained within the step, preceding steps, and input context, and (2) informativeness, i.e., each step provides new information that is helpful towards deriving the generated answer. We evaluate these properties by developing metrics using natural language inference models and V-Information. On multiple datasets, we show that ReCEval effectively identifies various error types and yields notable improvements compared to prior methods. We analyze the impact of step boundaries, and previous steps on evaluating correctness and demonstrate that our informativeness metric captures the expected flow of information in high-quality reasoning chains. Finally, we show that scoring reasoning chains based on ReCEval improves downstream task performance. Our code is publicly available at: https://github.com/archiki/ReCEval
△ Less
Submitted 30 November, 2023; v1 submitted 20 April, 2023;
originally announced April 2023.
-
A Virtual Simulation-Pilot Agent for Training of Air Traffic Controllers
Authors:
Juan Zuluaga-Gomez,
Amrutha Prasad,
Iuliia Nigmatulina,
Petr Motlicek,
Matthias Kleinert
Abstract:
In this paper we propose a novel virtual simulation-pilot engine for speeding up air traffic controller (ATCo) training by integrating different state-of-the-art artificial intelligence (AI) based tools. The virtual simulation-pilot engine receives spoken communications from ATCo trainees, and it performs automatic speech recognition and understanding. Thus, it goes beyond only transcribing the co…
▽ More
In this paper we propose a novel virtual simulation-pilot engine for speeding up air traffic controller (ATCo) training by integrating different state-of-the-art artificial intelligence (AI) based tools. The virtual simulation-pilot engine receives spoken communications from ATCo trainees, and it performs automatic speech recognition and understanding. Thus, it goes beyond only transcribing the communication and can also understand its meaning. The output is subsequently sent to a response generator system, which resembles the spoken read back that pilots give to the ATCo trainees. The overall pipeline is composed of the following submodules: (i) automatic speech recognition (ASR) system that transforms audio into a sequence of words; (ii) high-level air traffic control (ATC) related entity parser that understands the transcribed voice communication; and (iii) a text-to-speech submodule that generates a spoken utterance that resembles a pilot based on the situation of the dialogue. Our system employs state-of-the-art AI-based tools such as Wav2Vec 2.0, Conformer, BERT and Tacotron models. To the best of our knowledge, this is the first work fully based on open-source ATC resources and AI tools. In addition, we have developed a robust and modular system with optional submodules that can enhance the system's performance by incorporating real-time surveillance data, metadata related to exercises (such as sectors or runways), or even introducing a deliberate read-back error to train ATCo trainees to identify them. Our ASR system can reach as low as 5.5% and 15.9% word error rates (WER) on high and low-quality ATC audio. We also demonstrate that adding surveillance data into the ASR can yield callsign detection accuracy of more than 96%.
△ Less
Submitted 16 April, 2023;
originally announced April 2023.
-
Rapid Development of Compositional AI
Authors:
Lee Martie,
Jessie Rosenberg,
Veronique Demers,
Gaoyuan Zhang,
Onkar Bhardwaj,
John Henning,
Aditya Prasad,
Matt Stallone,
Ja Young Lee,
Lucy Yip,
Damilola Adesina,
Elahe Paikari,
Oscar Resendiz,
Sarah Shaw,
David Cox
Abstract:
Compositional AI systems, which combine multiple artificial intelligence components together with other application components to solve a larger problem, have no known pattern of development and are often approached in a bespoke and ad hoc style. This makes development slower and harder to reuse for future applications. To support the full rapid development cycle of compositional AI applications,…
▽ More
Compositional AI systems, which combine multiple artificial intelligence components together with other application components to solve a larger problem, have no known pattern of development and are often approached in a bespoke and ad hoc style. This makes development slower and harder to reuse for future applications. To support the full rapid development cycle of compositional AI applications, we have developed a novel framework called (Bee)* (written as a regular expression and pronounced as "beestar"). We illustrate how (Bee)* supports building integrated, scalable, and interactive compositional AI applications with a simplified developer experience.
△ Less
Submitted 12 February, 2023;
originally announced February 2023.
-
XCRYPT: Accelerating Lattice Based Cryptography with Memristor Crossbar Arrays
Authors:
Sarabjeet Singh,
Xiong Fan,
Ananth Krishna Prasad,
Lin Jia,
Anirban Nag,
Rajeev Balasubramonian,
Mahdi Nazm Bojnordi,
Elaine Shi
Abstract:
This paper makes a case for accelerating lattice-based post quantum cryptography (PQC) with memristor based crossbars, and shows that these inherently error-tolerant algorithms are a good fit for noisy analog MAC operations in crossbars. We compare different NIST round-3 lattice-based candidates for PQC, and identify that SABER is not only a front-runner when executing on traditional systems, but…
▽ More
This paper makes a case for accelerating lattice-based post quantum cryptography (PQC) with memristor based crossbars, and shows that these inherently error-tolerant algorithms are a good fit for noisy analog MAC operations in crossbars. We compare different NIST round-3 lattice-based candidates for PQC, and identify that SABER is not only a front-runner when executing on traditional systems, but it is also amenable to acceleration with crossbars. SABER is a module-LWR based approach, which performs modular polynomial multiplications with rounding. We map the polynomial multiplications in SABER on crossbars and show that analog dot-products can yield a $1.7-32.5\times$ performance and energy efficiency improvement, compared to recent hardware proposals. This initial design combines the innovations in multiple state-of-the-art works -- the algorithm in SABER and the memristive acceleration principles proposed in ISAAC (for deep neural network acceleration). We then identify the bottlenecks in this initial design and introduce several additional techniques to improve its efficiency. These techniques are synergistic and especially benefit from SABER's power-of-two modulo operation. First, we show that some of the software techniques used in SABER, that are effective on CPU platforms, are unhelpful in crossbar-based accelerators. Relying on simpler algorithms further improves our efficiencies by $1.3-3.6\times$. Second, we exploit the nature of SABER's computations to stagger the operations in crossbars and share a few variable precision ADCs, resulting in up to $1.8\times$ higher efficiency. Third, to further reduce ADC pressure, we propose a simple analog Shift-and-Add technique, which results in a $1.3-6.3\times$ increase in the efficiency. Overall, our designs achieve $3-15\times$ higher efficiency over initial design, and $3-51\times$ higher than prior work.
△ Less
Submitted 31 January, 2023;
originally announced February 2023.
-
A systematic literature review on Internet of Vehicles Security
Authors:
Priyank Sharma,
Meet Patel,
Apoorva Prasad
Abstract:
The Internet of Vehicles IoV commonly referred to as connected automobiles is a vast network that connects various entities including users sensors and vehicles They will connect across a network to lessen traffic accidents and improve both the security and safety of smart vehicles The Internet of Vehicles is subject to a wide variety of threats including spoofing attacks recognition attacks priva…
▽ More
The Internet of Vehicles IoV commonly referred to as connected automobiles is a vast network that connects various entities including users sensors and vehicles They will connect across a network to lessen traffic accidents and improve both the security and safety of smart vehicles The Internet of Vehicles is subject to a wide variety of threats including spoofing attacks recognition attacks privacy attacks and verification attacks Our the primary concern when creating any new smart gadget is the users safety which will be improved by identifying solutions to the various cyber threats Therefore we will cover the security of smart automobiles in this literature review including their attacks and solutions.
△ Less
Submitted 16 December, 2022;
originally announced December 2022.
-
Speech and Natural Language Processing Technologies for Pseudo-Pilot Simulator
Authors:
Amrutha Prasad,
Juan Zuluaga-Gomez,
Petr Motlicek,
Saeed Sarfjoo,
Iuliia Nigmatulina,
Karel Vesely
Abstract:
This paper describes a simple yet efficient repetition-based modular system for speeding up air-traffic controllers (ATCos) training. E.g., a human pilot is still required in EUROCONTROL's ESCAPE lite simulator (see https://www.eurocontrol.int/simulator/escape) during ATCo training. However, this need can be substituted by an automatic system that could act as a pilot. In this paper, we aim to dev…
▽ More
This paper describes a simple yet efficient repetition-based modular system for speeding up air-traffic controllers (ATCos) training. E.g., a human pilot is still required in EUROCONTROL's ESCAPE lite simulator (see https://www.eurocontrol.int/simulator/escape) during ATCo training. However, this need can be substituted by an automatic system that could act as a pilot. In this paper, we aim to develop and integrate a pseudo-pilot agent into the ATCo training pipeline by merging diverse artificial intelligence (AI) powered modules. The system understands the voice communications issued by the ATCo, and, in turn, it generates a spoken prompt that follows the pilot's phraseology to the initial communication. Our system mainly relies on open-source AI tools and air traffic control (ATC) databases, thus, proving its simplicity and ease of replicability. The overall pipeline is composed of the following: (1) a submodule that receives and pre-processes the input stream of raw audio, (2) an automatic speech recognition (ASR) system that transforms audio into a sequence of words; (3) a high-level ATC-related entity parser, which extracts relevant information from the communication, i.e., callsigns and commands, and finally, (4) a speech synthesizer submodule that generates responses based on the high-level ATC entities previously extracted. Overall, we show that this system could pave the way toward developing a real proof-of-concept pseudo-pilot system. Hence, speeding up the training of ATCos while drastically reducing its overall cost.
△ Less
Submitted 14 December, 2022;
originally announced December 2022.
-
ATCO2 corpus: A Large-Scale Dataset for Research on Automatic Speech Recognition and Natural Language Understanding of Air Traffic Control Communications
Authors:
Juan Zuluaga-Gomez,
Karel Veselý,
Igor Szöke,
Alexander Blatt,
Petr Motlicek,
Martin Kocour,
Mickael Rigault,
Khalid Choukri,
Amrutha Prasad,
Seyyed Saeed Sarfjoo,
Iuliia Nigmatulina,
Claudia Cevenini,
Pavel Kolčárek,
Allan Tart,
Jan Černocký,
Dietrich Klakow
Abstract:
Personal assistants, automatic speech recognizers and dialogue understanding systems are becoming more critical in our interconnected digital world. A clear example is air traffic control (ATC) communications. ATC aims at guiding aircraft and controlling the airspace in a safe and optimal manner. These voice-based dialogues are carried between an air traffic controller (ATCO) and pilots via very-h…
▽ More
Personal assistants, automatic speech recognizers and dialogue understanding systems are becoming more critical in our interconnected digital world. A clear example is air traffic control (ATC) communications. ATC aims at guiding aircraft and controlling the airspace in a safe and optimal manner. These voice-based dialogues are carried between an air traffic controller (ATCO) and pilots via very-high frequency radio channels. In order to incorporate these novel technologies into ATC (low-resource domain), large-scale annotated datasets are required to develop the data-driven AI systems. Two examples are automatic speech recognition (ASR) and natural language understanding (NLU). In this paper, we introduce the ATCO2 corpus, a dataset that aims at fostering research on the challenging ATC field, which has lagged behind due to lack of annotated data. The ATCO2 corpus covers 1) data collection and pre-processing, 2) pseudo-annotations of speech data, and 3) extraction of ATC-related named entities. The ATCO2 corpus is split into three subsets. 1) ATCO2-test-set corpus contains 4 hours of ATC speech with manual transcripts and a subset with gold annotations for named-entity recognition (callsign, command, value). 2) The ATCO2-PL-set corpus consists of 5281 hours of unlabeled ATC data enriched with automatic transcripts from an in-domain speech recognizer, contextual information, speaker turn information, signal-to-noise ratio estimate and English language detection score per sample. Both available for purchase through ELDA at http://catalog.elra.info/en-us/repository/browse/ELRA-S0484. 3) The ATCO2-test-set-1h corpus is a one-hour subset from the original test set corpus, that we are offering for free at https://www.atco2.org/data. We expect the ATCO2 corpus will foster research on robust ASR and NLU not only in the field of ATC communications but also in the general research community.
△ Less
Submitted 15 June, 2023; v1 submitted 8 November, 2022;
originally announced November 2022.
-
An Open Source Interactive Visual Analytics Tool for Comparative Programming Comprehension
Authors:
Ayush Kumar,
Ashish Kumar,
Aakanksha Prasad,
Michael Burch,
Shenghui Cheng,
Klaus Mueller
Abstract:
This paper proposes an open source visual analytics tool consisting of several views and perspectives on eye movement data collected during code reading tasks when writing computer programs. Hence the focus of this work is on code and program comprehension. The source code is shown as a visual stimulus. It can be inspected in combination with overlaid scanpaths in which the saccades can be visuall…
▽ More
This paper proposes an open source visual analytics tool consisting of several views and perspectives on eye movement data collected during code reading tasks when writing computer programs. Hence the focus of this work is on code and program comprehension. The source code is shown as a visual stimulus. It can be inspected in combination with overlaid scanpaths in which the saccades can be visually encoded in several forms, including straight, curved, and orthogonal lines, modifiable by interaction techniques. The tool supports interaction techniques like filter functions, aggregations, data sampling, and many more. We illustrate the usefulness of our tool by applying it to the eye movements of 216 programmers of multiple expertise levels that were collected during two code comprehension tasks. Our tool helped to analyze the difference between the strategic program comprehension of programmers based on their demographic background, time taken to complete the task, choice of programming task, and expertise.
△ Less
Submitted 29 July, 2022;
originally announced August 2022.
-
SNE: an Energy-Proportional Digital Accelerator for Sparse Event-Based Convolutions
Authors:
Alfio Di Mauro,
Arpan Suravi Prasad,
Zhikai Huang,
Matteo Spallanzani,
Francesco Conti,
Luca Benini
Abstract:
Event-based sensors are drawing increasing attention due to their high temporal resolution, low power consumption, and low bandwidth. To efficiently extract semantically meaningful information from sparse data streams produced by such sensors, we present a 4.5TOP/s/W digital accelerator capable of performing 4-bits-quantized event-based convolutional neural networks (eCNN). Compared to standard co…
▽ More
Event-based sensors are drawing increasing attention due to their high temporal resolution, low power consumption, and low bandwidth. To efficiently extract semantically meaningful information from sparse data streams produced by such sensors, we present a 4.5TOP/s/W digital accelerator capable of performing 4-bits-quantized event-based convolutional neural networks (eCNN). Compared to standard convolutional engines, our accelerator performs a number of operations proportional to the number of events contained into the input data stream, ultimately achieving a high energy-to-information processing proportionality. On the IBM-DVS-Gesture dataset, we report 80uJ/inf to 261uJ/inf, respectively, when the input activity is 1.2% and 4.9%. Our accelerator consumes 0.221pJ/SOP, to the best of our knowledge it is the lowest energy/OP reported on a digital neuromorphic engine.
△ Less
Submitted 29 April, 2022; v1 submitted 22 April, 2022;
originally announced April 2022.
-
How Does Pre-trained Wav2Vec 2.0 Perform on Domain Shifted ASR? An Extensive Benchmark on Air Traffic Control Communications
Authors:
Juan Zuluaga-Gomez,
Amrutha Prasad,
Iuliia Nigmatulina,
Saeed Sarfjoo,
Petr Motlicek,
Matthias Kleinert,
Hartmut Helmke,
Oliver Ohneiser,
Qingran Zhan
Abstract:
Recent work on self-supervised pre-training focus on leveraging large-scale unlabeled speech data to build robust end-to-end (E2E) acoustic models (AM) that can be later fine-tuned on downstream tasks e.g., automatic speech recognition (ASR). Yet, few works investigated the impact on performance when the data properties substantially differ between the pre-training and fine-tuning phases, termed d…
▽ More
Recent work on self-supervised pre-training focus on leveraging large-scale unlabeled speech data to build robust end-to-end (E2E) acoustic models (AM) that can be later fine-tuned on downstream tasks e.g., automatic speech recognition (ASR). Yet, few works investigated the impact on performance when the data properties substantially differ between the pre-training and fine-tuning phases, termed domain shift. We target this scenario by analyzing the robustness of Wav2Vec 2.0 and XLS-R models on downstream ASR for a completely unseen domain, air traffic control (ATC) communications. We benchmark these two models on several open-source and challenging ATC databases with signal-to-noise ratio between 5 and 20 dB. Relative word error rate (WER) reductions between 20% to 40% are obtained in comparison to hybrid-based ASR baselines by only fine-tuning E2E acoustic models with a smaller fraction of labeled data. We analyze WERs on the low-resource scenario and gender bias carried by one ATC dataset.
△ Less
Submitted 17 October, 2022; v1 submitted 31 March, 2022;
originally announced March 2022.
-
GrIPS: Gradient-free, Edit-based Instruction Search for Prompting Large Language Models
Authors:
Archiki Prasad,
Peter Hase,
Xiang Zhou,
Mohit Bansal
Abstract:
Providing natural language instructions in prompts is a useful new paradigm for improving task performance of large language models in a zero-shot setting. Recent work has aimed to improve such prompts via manual rewriting or gradient-based tuning. However, manual rewriting is time-consuming and requires subjective interpretation, while gradient-based tuning can be extremely computationally demand…
▽ More
Providing natural language instructions in prompts is a useful new paradigm for improving task performance of large language models in a zero-shot setting. Recent work has aimed to improve such prompts via manual rewriting or gradient-based tuning. However, manual rewriting is time-consuming and requires subjective interpretation, while gradient-based tuning can be extremely computationally demanding for large models and may not be feasible for API-based models. In this work, we introduce Gradient-free Instructional Prompt Search (GrIPS), a gradient-free, edit-based search approach for improving task instructions for large language models. GrIPS takes in instructions designed for humans and automatically returns an improved, edited prompt, while allowing for API-based tuning. With InstructGPT models, GrIPS improves the average task performance by up to 4.30 percentage points on eight classification tasks from the Natural Instructions dataset (with similar improvements for OPT, BLOOM, and FLAN-T5). We see improvements for both instruction-only prompts and instruction + k-shot examples prompts. Notably, GrIPS outperforms manual rewriting and purely example-based prompts while controlling for the available compute and data budget. Further, performance of GrIPS is comparable to select gradient-based tuning approaches. Qualitatively, we show our edits can simplify instructions and at times make them incoherent but nonetheless improve accuracy. Our code is available at: https://github.com/archiki/GrIPS
△ Less
Submitted 26 April, 2023; v1 submitted 14 March, 2022;
originally announced March 2022.
-
Distribution augmentation for low-resource expressive text-to-speech
Authors:
Mateusz Lajszczak,
Animesh Prasad,
Arent van Korlaar,
Bajibabu Bollepalli,
Antonio Bonafonte,
Arnaud Joly,
Marco Nicolis,
Alexis Moinet,
Thomas Drugman,
Trevor Wood,
Elena Sokolova
Abstract:
This paper presents a novel data augmentation technique for text-to-speech (TTS), that allows to generate new (text, audio) training examples without requiring any additional data. Our goal is to increase diversity of text conditionings available during training. This helps to reduce overfitting, especially in low-resource settings. Our method relies on substituting text and audio fragments in a w…
▽ More
This paper presents a novel data augmentation technique for text-to-speech (TTS), that allows to generate new (text, audio) training examples without requiring any additional data. Our goal is to increase diversity of text conditionings available during training. This helps to reduce overfitting, especially in low-resource settings. Our method relies on substituting text and audio fragments in a way that preserves syntactical correctness. We take additional measures to ensure that synthesized speech does not contain artifacts caused by combining inconsistent audio samples. The perceptual evaluations show that our method improves speech quality over a number of datasets, speakers, and TTS architectures. We also demonstrate that it greatly improves robustness of attention-based TTS models.
△ Less
Submitted 19 February, 2022; v1 submitted 13 February, 2022;
originally announced February 2022.
-
A two-step approach to leverage contextual data: speech recognition in air-traffic communications
Authors:
Iuliia Nigmatulina,
Juan Zuluaga-Gomez,
Amrutha Prasad,
Seyyed Saeed Sarfjoo,
Petr Motlicek
Abstract:
Automatic Speech Recognition (ASR), as the assistance of speech communication between pilots and air-traffic controllers, can significantly reduce the complexity of the task and increase the reliability of transmitted information. ASR application can lead to a lower number of incidents caused by misunderstanding and improve air traffic management (ATM) efficiency. Evidently, high accuracy predicti…
▽ More
Automatic Speech Recognition (ASR), as the assistance of speech communication between pilots and air-traffic controllers, can significantly reduce the complexity of the task and increase the reliability of transmitted information. ASR application can lead to a lower number of incidents caused by misunderstanding and improve air traffic management (ATM) efficiency. Evidently, high accuracy predictions, especially, of key information, i.e., callsigns and commands, are required to minimize the risk of errors. We prove that combining the benefits of ASR and Natural Language Processing (NLP) methods to make use of surveillance data (i.e. additional modality) helps to considerably improve the recognition of callsigns (named entity). In this paper, we investigate a two-step callsign boosting approach: (1) at the 1 step (ASR), weights of probable callsign n-grams are reduced in G.fst and/or in the decoding FST (lattices), (2) at the 2 step (NLP), callsigns extracted from the improved recognition outputs with Named Entity Recognition (NER) are correlated with the surveillance data to select the most suitable one. Boosting callsign n-grams with the combination of ASR and NLP methods eventually leads up to 53.7% of an absolute, or 60.4% of a relative, improvement in callsign recognition.
△ Less
Submitted 8 February, 2022;
originally announced February 2022.
-
Dependence model assessment and selection with DecoupleNets
Authors:
Marius Hofert,
Avinash Prasad,
Mu Zhu
Abstract:
Neural networks are suggested for learning a map from $d$-dimensional samples with any underlying dependence structure to multivariate uniformity in $d'$ dimensions. This map, termed DecoupleNet, is used for dependence model assessment and selection. If the data-generating dependence model was known, and if it was among the few analytically tractable ones, one such transformation for $d'=d$ is Ros…
▽ More
Neural networks are suggested for learning a map from $d$-dimensional samples with any underlying dependence structure to multivariate uniformity in $d'$ dimensions. This map, termed DecoupleNet, is used for dependence model assessment and selection. If the data-generating dependence model was known, and if it was among the few analytically tractable ones, one such transformation for $d'=d$ is Rosenblatt's transform. DecoupleNets have multiple advantages. For example, they only require an available sample and are applicable to $d'<d$, in particular $d'=2$. This allows for simpler model assessment and selection, both numerically and, because $d'=2$, especially graphically. A graphical assessment method has the advantage of being able to identify why, or in which region of the domain, a candidate model does not provide an adequate fit, thus leading to model selection in particular regions of interest or improved model building strategies in such regions. Through simulation studies with data from various copulas, the feasibility and validity of this novel DecoupleNet approach is demonstrated. Applications to real world data illustrate its usefulness for model assessment and selection.
△ Less
Submitted 5 October, 2022; v1 submitted 7 February, 2022;
originally announced February 2022.
-
RafterNet: Probabilistic predictions in multi-response regression
Authors:
Marius Hofert,
Avinash Prasad,
Mu Zhu
Abstract:
A fully nonparametric approach for making probabilistic predictions in multi-response regression problems is introduced. Random forests are used as marginal models for each response variable and, as novel contribution of the present work, the dependence between the multiple response variables is modeled by a generative neural network. This combined modeling approach of random forests, correspondin…
▽ More
A fully nonparametric approach for making probabilistic predictions in multi-response regression problems is introduced. Random forests are used as marginal models for each response variable and, as novel contribution of the present work, the dependence between the multiple response variables is modeled by a generative neural network. This combined modeling approach of random forests, corresponding empirical marginal residual distributions and a generative neural network is referred to as RafterNet. Multiple datasets serve as examples to demonstrate the flexibility of the approach and its impact for making probabilistic forecasts.
△ Less
Submitted 11 October, 2022; v1 submitted 1 December, 2021;
originally announced December 2021.
-
BERTraffic: BERT-based Joint Speaker Role and Speaker Change Detection for Air Traffic Control Communications
Authors:
Juan Zuluaga-Gomez,
Seyyed Saeed Sarfjoo,
Amrutha Prasad,
Iuliia Nigmatulina,
Petr Motlicek,
Karel Ondrej,
Oliver Ohneiser,
Hartmut Helmke
Abstract:
Automatic speech recognition (ASR) allows transcribing the communications between air traffic controllers (ATCOs) and aircraft pilots. The transcriptions are used later to extract ATC named entities, e.g., aircraft callsigns. One common challenge is speech activity detection (SAD) and speaker diarization (SD). In the failure condition, two or more segments remain in the same recording, jeopardizin…
▽ More
Automatic speech recognition (ASR) allows transcribing the communications between air traffic controllers (ATCOs) and aircraft pilots. The transcriptions are used later to extract ATC named entities, e.g., aircraft callsigns. One common challenge is speech activity detection (SAD) and speaker diarization (SD). In the failure condition, two or more segments remain in the same recording, jeopardizing the overall performance. We propose a system that combines SAD and a BERT model to perform speaker change detection and speaker role detection (SRD) by chunking ASR transcripts, i.e., SD with a defined number of speakers together with SRD. The proposed model is evaluated on real-life public ATC databases. Our BERT SD model baseline reaches up to 10% and 20% token-based Jaccard error rate (JER) in public and private ATC databases. We also achieved relative improvements of 32% and 7.7% in JERs and SD error rate (DER), respectively, compared to VBx, a well-known SD system.
△ Less
Submitted 14 October, 2022; v1 submitted 12 October, 2021;
originally announced October 2021.
-
Differentiable Spline Approximations
Authors:
Minsu Cho,
Aditya Balu,
Ameya Joshi,
Anjana Deva Prasad,
Biswajit Khara,
Soumik Sarkar,
Baskar Ganapathysubramanian,
Adarsh Krishnamurthy,
Chinmay Hegde
Abstract:
The paradigm of differentiable programming has significantly enhanced the scope of machine learning via the judicious use of gradient-based optimization. However, standard differentiable programming methods (such as autodiff) typically require that the machine learning models be differentiable, limiting their applicability. Our goal in this paper is to use a new, principled approach to extend grad…
▽ More
The paradigm of differentiable programming has significantly enhanced the scope of machine learning via the judicious use of gradient-based optimization. However, standard differentiable programming methods (such as autodiff) typically require that the machine learning models be differentiable, limiting their applicability. Our goal in this paper is to use a new, principled approach to extend gradient-based optimization to functions well modeled by splines, which encompass a large family of piecewise polynomial models. We derive the form of the (weak) Jacobian of such functions and show that it exhibits a block-sparse structure that can be computed implicitly and efficiently. Overall, we show that leveraging this redesigned Jacobian in the form of a differentiable "layer" in predictive models leads to improved performance in diverse applications such as image segmentation, 3D point cloud reconstruction, and finite element analysis.
△ Less
Submitted 4 October, 2021;
originally announced October 2021.
-
A two-step machine learning approach for crop disease detection: an application of GAN and UAV technology
Authors:
Aaditya Prasad,
Nikhil Mehta,
Matthew Horak,
Wan D. Bae
Abstract:
Automated plant diagnosis is a technology that promises large increases in cost-efficiency for agriculture. However, multiple problems reduce the effectiveness of drones, including the inverse relationship between resolution and speed and the lack of adequate labeled training data. This paper presents a two-step machine learning approach that analyzes low-fidelity and high-fidelity images in seque…
▽ More
Automated plant diagnosis is a technology that promises large increases in cost-efficiency for agriculture. However, multiple problems reduce the effectiveness of drones, including the inverse relationship between resolution and speed and the lack of adequate labeled training data. This paper presents a two-step machine learning approach that analyzes low-fidelity and high-fidelity images in sequence, preserving efficiency as well as accuracy. Two data-generators are also used to minimize class imbalance in the high-fidelity dataset and to produce low-fidelity data that is representative of UAV images. The analysis of applications and methods is conducted on a database of high-fidelity apple tree images which are corrupted with class imbalance. The application begins by generating high-fidelity data using generative networks and then uses this novel data alongside the original high-fidelity data to produce low-fidelity images. A machine-learning identifier identifies plants and labels them as potentially diseased or not. A machine learning classifier is then given the potentially diseased plant images and returns actual diagnoses for these plants. The results show an accuracy of 96.3% for the high-fidelity system and a 75.5% confidence level for our low-fidelity system. Our drone technology shows promising results in accuracy when compared to labor-based methods of diagnosis.
△ Less
Submitted 18 September, 2021;
originally announced September 2021.
-
Grammar Based Speaker Role Identification for Air Traffic Control Speech Recognition
Authors:
Amrutha Prasad,
Juan Zuluaga-Gomez,
Petr Motlicek,
Saeed Sarfjoo,
Iuliia Nigmatulina,
Oliver Ohneiser,
Hartmut Helmke
Abstract:
Automatic Speech Recognition (ASR) for air traffic control is generally trained by pooling Air Traffic Controller (ATCO) and pilot data into one set. This is motivated by the fact that pilot's voice communications are more scarce than ATCOs. Due to this data imbalance and other reasons (e.g., varying acoustic conditions), the speech from ATCOs is usually recognized more accurately than from pilots…
▽ More
Automatic Speech Recognition (ASR) for air traffic control is generally trained by pooling Air Traffic Controller (ATCO) and pilot data into one set. This is motivated by the fact that pilot's voice communications are more scarce than ATCOs. Due to this data imbalance and other reasons (e.g., varying acoustic conditions), the speech from ATCOs is usually recognized more accurately than from pilots. Automatically identifying the speaker roles is a challenging task, especially in the case of the noisy voice recordings collected using Very High Frequency (VHF) receivers or due to the unavailability of the push-to-talk (PTT) signal, i.e., both audio channels are mixed. In this work, we propose to (1) automatically segment the ATCO and pilot data based on an intuitive approach exploiting ASR transcripts and (2) subsequently consider an automatic recognition of ATCOs' and pilots' voice as two separate tasks. Our work is performed on VHF audio data with high noise levels, i.e., signal-to-noise (SNR) ratios below 15 dB, as this data is recognized to be helpful for various speech-based machine-learning tasks. Specifically, for the speaker role identification task, the module is represented by a simple yet efficient knowledge-based system exploiting a grammar defined by the International Civil Aviation Organization (ICAO). The system accepts text as the input, either manually verified annotations or automatically generated transcripts. The developed approach provides an average accuracy in speaker role identification of about 83%. Finally, we show that training an acoustic model for ASR tasks separately (i.e., separate models for ATCOs and pilots) or using a multitask approach is well suited for the noisy data and outperforms the traditional ASR system where all data is pooled together.
△ Less
Submitted 14 December, 2022; v1 submitted 27 August, 2021;
originally announced August 2021.
-
Heavy-tailed Streaming Statistical Estimation
Authors:
Che-Ping Tsai,
Adarsh Prasad,
Sivaraman Balakrishnan,
Pradeep Ravikumar
Abstract:
We consider the task of heavy-tailed statistical estimation given streaming $p$-dimensional samples. This could also be viewed as stochastic optimization under heavy-tailed distributions, with an additional $O(p)$ space complexity constraint. We design a clipped stochastic gradient descent algorithm and provide an improved analysis, under a more nuanced condition on the noise of the stochastic gra…
▽ More
We consider the task of heavy-tailed statistical estimation given streaming $p$-dimensional samples. This could also be viewed as stochastic optimization under heavy-tailed distributions, with an additional $O(p)$ space complexity constraint. We design a clipped stochastic gradient descent algorithm and provide an improved analysis, under a more nuanced condition on the noise of the stochastic gradients, which we show is critical when analyzing stochastic optimization problems arising from general statistical estimation problems. Our results guarantee convergence not just in expectation but with exponential concentration, and moreover does so using $O(1)$ batch size. We provide consequences of our results for mean estimation and linear regression. Finally, we provide empirical corroboration of our results and algorithms via synthetic experiments for mean estimation and linear regression.
△ Less
Submitted 25 February, 2022; v1 submitted 25 August, 2021;
originally announced August 2021.
-
Monarch: A Durable Polymorphic Memory For Data Intensive Applications
Authors:
Ananth Krishna Prasad,
Mahdi Nazm Bojnordi
Abstract:
3D die stacking has often been proposed to build large-scale DRAM-based caches. Unfortunately, the power and performance overheads of DRAM limit the efficiency of high-bandwidth memories. Also, DRAM is facing serious scalability challenges that make alternative technologies more appealing. This paper examines Monarch, a resistive 3D stacked memory based on a novel reconfigurable crosspoint array c…
▽ More
3D die stacking has often been proposed to build large-scale DRAM-based caches. Unfortunately, the power and performance overheads of DRAM limit the efficiency of high-bandwidth memories. Also, DRAM is facing serious scalability challenges that make alternative technologies more appealing. This paper examines Monarch, a resistive 3D stacked memory based on a novel reconfigurable crosspoint array called XAM. The XAM array is capable of switching between random access and content-addressable modes, which enables Monarch (i) to better utilize the in-package bandwidth and (ii) to satisfy both the random access memory and associative search requirements of various applications. Moreover, the Monarch controller ensures a given target lifetime for the resistive stack. Our simulation results on a set of parallel memory-intensive applications indicate that Monarch outperforms an ideal DRAM caching by 1.21x on average. For in-memory hash table and string matching workloads, Monarch improves performance up to 12x over the conventional high bandwidth memories.
△ Less
Submitted 19 August, 2021;
originally announced August 2021.
-
The Effectiveness of Intermediate-Task Training for Code-Switched Natural Language Understanding
Authors:
Archiki Prasad,
Mohammad Ali Rehan,
Shreya Pathak,
Preethi Jyothi
Abstract:
While recent benchmarks have spurred a lot of new work on improving the generalization of pretrained multilingual language models on multilingual tasks, techniques to improve code-switched natural language understanding tasks have been far less explored. In this work, we propose the use of bilingual intermediate pretraining as a reliable technique to derive large and consistent performance gains o…
▽ More
While recent benchmarks have spurred a lot of new work on improving the generalization of pretrained multilingual language models on multilingual tasks, techniques to improve code-switched natural language understanding tasks have been far less explored. In this work, we propose the use of bilingual intermediate pretraining as a reliable technique to derive large and consistent performance gains on three different NLP tasks using code-switched text. We achieve substantial absolute improvements of 7.87%, 20.15%, and 10.99%, on the mean accuracies and F1 scores over previous state-of-the-art systems for Hindi-English Natural Language Inference (NLI), Question Answering (QA) tasks, and Spanish-English Sentiment Analysis (SA) respectively. We show consistent performance gains on four different code-switched language-pairs (Hindi-English, Spanish-English, Tamil-English and Malayalam-English) for SA. We also present a code-switched masked language modelling (MLM) pretraining technique that consistently benefits SA compared to standard MLM pretraining using real code-switched text.
△ Less
Submitted 21 July, 2021;
originally announced July 2021.
-
Thin-Film Smoothed Particle Hydrodynamics Fluid
Authors:
Mengdi Wang,
Yitong Deng,
Xiangxin Kong,
Aditya H. Prasad,
Shiying Xiong,
Bo Zhu
Abstract:
We propose a particle-based method to simulate thin-film fluid that jointly facilitates aggressive surface deformation and vigorous tangential flows. We build our dynamics model from the surface tension driven Navier-Stokes equation with the dimensionality reduced using the asymptotic lubrication theory and customize a set of differential operators based on the weakly compressible Smoothed Particl…
▽ More
We propose a particle-based method to simulate thin-film fluid that jointly facilitates aggressive surface deformation and vigorous tangential flows. We build our dynamics model from the surface tension driven Navier-Stokes equation with the dimensionality reduced using the asymptotic lubrication theory and customize a set of differential operators based on the weakly compressible Smoothed Particle Hydrodynamics (SPH) for evolving pointset surfaces. The key insight is that the compressible nature of SPH, which is unfavorable in its typical usage, is helpful in our application to co-evolve the thickness, calculate the surface tension, and enforce the fluid incompressibility on a thin film. In this way, we are able to two-way couple the surface deformation with the in-plane flows in a physically based manner. We can simulate complex vortical swirls, fingering effects due to Rayleigh-Taylor instability, capillary waves, Newton's interference fringes, and the Marangoni effect on liberally deforming surfaces by presenting both realistic visual results and numerical validations. The particle-based nature of our system also enables it to conveniently handle topology changes and codimension transitions, allowing us to marry the thin-film simulation with a wide gamut of 3D phenomena, such as pinch-off of unstable catenoids, dripping under gravity, merging of droplets, as well as bubble rupture.
△ Less
Submitted 17 May, 2021;
originally announced May 2021.
-
NURBS-Diff: A Differentiable Programming Module for NURBS
Authors:
Anjana Deva Prasad,
Aditya Balu,
Harshil Shah,
Soumik Sarkar,
Chinmay Hegde,
Adarsh Krishnamurthy
Abstract:
Boundary representations (B-reps) using Non-Uniform Rational B-splines (NURBS) are the de facto standard used in CAD, but their utility in deep learning-based approaches is not well researched. We propose a differentiable NURBS module to integrate NURBS representations of CAD models with deep learning methods. We mathematically define the derivatives of the NURBS curves or surfaces with respect to…
▽ More
Boundary representations (B-reps) using Non-Uniform Rational B-splines (NURBS) are the de facto standard used in CAD, but their utility in deep learning-based approaches is not well researched. We propose a differentiable NURBS module to integrate NURBS representations of CAD models with deep learning methods. We mathematically define the derivatives of the NURBS curves or surfaces with respect to the input parameters (control points, weights, and the knot vector). These derivatives are used to define an approximate Jacobian used for performing the "backward" evaluation to train the deep learning models. We have implemented our NURBS module using GPU-accelerated algorithms and integrated it with PyTorch, a popular deep learning framework. We demonstrate the efficacy of our NURBS module in performing CAD operations such as curve or surface fitting and surface offsetting. Further, we show its utility in deep learning for unsupervised point cloud reconstruction and enforce analysis constraints. These examples show that our module performs better for certain deep learning frameworks and can be directly integrated with any deep-learning framework requiring NURBS.
△ Less
Submitted 13 January, 2022; v1 submitted 29 April, 2021;
originally announced April 2021.
-
Contextual Semi-Supervised Learning: An Approach To Leverage Air-Surveillance and Untranscribed ATC Data in ASR Systems
Authors:
Juan Zuluaga-Gomez,
Iuliia Nigmatulina,
Amrutha Prasad,
Petr Motlicek,
Karel Veselý,
Martin Kocour,
Igor Szöke
Abstract:
Air traffic management and specifically air-traffic control (ATC) rely mostly on voice communications between Air Traffic Controllers (ATCos) and pilots. In most cases, these voice communications follow a well-defined grammar that could be leveraged in Automatic Speech Recognition (ASR) technologies. The callsign used to address an airplane is an essential part of all ATCo-pilot communications. We…
▽ More
Air traffic management and specifically air-traffic control (ATC) rely mostly on voice communications between Air Traffic Controllers (ATCos) and pilots. In most cases, these voice communications follow a well-defined grammar that could be leveraged in Automatic Speech Recognition (ASR) technologies. The callsign used to address an airplane is an essential part of all ATCo-pilot communications. We propose a two-steps approach to add contextual knowledge during semi-supervised training to reduce the ASR system error rates at recognizing the part of the utterance that contains the callsign. Initially, we represent in a WFST the contextual knowledge (i.e. air-surveillance data) of an ATCo-pilot communication. Then, during Semi-Supervised Learning (SSL) the contextual knowledge is added by second-pass decoding (i.e. lattice re-scoring). Results show that `unseen domains' (e.g. data from airports not present in the supervised training data) are further aided by contextual SSL when compared to standalone SSL. For this task, we introduce the Callsign Word Error Rate (CA-WER) as an evaluation metric, which only assesses ASR performance of the spoken callsign in an utterance. We obtained a 32.1% CA-WER relative improvement applying SSL with an additional 17.5% CA-WER improvement by adding contextual knowledge during SSL on a challenging ATC-based test set gathered from LiveATC.
△ Less
Submitted 27 August, 2021; v1 submitted 8 April, 2021;
originally announced April 2021.
-
Page Table Management for Heterogeneous Memory Systems
Authors:
Sandeep Kumar,
Aravinda Prasad,
Smruti R. Sarangi,
Sreenivas Subramoney
Abstract:
Modern enterprise servers are increasingly embracing tiered memory systems with a combination of low latency DRAMs and large capacity but high latency non-volatile main memories (NVMMs) such as Intel's Optane DC PMM. Prior works have focused on efficient placement and migration of data on a tiered memory system, but have not studied the optimal placement of page tables.
Explicit and efficient pl…
▽ More
Modern enterprise servers are increasingly embracing tiered memory systems with a combination of low latency DRAMs and large capacity but high latency non-volatile main memories (NVMMs) such as Intel's Optane DC PMM. Prior works have focused on efficient placement and migration of data on a tiered memory system, but have not studied the optimal placement of page tables.
Explicit and efficient placement of page tables is crucial for large memory footprint applications with high TLB miss rates because they incur dramatically higher page walk latency when page table pages are placed in NVMM. We show that (i) page table pages can end up on NVMM even when enough DRAM memory is available and (ii) page table pages that spill over to NVMM due to DRAM memory pressure are not migrated back later when memory is available in DRAM.
We study the performance impact of page table placement in a tiered memory system and propose an efficient and transparent page table management technique that (i) applies different placement policies for data and page table pages, (ii) introduces a differentiating policy for page table pages by placing a small but critical part of the page table in DRAM, and (iii) dynamically and judiciously manages the rest of the page table by transparently migrating the page table pages between DRAM and NVMM. Our implementation on a real system equipped with Intel's Optane NVMM running Linux reduces the page table walk cycles by 12% and total cycles by 20% on an average. This improves the runtime by 20% on an average for a set of synthetic and real-world large memory footprint applications when compared with various default Linux kernel techniques.
△ Less
Submitted 16 March, 2021;
originally announced March 2021.
-
On Proximal Policy Optimization's Heavy-tailed Gradients
Authors:
Saurabh Garg,
Joshua Zhanson,
Emilio Parisotto,
Adarsh Prasad,
J. Zico Kolter,
Zachary C. Lipton,
Sivaraman Balakrishnan,
Ruslan Salakhutdinov,
Pradeep Ravikumar
Abstract:
Modern policy gradient algorithms such as Proximal Policy Optimization (PPO) rely on an arsenal of heuristics, including loss clipping and gradient clipping, to ensure successful learning. These heuristics are reminiscent of techniques from robust statistics, commonly used for estimation in outlier-rich (``heavy-tailed'') regimes. In this paper, we present a detailed empirical study to characteriz…
▽ More
Modern policy gradient algorithms such as Proximal Policy Optimization (PPO) rely on an arsenal of heuristics, including loss clipping and gradient clipping, to ensure successful learning. These heuristics are reminiscent of techniques from robust statistics, commonly used for estimation in outlier-rich (``heavy-tailed'') regimes. In this paper, we present a detailed empirical study to characterize the heavy-tailed nature of the gradients of the PPO surrogate reward function. We demonstrate that the gradients, especially for the actor network, exhibit pronounced heavy-tailedness and that it increases as the agent's policy diverges from the behavioral policy (i.e., as the agent goes further off policy). Further examination implicates the likelihood ratios and advantages in the surrogate reward as the main sources of the observed heavy-tailedness. We then highlight issues arising due to the heavy-tailed nature of the gradients. In this light, we study the effects of the standard PPO clipping heuristics, demonstrating that these tricks primarily serve to offset heavy-tailedness in gradients. Thus motivated, we propose incorporating GMOM, a high-dimensional robust estimator, into PPO as a substitute for three clipping tricks. Despite requiring less hyperparameter tuning, our method matches the performance of PPO (with all heuristics enabled) on a battery of MuJoCo continuous control tasks.
△ Less
Submitted 12 July, 2021; v1 submitted 20 February, 2021;
originally announced February 2021.
-
An Investigation of End-to-End Models for Robust Speech Recognition
Authors:
Archiki Prasad,
Preethi Jyothi,
Rajbabu Velmurugan
Abstract:
End-to-end models for robust automatic speech recognition (ASR) have not been sufficiently well-explored in prior work. With end-to-end models, one could choose to preprocess the input speech using speech enhancement techniques and train the model using enhanced speech. Another alternative is to pass the noisy speech as input and modify the model architecture to adapt to noisy speech. A systematic…
▽ More
End-to-end models for robust automatic speech recognition (ASR) have not been sufficiently well-explored in prior work. With end-to-end models, one could choose to preprocess the input speech using speech enhancement techniques and train the model using enhanced speech. Another alternative is to pass the noisy speech as input and modify the model architecture to adapt to noisy speech. A systematic comparison of these two approaches for end-to-end robust ASR has not been attempted before. We address this gap and present a detailed comparison of speech enhancement-based techniques and three different model-based adaptation techniques covering data augmentation, multi-task learning, and adversarial learning for robust ASR. While adversarial learning is the best-performing technique on certain noise types, it comes at the cost of degrading clean speech WER. On other relatively stationary noise types, a new speech enhancement technique outperformed all the model-based adaptation techniques. This suggests that knowledge of the underlying noise type can meaningfully inform the choice of adaptation technique.
△ Less
Submitted 11 February, 2021;
originally announced February 2021.
-
Photo2CAD: Automated 3D solid reconstruction from 2D drawings using OpenCV
Authors:
Ajay B. Harish,
Abhishek Rajendra Prasad
Abstract:
This study showcases the utilisation of OpenCV for extracting features from photos of 2D engineering drawings. These features are then employed to reconstruct 3D CAD models in SCAD format and generate 3D point cloud data similar to LIDAR scans. Many historical mechanical, aerospace, and civil engineering designs exist only as drawings, lacking software-generated CAD or BIM models. While 2D to 3D c…
▽ More
This study showcases the utilisation of OpenCV for extracting features from photos of 2D engineering drawings. These features are then employed to reconstruct 3D CAD models in SCAD format and generate 3D point cloud data similar to LIDAR scans. Many historical mechanical, aerospace, and civil engineering designs exist only as drawings, lacking software-generated CAD or BIM models. While 2D to 3D conversion itself is not novel, the novelty of this work is in the usage of simple photos rather than scans or electronic documentation of 2D drawings. The method can also use scanned drawing data. While the approach is effective for simple shapes, it currently does not address hidden lines in CAD drawings. The Python Jupyter notebook codes developed for this purpose are accessible through GitHub.
△ Less
Submitted 8 September, 2023; v1 submitted 11 January, 2021;
originally announced January 2021.
-
Applications of multivariate quasi-random sampling with neural networks
Authors:
Marius Hofert,
Avinash Prasad,
Mu Zhu
Abstract:
Generative moment matching networks (GMMNs) are suggested for modeling the cross-sectional dependence between stochastic processes. The stochastic processes considered are geometric Brownian motions and ARMA-GARCH models. Geometric Brownian motions lead to an application of pricing American basket call options under dependence and ARMA-GARCH models lead to an application of simulating predictive d…
▽ More
Generative moment matching networks (GMMNs) are suggested for modeling the cross-sectional dependence between stochastic processes. The stochastic processes considered are geometric Brownian motions and ARMA-GARCH models. Geometric Brownian motions lead to an application of pricing American basket call options under dependence and ARMA-GARCH models lead to an application of simulating predictive distributions. In both types of applications the benefit of using GMMNs in comparison to parametric dependence models is highlighted and the fact that GMMNs can produce dependent quasi-random samples with no additional effort is exploited to obtain variance reduction.
△ Less
Submitted 27 August, 2021; v1 submitted 14 December, 2020;
originally announced December 2020.
-
Decentralized Age-of-Information Bandits
Authors:
Archiki Prasad,
Vishal Jain,
Sharayu Moharir
Abstract:
Age-of-Information (AoI) is a performance metric for scheduling systems that measures the freshness of the data available at the intended destination. AoI is formally defined as the time elapsed since the destination received the recent most update from the source. We consider the problem of scheduling to minimize the cumulative AoI in a multi-source multi-channel setting. Our focus is on the sett…
▽ More
Age-of-Information (AoI) is a performance metric for scheduling systems that measures the freshness of the data available at the intended destination. AoI is formally defined as the time elapsed since the destination received the recent most update from the source. We consider the problem of scheduling to minimize the cumulative AoI in a multi-source multi-channel setting. Our focus is on the setting where channel statistics are unknown and we model the problem as a distributed multi-armed bandit problem. For an appropriately defined AoI regret metric, we provide analytical performance guarantees of an existing UCB-based policy for the distributed multi-armed bandit problem. In addition, we propose a novel policy based on Thomson Sampling and a hybrid policy that tries to balance the trade-off between the aforementioned policies. Further, we develop AoI-aware variants of these policies in which each source takes its current AoI into account while making decisions. We compare the performance of various policies via simulations.
△ Less
Submitted 18 January, 2021; v1 submitted 27 September, 2020;
originally announced September 2020.