Showing 1–12 of 12 results for author: Rawal, A

  1. arXiv:2406.15570  [pdf, other]

    cs.CL cs.LG

    DEM: Distribution Edited Model for Training with Mixed Data Distributions

    Authors: Dhananjay Ram, Aditya Rawal, Momchil Hardalov, Nikolaos Pappas, Sheng Zha

    Abstract: Training with mixed data distributions is a common and important part of creating multi-task and instruction-following models. The diversity of the data distributions and cost of joint training makes the optimization procedure extremely challenging. Data mixing methods partially address this problem, albeit having a sub-optimal performance across data sources and require multiple expensive trainin…

    Submitted 21 June, 2024; originally announced June 2024.

    MSC Class: 68T50

    ACM Class: F.2.2; I.2.7

  2. arXiv:2403.19725  [pdf, other]

    cs.CL cs.AI cs.LG

    MUGC: Machine Generated versus User Generated Content Detection

    Authors: Yaqi Xie, Anjali Rawal, Yujing Cen, Dixuan Zhao, Sunil K Narang, Shanu Sushmita

    Abstract: As advanced modern systems like deep neural networks (DNNs) and generative AI continue to enhance their capabilities in producing convincing and realistic content, the need to distinguish between user-generated and machine-generated content is becoming increasingly evident. In this research, we undertake a comparative evaluation of eight traditional machine-learning algorithms to distinguish betwe…

    Submitted 28 March, 2024; originally announced March 2024.

    Comments: 11 pages, 16 figures

  3. arXiv:2402.17509  [pdf, other]

    cs.CL

    Extreme Miscalibration and the Illusion of Adversarial Robustness

    Authors: Vyas Raina, Samson Tan, Volkan Cevher, Aditya Rawal, Sheng Zha, George Karypis

    Abstract: Deep learning-based Natural Language Processing (NLP) models are vulnerable to adversarial attacks, where small perturbations can cause a model to misclassify. Adversarial Training (AT) is often used to increase model robustness. However, we have discovered an intriguing phenomenon: deliberately or accidentally miscalibrating models masks gradients in a way that interferes with adversarial attack…

    Submitted 30 May, 2024; v1 submitted 27 February, 2024; originally announced February 2024.

  4. arXiv:2206.14085  [pdf, other]

    cs.LG cs.CV

    Continual Learning with Transformers for Image Classification

    Authors: Beyza Ermis, Giovanni Zappella, Martin Wistuba, Aditya Rawal, Cedric Archambeau

    Abstract: In many real-world scenarios, data to train machine learning models become available over time. However, neural network models struggle to continually learn new concepts without forgetting what has been learnt in the past. This phenomenon is known as catastrophic forgetting and it is often difficult to prevent due to practical constraints, such as the amount of data that can be stored or the limit…

    Submitted 28 June, 2022; originally announced June 2022.

    Comments: Appeared in CVPR CLVision workshop. arXiv admin note: substantial text overlap with arXiv:2203.04640

  5. arXiv:2203.04640  [pdf, other]

    cs.CL cs.AI stat.ML

    Memory Efficient Continual Learning with Transformers

    Authors: Beyza Ermis, Giovanni Zappella, Martin Wistuba, Aditya Rawal, Cedric Archambeau

    Abstract: In many real-world scenarios, data to train machine learning models becomes available over time. Unfortunately, these models struggle to continually learn new concepts without forgetting what has been learnt in the past. This phenomenon is known as catastrophic forgetting and it is difficult to prevent due to practical constraints. For instance, the amount of data that can be stored or the computa…

    Submitted 13 January, 2023; v1 submitted 9 March, 2022; originally announced March 2022.

    Comments: This paper was published at NeurIPS 2022

  6. arXiv:2005.13092  [pdf, other]

    cs.LG stat.ML

    Synthetic Petri Dish: A Novel Surrogate Model for Rapid Architecture Search

    Authors: Aditya Rawal, Joel Lehman, Felipe Petroski Such, Jeff Clune, Kenneth O. Stanley

    Abstract: Neural Architecture Search (NAS) explores a large space of architectural motifs -- a compute-intensive process that often involves ground-truth evaluation of each motif by instantiating it within a large network, and training and evaluating the network with thousands of domain-specific data samples. Inspired by how biological motifs such as cells are sometimes extracted from their natural environm…

    Submitted 26 May, 2020; originally announced May 2020.

  7. arXiv:2003.08536  [pdf, other]

    cs.NE

    Enhanced POET: Open-Ended Reinforcement Learning through Unbounded Invention of Learning Challenges and their Solutions

    Authors: Rui Wang, Joel Lehman, Aditya Rawal, Jiale Zhi, Yulun Li, Jeff Clune, Kenneth O. Stanley

    Abstract: Creating open-ended algorithms, which generate their own never-ending stream of novel and appropriately challenging learning opportunities, could help to automate and accelerate progress in machine learning. A recent step in this direction is the Paired Open-Ended Trailblazer (POET), an algorithm that generates and solves its own challenges, and allows solutions to goal-switch between challenges t…

    Submitted 13 April, 2020; v1 submitted 18 March, 2020; originally announced March 2020.

    Comments: 23 pages, 14 figures

  8. arXiv:2002.10585  [pdf, other]

    cs.NE

    Backpropamine: training self-modifying neural networks with differentiable neuromodulated plasticity

    Authors: Thomas Miconi, Aditya Rawal, Jeff Clune, Kenneth O. Stanley

    Abstract: The impressive lifelong learning in animal brains is primarily enabled by plastic changes in synaptic connectivity. Importantly, these changes are not passive, but are actively controlled by neuromodulation, which is itself under the control of the brain. The resulting self-modifying abilities of the brain play an important role in learning and adaptation, and are a major basis for biological rein…

    Submitted 24 February, 2020; originally announced February 2020.

    Comments: Presented at the 7th International Conference on Learning Representations (ICLR 2019)

    Journal ref: 7th International Conference on Learning Representations, ICLR 2019, New Orleans, LA, USA, May 6-9, 2019

  9. arXiv:1912.07768  [pdf, other]

    cs.LG stat.ML

    Generative Teaching Networks: Accelerating Neural Architecture Search by Learning to Generate Synthetic Training Data

    Authors: Felipe Petroski Such, Aditya Rawal, Joel Lehman, Kenneth O. Stanley, Jeff Clune

    Abstract: This paper investigates the intriguing question of whether we can create learning algorithms that automatically generate training data, learning environments, and curricula in order to help AI agents rapidly learn. We show that such algorithms are possible via Generative Teaching Networks (GTNs), a general approach that is, in theory, applicable to supervised, unsupervised, and reinforcement learn…

    Submitted 16 December, 2019; originally announced December 2019.

  10. arXiv:1910.08461  [pdf, other]

    cs.LG stat.ML

    First-Order Preconditioning via Hypergradient Descent

    Authors: Ted Moskovitz, Rui Wang, Janice Lan, Sanyam Kapoor, Thomas Miconi, Jason Yosinski, Aditya Rawal

    Abstract: Standard gradient descent methods are susceptible to a range of issues that can impede training, such as high correlations and different scaling in parameter space. These difficulties can be addressed by second-order approaches that apply a pre-conditioning matrix to the gradient to improve convergence. Unfortunately, such algorithms typically struggle to scale to high-dimensional problems, in part…

    Submitted 27 April, 2020; v1 submitted 18 October, 2019; originally announced October 2019.

  11. arXiv:1803.04439  [pdf, other]

    cs.NE cs.LG

    From Nodes to Networks: Evolving Recurrent Neural Networks

    Authors: Aditya Rawal, Risto Miikkulainen

    Abstract: Gated recurrent networks such as those composed of Long Short-Term Memory (LSTM) nodes have recently been used to improve state of the art in many sequential processing tasks such as speech recognition and machine translation. However, the basic structure of the LSTM node is essentially the same as when it was first conceived 25 years ago. Recently, evolutionary and reinforcement learning mechanis…

    Submitted 7 June, 2018; v1 submitted 12 March, 2018; originally announced March 2018.

  12. arXiv:1703.00548  [pdf, other]

    cs.NE cs.AI

    Evolving Deep Neural Networks

    Authors: Risto Miikkulainen, Jason Liang, Elliot Meyerson, Aditya Rawal, Dan Fink, Olivier Francon, Bala Raju, Hormoz Shahrzad, Arshak Navruzyan, Nigel Duffy, Babak Hodjat

    Abstract: The success of deep learning depends on finding an architecture to fit the task. As deep learning has scaled up to more challenging tasks, the architectures have become difficult to design by hand. This paper proposes an automated method, CoDeepNEAT, for optimizing deep learning architectures through evolution. By extending existing neuroevolution methods to topology, components, and hyperparamete…

    Submitted 4 March, 2017; v1 submitted 1 March, 2017; originally announced March 2017.