Showing 1–6 of 6 results for author: Singhania, P

  1. arXiv:2406.10209  [pdf, other]

    cs.CL

    Be like a Goldfish, Don't Memorize! Mitigating Memorization in Generative LLMs

    Authors: Abhimanyu Hans, Yuxin Wen, Neel Jain, John Kirchenbauer, Hamid Kazemi, Prajwal Singhania, Siddharth Singh, Gowthami Somepalli, Jonas Geiping, Abhinav Bhatele, Tom Goldstein

    Abstract: Large language models can memorize and repeat their training data, causing privacy and copyright risks. To mitigate memorization, we introduce a subtle modification to the next-token training objective that we call the goldfish loss. During training, a randomly sampled subset of tokens are excluded from the loss computation. These dropped tokens are not memorized by the model, which prevents verba…

    Submitted 14 June, 2024; originally announced June 2024.

    Comments: 9.5 pages, 8 figures, and 1 table in the main body. Code available at https://github.com/ahans30/goldfish-loss

  2. arXiv:2406.02542  [pdf, other]

    cs.LG

    Loki: Low-Rank Keys for Efficient Sparse Attention

    Authors: Prajwal Singhania, Siddharth Singh, Shwai He, Soheil Feizi, Abhinav Bhatele

    Abstract: Inference on large language models can be expensive in terms of the compute and memory costs involved, especially when long sequence lengths are used. In particular, the self-attention mechanism used in such models contributes significantly to these costs, which has resulted in several recent works that propose sparse attention approximations for inference. In this work, we propose to approximate…

    Submitted 4 June, 2024; originally announced June 2024.

  3. arXiv:2305.13525  [pdf, other]

    cs.LG cs.AI cs.DC cs.PF

    A 4D Hybrid Algorithm to Scale Parallel Training to Thousands of GPUs

    Authors: Siddharth Singh, Prajwal Singhania, Aditya K. Ranjan, Zack Sating, Abhinav Bhatele

    Abstract: Heavy communication, in particular, collective operations, can become a critical performance bottleneck in scaling the training of billion-parameter neural networks to large-scale parallel systems. This paper introduces a four-dimensional (4D) approach to optimize communication in parallel training. This 4D approach is a hybrid of 3D tensor and data parallelism, and is implemented in the AxoNN fra…

    Submitted 14 May, 2024; v1 submitted 22 May, 2023; originally announced May 2023.

  4. arXiv:2009.03482  [pdf, ps, other]

    math.OC cs.LG

    Alternating Direction Method of Multipliers for Quantization

    Authors: Tianjian Huang, Prajwal Singhania, Maziar Sanjabi, Pabitra Mitra, Meisam Razaviyayn

    Abstract: Quantization of the parameters of machine learning models, such as deep neural networks, requires solving constrained optimization problems, where the constraint set is formed by the Cartesian product of many simple discrete sets. For such optimization problems, we study the performance of the Alternating Direction Method of Multipliers for Quantization ($\texttt{ADMM-Q}$) algorithm, which is a va…

    Submitted 1 March, 2021; v1 submitted 7 September, 2020; originally announced September 2020.

  5. Stance Detection in Web and Social Media: A Comparative Study

    Authors: Shalmoli Ghosh, Prajwal Singhania, Siddharth Singh, Koustav Rudra, Saptarshi Ghosh

    Abstract: Online forums and social media platforms are increasingly being used to discuss topics of varying polarities where different people take different stances. Several methodologies for automatic stance detection from text have been proposed in literature. To our knowledge, there has not been any systematic investigation towards their reproducibility, and their comparative performances. In this work,…

    Submitted 12 July, 2020; originally announced July 2020.

    Journal ref: Proceedings of Conference and Labs of the Evaluation Forum (CLEF) 2019; Lecture Notes in Computer Science, vol 11696, pp. 75-87

  6. arXiv:1808.04409  [pdf, other]

    cs.SI

    Thou shalt not hate: Countering Online Hate Speech

    Authors: Binny Mathew, Punyajoy Saha, Hardik Tharad, Subham Rajgaria, Prajwal Singhania, Suman Kalyan Maity, Pawan Goyal, Animesh Mukherjee

    Abstract: Hate content in social media is ever-increasing. While Facebook, Twitter, Google have attempted to take several steps to tackle the hateful content, they have mostly been unsuccessful. Counterspeech is seen as an effective way of tackling the online hate without any harm to the freedom of speech. Thus, an alternative strategy for these platforms could be to promote counterspeech as a defense again…

    Submitted 4 April, 2019; v1 submitted 13 August, 2018; originally announced August 2018.

    Comments: Accepted at ICWSM 2019. 12 Pages, 5 Figures, and 7 Tables. The dataset and models are available here: https://github.com/binny-mathew/Countering_Hate_Speech_ICWSM2019