Showing 1–8 of 8 results for author: Aneja, J

  1. arXiv:2404.14219  [pdf, other]

    cs.CL cs.AI

    Phi-3 Technical Report: A Highly Capable Language Model Locally on Your Phone

    Authors: Marah Abdin, Sam Ade Jacobs, Ammar Ahmad Awan, Jyoti Aneja, Ahmed Awadallah, Hany Awadalla, Nguyen Bach, Amit Bahree, Arash Bakhtiari, Jianmin Bao, Harkirat Behl, Alon Benhaim, Misha Bilenko, Johan Bjorck, Sébastien Bubeck, Qin Cai, Martin Cai, Caio César Teodoro Mendes, Weizhu Chen, Vishrav Chaudhary, Dong Chen, Dongdong Chen, Yen-Chun Chen, Yi-Ling Chen, Parul Chopra, et al. (90 additional authors not shown)

    Abstract: We introduce phi-3-mini, a 3.8 billion parameter language model trained on 3.3 trillion tokens, whose overall performance, as measured by both academic benchmarks and internal testing, rivals that of models such as Mixtral 8x7B and GPT-3.5 (e.g., phi-3-mini achieves 69% on MMLU and 8.38 on MT-bench), despite being small enough to be deployed on a phone. The innovation lies entirely in our dataset…

    Submitted 23 May, 2024; v1 submitted 22 April, 2024; originally announced April 2024.

    Comments: 19 pages

  2. arXiv:2306.11644  [pdf, other]

    cs.CL cs.AI cs.LG

    Textbooks Are All You Need

    Authors: Suriya Gunasekar, Yi Zhang, Jyoti Aneja, Caio César Teodoro Mendes, Allie Del Giorno, Sivakanth Gopi, Mojan Javaheripi, Piero Kauffmann, Gustavo de Rosa, Olli Saarikivi, Adil Salim, Shital Shah, Harkirat Singh Behl, Xin Wang, Sébastien Bubeck, Ronen Eldan, Adam Tauman Kalai, Yin Tat Lee, Yuanzhi Li

    Abstract: We introduce phi-1, a new large language model for code, with significantly smaller size than competing models: phi-1 is a Transformer-based model with 1.3B parameters, trained for 4 days on 8 A100s, using a selection of "textbook quality" data from the web (6B tokens) and synthetically generated textbooks and exercises with GPT-3.5 (1B tokens). Despite this small scale, phi-1 attains pass@1 accu…

    Submitted 2 October, 2023; v1 submitted 20 June, 2023; originally announced June 2023.

    Comments: 26 pages; changed color scheme of plot. fixed minor typos and added couple clarifications

  3. arXiv:2210.01848  [pdf, other]

    cs.LG cs.AI cs.CL q-bio.NC stat.ML

    Explaining Patterns in Data with Language Models via Interpretable Autoprompting

    Authors: Chandan Singh, John X. Morris, Jyoti Aneja, Alexander M. Rush, Jianfeng Gao

    Abstract: Large language models (LLMs) have displayed an impressive ability to harness natural language to perform complex tasks. In this work, we explore whether we can leverage this learned ability to find and explain patterns in data. Specifically, given a pre-trained LLM and data examples, we introduce interpretable autoprompting (iPrompt), an algorithm that generates a natural-language string explainin…

    Submitted 26 January, 2023; v1 submitted 4 October, 2022; originally announced October 2022.

    Comments: The two first authors contributed equally

  4. arXiv:2204.08790  [pdf, other]

    cs.CV cs.CL cs.LG

    ELEVATER: A Benchmark and Toolkit for Evaluating Language-Augmented Visual Models

    Authors: Chunyuan Li, Haotian Liu, Liunian Harold Li, Pengchuan Zhang, Jyoti Aneja, Jianwei Yang, Ping Jin, Houdong Hu, Zicheng Liu, Yong Jae Lee, Jianfeng Gao

    Abstract: Learning visual representations from natural language supervision has recently shown great promise in a number of pioneering works. In general, these language-augmented visual models demonstrate strong transferability to a variety of datasets and tasks. However, it remains challenging to evaluate the transferability of these models due to the lack of easy-to-use evaluation toolkits and public bench…

    Submitted 13 October, 2022; v1 submitted 19 April, 2022; originally announced April 2022.

    Comments: NeurIPS 2022 (Datasets and Benchmarks Track). The first two authors contribute equally. Benchmark page: https://computer-vision-in-the-wild.github.io/ELEVATER/

  5. arXiv:2010.02917  [pdf, other]

    cs.LG cs.CV stat.ML

    A Contrastive Learning Approach for Training Variational Autoencoder Priors

    Authors: Jyoti Aneja, Alexander Schwing, Jan Kautz, Arash Vahdat

    Abstract: Variational autoencoders (VAEs) are one of the powerful likelihood-based generative models with applications in many domains. However, they struggle to generate high-quality images, especially when samples are obtained from the prior without any tempering. One explanation for VAEs' poor generative quality is the prior hole problem: the prior distribution fails to match the aggregate approximate po…

    Submitted 3 November, 2021; v1 submitted 6 October, 2020; originally announced October 2020.

    Comments: Accepted to NeurIPS 2021

  6. arXiv:1908.08529  [pdf, other]

    cs.CV cs.CL cs.LG stat.ML

    Sequential Latent Spaces for Modeling the Intention During Diverse Image Captioning

    Authors: Jyoti Aneja, Harsh Agrawal, Dhruv Batra, Alexander Schwing

    Abstract: Diverse and accurate vision+language modeling is an important goal to retain creative freedom and maintain user engagement. However, adequately capturing the intricacies of diversity in language models is challenging. Recent works commonly resort to latent variable models augmented with more or less supervision from object detectors or part-of-speech tags. Common to all those methods is the fact t…

    Submitted 22 August, 2019; originally announced August 2019.

    Comments: Accepted to ICCV 2019

  7. arXiv:1805.12589  [pdf, other]

    cs.CV

    Fast, Diverse and Accurate Image Captioning Guided By Part-of-Speech

    Authors: Aditya Deshpande, Jyoti Aneja, Liwei Wang, Alexander Schwing, D. A. Forsyth

    Abstract: Image captioning is an ambiguous problem, with many suitable captions for an image. To address ambiguity, beam search is the de facto method for sampling multiple captions. However, beam search is computationally expensive and known to produce generic captions. To address this concern, some variational auto-encoder (VAE) and generative adversarial net (GAN) based methods have been proposed. Though…

    Submitted 10 April, 2019; v1 submitted 31 May, 2018; originally announced May 2018.

    Comments: 12 pages with references and appendix. To appear CVPR'19

  8. arXiv:1711.09151  [pdf, other]

    cs.CV

    Convolutional Image Captioning

    Authors: Jyoti Aneja, Aditya Deshpande, Alexander Schwing

    Abstract: Image captioning is an important but challenging task, applicable to virtual assistants, editing tools, image indexing, and support of the disabled. Its challenges are due to the variability and ambiguity of possible image descriptions. In recent years significant progress has been made in image captioning, using Recurrent Neural Networks powered by long-short-term-memory (LSTM) units. Despite mit…

    Submitted 24 November, 2017; originally announced November 2017.

    Comments: 11 pages, 9 Figures