Phi-3 Technical Report: A Highly Capable Language Model Locally on Your Phone

Authors: Marah Abdin, Sam Ade Jacobs, Ammar Ahmad Awan, Jyoti Aneja, Ahmed Awadallah, Hany Awadalla, Nguyen Bach, Amit Bahree, Arash Bakhtiari, Jianmin Bao, Harkirat Behl, Alon Benhaim, Misha Bilenko, Johan Bjorck, Sébastien Bubeck, Qin Cai, Martin Cai, Caio César Teodoro Mendes, Weizhu Chen, Vishrav Chaudhary, Dong Chen, Dongdong Chen, Yen-Chun Chen, Yi-Ling Chen, Parul Chopra, et al. (90 additional authors not shown)

Abstract: We introduce phi-3-mini, a 3.8 billion parameter language model trained on 3.3 trillion tokens, whose overall performance, as measured by both academic benchmarks and internal testing, rivals that of models such as Mixtral 8x7B and GPT-3.5 (e.g., phi-3-mini achieves 69% on MMLU and 8.38 on MT-bench), despite being small enough to be deployed on a phone. The innovation lies entirely in our dataset for training, a scaled-up version of the one used for phi-2, composed of heavily filtered publicly available web data and synthetic data. The model is also further aligned for robustness, safety, and chat format. We also provide some initial parameter-scaling results with a 7B and 14B models trained for 4.8T tokens, called phi-3-small and phi-3-medium, both significantly more capable than phi-3-mini (e.g., respectively 75% and 78% on MMLU, and 8.7 and 8.9 on MT-bench). Moreover, we also introduce phi-3-vision, a 4.2 billion parameter model based on phi-3-mini with strong reasoning capabilities for image and text prompts.

Submitted 23 May, 2024; v1 submitted 22 April, 2024; originally announced April 2024.

Comments: 19 pages

arXiv:2206.08674 [pdf, other]

doi 10.1039/D2CP02252D

The Kinetic Energy of PAH Dication and Trication Dissociation Determined by Recoil-Frame Covariance Map Imaging

Authors: Jason W. L. Lee, Denis S. Tikhonov, Felix Allum, Rebecca Boll, Pragya Chopra, Benjamin Erk, Sebastian Gruet, Lanhai He, David Heathcote, Mehdi M. Kazemi, Jan Lahl, Alexander K. Lemmens, Donatella Loru, Sylvain Maclot, Robert Mason, Erland Müller, Terry Mullins, Christopher Passow, Jasper Peschel, Daniel Ramm, Amanda L. Steber, Sadia Bari, Mark Brouard, Michael Burt, Jochen Küpper, et al. (6 additional authors not shown)

Abstract: We investigated the dissociation of dications and trications of three polycyclic aromatic hydrocarbons (PAHs), fluorene, phenanthrene, and pyrene. PAHs are a family of molecules ubiquitous in space and involved in much of the chemistry of the interstellar medium. In our experiments, ions are formed by interaction with 30.3 nm extreme ultraviolet (XUV) photons, and their velocity map images are recorded using a PImMS2 multi-mass imaging sensor. Application of recoil-frame covariance analysis allows the total kinetic energy release (TKER) associated with multiple fragmentation channels to be determined to high precision, ranging 1.94-2.60 eV and 2.95-5.29 eV for the dications and trications, respectively. Experimental measurements are supported by Born-Oppenheimer molecular dynamics (BOMD) simulations.

Submitted 17 June, 2022; originally announced June 2022.

arXiv:2111.01231 [pdf, other]

Switch Point biased Self-Training: Re-purposing Pretrained Models for Code-Switching

Authors: Parul Chopra, Sai Krishna Rallabandi, Alan W Black, Khyathi Raghavi Chandu

Abstract: Code-switching (CS), a ubiquitous phenomenon due to the ease of communication it offers in multilingual communities still remains an understudied problem in language processing. The primary reasons behind this are: (1) minimal efforts in leveraging large pretrained multilingual models, and (2) the lack of annotated data. The distinguishing case of low performance of multilingual models in CS is the intra-sentence mixing of languages leading to switch points. We first benchmark two sequence labeling tasks -- POS and NER on 4 different language pairs with a suite of pretrained models to identify the problems and select the best performing model, char-BERT, among them (addressing (1)). We then propose a self training method to repurpose the existing pretrained models using a switch-point bias by leveraging unannotated data (addressing (2)). We finally demonstrate that our approach performs well on both tasks by reducing the gap between the switch point performance while retaining the overall performance on two distinct language pairs in both the tasks. Our code is available here: https://github.com/PC09/EMNLP2021-Switch-Point-biased-Self-Training.

Submitted 1 November, 2021; originally announced November 2021.

Comments: Accepted at EMNLP Findings 2021

arXiv:2103.11373 [pdf]

ProgressiveSpinalNet architecture for FC layers

Authors: Praveen Chopra

Abstract: In deeplearning models the FC (fully connected) layer has biggest important role for classification of the input based on the learned features from previous layers. The FC layers has highest numbers of parameters and fine-tuning these large numbers of parameters, consumes most of the computational resources, so in this paper it is aimed to reduce these large numbers of parameters significantly with improved performance. The motivation is inspired from SpinalNet and other biological architecture. The proposed architecture has a gradient highway between input to output layers and this solves the problem of diminishing gradient in deep networks. In this all the layers receives the input from previous layers as well as the CNN layer output and this way all layers contribute in decision making with last layer. This approach has improved classification performance over the SpinalNet architecture and has SOTA performance on many datasets such as Caltech101, KMNIST, QMNIST and EMNIST. The source code is available at https://github.com/praveenchopra/ProgressiveSpinalNet.

Submitted 21 March, 2021; originally announced March 2021.

arXiv:2010.12115 [pdf, ps, other]

doi 10.1103/PhysRevFluids.7.L071101

Geometric effects induce anomalous size-dependent active transport in structured environments

Authors: Pooja Chopra, David Quint, Ajay Gopinathan, Bin Liu

Abstract: Variations of transport efficiency in structured environments between distinct individuals in actively self-propelled systems is both hard to study and poorly understood. Here, we study the transport of a non-tumbling E. coli strain, an active-matter archetype with intrinsic size variation but fairly uniform speed, through a periodic pillar array. We show that long-term transport switches from a trapping dominated state for shorter cells to a much more dispersive state for longer cells above a critical bacterial size set by the pillar array geometry. Using a combination of experiments and modeling, we show that this anomalous size-dependence arises from an enhancement of the escape rate from trapping for longer cells caused by nearby pillars. Our results show that geometric effects can lead to size being a sensitive tuning knob for transport in structured environments, with implications in general for active matter systems and, in particular, for the morphological adaptation of bacteria to structured habitats, spatial structuring of communities and for anti-biofouling materials design.

Submitted 17 June, 2022; v1 submitted 22 October, 2020; originally announced October 2020.

Comments: 4 figures

arXiv:1502.06200 [pdf, ps, other]

On an extension of extended Beta and hypergeometric functions

Authors: R. K. Parmar, P. Chopra, R. B. Paris

Abstract: Motivated mainly by certain interesting recent extensions of the Gamma, Beta and hypergeometric functions, we introduce here new extensions of the Beta function, hypergeometric and confluent hypergeometric functions. We systematically investigate several properties of each of these extended functions, namely their various integral representations, Mellin transforms, derivatives, transformations, summation formulas and asymptotics. Relevant connections of certain special cases of the main results presented here are also pointed out.

Submitted 22 February, 2015; originally announced February 2015.

Comments: 14 pages, 0 figures

MSC Class: 33B20; 33C20; 33B15; 33C05

Showing 1–6 of 6 results for author: Chopra, P