Showing 1–13 of 13 results for author: Bjorck, J

  1. arXiv:2404.14219  [pdf, other]

    cs.CL cs.AI

    Phi-3 Technical Report: A Highly Capable Language Model Locally on Your Phone

    Authors: Marah Abdin, Sam Ade Jacobs, Ammar Ahmad Awan, Jyoti Aneja, Ahmed Awadallah, Hany Awadalla, Nguyen Bach, Amit Bahree, Arash Bakhtiari, Jianmin Bao, Harkirat Behl, Alon Benhaim, Misha Bilenko, Johan Bjorck, Sébastien Bubeck, Qin Cai, Martin Cai, Caio César Teodoro Mendes, Weizhu Chen, Vishrav Chaudhary, Dong Chen, Dongdong Chen, Yen-Chun Chen, Yi-Ling Chen, Parul Chopra, et al. (90 additional authors not shown)

    Abstract: We introduce phi-3-mini, a 3.8 billion parameter language model trained on 3.3 trillion tokens, whose overall performance, as measured by both academic benchmarks and internal testing, rivals that of models such as Mixtral 8x7B and GPT-3.5 (e.g., phi-3-mini achieves 69% on MMLU and 8.38 on MT-bench), despite being small enough to be deployed on a phone. The innovation lies entirely in our dataset…

    Submitted 23 May, 2024; v1 submitted 22 April, 2024; originally announced April 2024.

    Comments: 19 pages

  2. arXiv:2302.14045  [pdf, other]

    cs.CL cs.CV

    Language Is Not All You Need: Aligning Perception with Language Models

    Authors: Shaohan Huang, Li Dong, Wenhui Wang, Yaru Hao, Saksham Singhal, Shuming Ma, Tengchao Lv, Lei Cui, Owais Khan Mohammed, Barun Patra, Qiang Liu, Kriti Aggarwal, Zewen Chi, Johan Bjorck, Vishrav Chaudhary, Subhojit Som, Xia Song, Furu Wei

    Abstract: A big convergence of language, multimodal perception, action, and world modeling is a key step toward artificial general intelligence. In this work, we introduce Kosmos-1, a Multimodal Large Language Model (MLLM) that can perceive general modalities, learn in context (i.e., few-shot), and follow instructions (i.e., zero-shot). Specifically, we train Kosmos-1 from scratch on web-scale multimodal co…

    Submitted 1 March, 2023; v1 submitted 27 February, 2023; originally announced February 2023.

  3. arXiv:2208.10442  [pdf, other]

    cs.CV cs.CL

    Image as a Foreign Language: BEiT Pretraining for All Vision and Vision-Language Tasks

    Authors: Wenhui Wang, Hangbo Bao, Li Dong, Johan Bjorck, Zhiliang Peng, Qiang Liu, Kriti Aggarwal, Owais Khan Mohammed, Saksham Singhal, Subhojit Som, Furu Wei

    Abstract: A big convergence of language, vision, and multimodal pretraining is emerging. In this work, we introduce a general-purpose multimodal foundation model BEiT-3, which achieves state-of-the-art transfer performance on both vision and vision-language tasks. Specifically, we advance the big convergence from three aspects: backbone architecture, pretraining task, and model scaling up. We introduce Mult…

    Submitted 30 August, 2022; v1 submitted 22 August, 2022; originally announced August 2022.

    Comments: 18 pages

  4. arXiv:2110.11222  [pdf, other]

    cs.LG cs.AI

    Is High Variance Unavoidable in RL? A Case Study in Continuous Control

    Authors: Johan Bjorck, Carla P. Gomes, Kilian Q. Weinberger

    Abstract: Reinforcement learning (RL) experiments have notoriously high variance, and minor details can have disproportionately large effects on measured outcomes. This is problematic for creating reproducible research and also serves as an obstacle for real-world applications, where safety and predictability are paramount. In this paper, we investigate causes for this perceived instability. To allow for an…

    Submitted 5 February, 2022; v1 submitted 21 October, 2021; originally announced October 2021.

    Comments: Accepted to ICLR2022

  5. arXiv:2106.01151  [pdf, other]

    cs.LG

    Towards Deeper Deep Reinforcement Learning with Spectral Normalization

    Authors: Johan Bjorck, Carla P. Gomes, Kilian Q. Weinberger

    Abstract: In computer vision and natural language processing, innovations in model architecture that increase model capacity have reliably translated into gains in performance. In stark contrast with this trend, state-of-the-art reinforcement learning (RL) algorithms often use small MLPs, and gains in performance typically originate from algorithmic innovations. It is natural to hypothesize that small datas…

    Submitted 3 January, 2022; v1 submitted 2 June, 2021; originally announced June 2021.

    Comments: accepted NeurIPS 2021

  6. arXiv:2102.13565  [pdf, other]

    cs.LG

    Low-Precision Reinforcement Learning: Running Soft Actor-Critic in Half Precision

    Authors: Johan Bjorck, Xiangyu Chen, Christopher De Sa, Carla P. Gomes, Kilian Q. Weinberger

    Abstract: Low-precision training has become a popular approach to reduce compute requirements, memory footprint, and energy consumption in supervised learning. In contrast, this promising approach has not yet enjoyed similarly widespread adoption within the reinforcement learning (RL) community, partly because RL agents can be notoriously hard to train even in full precision. In this paper we consider conti…

    Submitted 3 June, 2021; v1 submitted 26 February, 2021; originally announced February 2021.

  7. arXiv:2012.13841  [pdf, other]

    cs.LG stat.ML

    Understanding Decoupled and Early Weight Decay

    Authors: Johan Bjorck, Kilian Weinberger, Carla Gomes

    Abstract: Weight decay (WD) is a traditional regularization technique in deep learning, but despite its ubiquity, its behavior is still an area of active research. Golatkar et al. have recently shown that WD only matters at the start of the training in computer vision, upending traditional wisdom. Loshchilov et al. show that for adaptive optimizers, manually decaying weights can outperform adding an $l_2$ p…

    Submitted 26 December, 2020; originally announced December 2020.

  8. arXiv:1902.09069  [pdf, other]

    cs.SD cs.LG eess.AS

    Automatic Detection and Compression for Passive Acoustic Monitoring of the African Forest Elephant

    Authors: Johan Bjorck, Brendan H. Rappazzo, Di Chen, Richard Bernstein, Peter H. Wrege, Carla P. Gomes

    Abstract: In this work, we consider applying machine learning to the analysis and compression of audio signals in the context of monitoring elephants in sub-Saharan Africa. Earth's biodiversity is increasingly under threat by sources of anthropogenic change (e.g. resource extraction, land use change, and climate change) and surveying animal populations is critical for developing conservation strategies. How…

    Submitted 24 February, 2019; originally announced February 2019.

  9. arXiv:1806.02375  [pdf, other]

    cs.LG cs.AI stat.ML

    Understanding Batch Normalization

    Authors: Johan Bjorck, Carla Gomes, Bart Selman, Kilian Q. Weinberger

    Abstract: Batch normalization (BN) is a technique to normalize activations in intermediate layers of deep neural networks. Its tendency to improve accuracy and speed up training have established BN as a favorite technique in deep learning. Yet, despite its enormous success, there remains little consensus on the exact reason and mechanism behind these improvements. In this paper we take a step towards a bett…

    Submitted 30 November, 2018; v1 submitted 31 May, 2018; originally announced June 2018.

  10. arXiv:1711.06800  [pdf, other]

    cs.AI cs.SI

    Scalable Relaxations of Sparse Packing Constraints: Optimal Biocontrol in Predator-Prey Network

    Authors: Johan Bjorck, Yiwei Bai, Xiaojian Wu, Yexiang Xue, Mark C. Whitmore, Carla Gomes

    Abstract: Cascades represent rapid changes in networks. A cascading phenomenon of ecological and economic impact is the spread of invasive species in geographic landscapes. The most promising management strategy is often biocontrol, which entails introducing a natural predator able to control the invading population, a setting that can be treated as two interacting cascades of predator and prey populations.…

    Submitted 8 February, 2018; v1 submitted 17 November, 2017; originally announced November 2017.

    Comments: AAAI 2018

  11. arXiv:1611.06018  [pdf, ps, other]

    cond-mat.mes-hall

    Surface acoustic wave unidirectional transducers for quantum applications

    Authors: Maria K. Ekström, Thomas Aref, Johan Runeson, Johan Björck, Isac Boström, Per Delsing

    Abstract: The conversion efficiency of electric microwave signals into surface acoustic waves in different types of superconducting transducers is studied with the aim of quantum applications. We compare delay lines containing either conventional symmetric transducers (IDTs) or unidirectional transducers (UDTs) at 2.3 GHz and 10 mK. The UDT delay lines improve the insertion loss with 4.7 dB and a directivit…

    Submitted 18 November, 2016; originally announced November 2016.

    Comments: 4 pages (5 including references), 3 figures

  12. arXiv:1610.02005  [pdf]

    cond-mat.mtrl-sci

    Automated Phase Mapping with AgileFD and its Application to Light Absorber Discovery in the V-Mn-Nb Oxide System

    Authors: Santosh K. Suram, Yexiang Xue, Junwen Bai, Ronan Le Bras, Brendan Rappazzo, Richard Bernstein, Johan Bjorck, Lan Zhou, Robert B. van Dover, Carla P. Gomes, John M. Gregoire

    Abstract: Rapid construction of phase diagrams is a central tenet of combinatorial materials science with accelerated materials discovery efforts often hampered by challenges in interpreting combinatorial x-ray diffraction datasets, which we address by developing AgileFD, an artificial intelligence algorithm that enables rapid phase mapping from a combinatorial library of x-ray diffraction patterns. AgileFD…

    Submitted 6 October, 2016; originally announced October 2016.

  13. arXiv:1610.00689  [pdf, other]

    cs.AI

    Phase-Mapper: An AI Platform to Accelerate High Throughput Materials Discovery

    Authors: Yexiang Xue, Junwen Bai, Ronan Le Bras, Brendan Rappazzo, Richard Bernstein, Johan Bjorck, Liane Longpre, Santosh K. Suram, Robert B. van Dover, John Gregoire, Carla P. Gomes

    Abstract: High-Throughput materials discovery involves the rapid synthesis, measurement, and characterization of many different but structurally-related materials. A key problem in materials discovery, the phase map identification problem, involves the determination of the crystal phase diagram from the materials' composition and structural characterization data. We present Phase-Mapper, a novel AI platform…

    Submitted 7 October, 2016; v1 submitted 3 October, 2016; originally announced October 2016.