Showing 1–21 of 21 results for author: Nijkamp, E

  1. arXiv:2309.03450  [pdf, other]

    cs.CL cs.AI cs.LG

    XGen-7B Technical Report

    Authors: Erik Nijkamp, Tian Xie, Hiroaki Hayashi, Bo Pang, Congying Xia, Chen Xing, Jesse Vig, Semih Yavuz, Philippe Laban, Ben Krause, Senthil Purushwalkam, Tong Niu, Wojciech Kryściński, Lidiya Murakhovs'ka, Prafulla Kumar Choubey, Alex Fabbri, Ye Liu, Rui Meng, Lifu Tu, Meghana Bhat, Chien-Sheng Wu, Silvio Savarese, Yingbo Zhou, Shafiq Joty, Caiming Xiong

    Abstract: Large Language Models (LLMs) have become ubiquitous across various domains, transforming the way we interact with information and conduct research. However, most high-performing LLMs remain confined behind proprietary walls, hindering scientific progress. Most open-source LLMs, on the other hand, are limited in their ability to support longer sequence lengths, which is a key requirement for many t…

    Submitted 6 September, 2023; originally announced September 2023.

  2. arXiv:2305.02309  [pdf, other]

    cs.LG

    CodeGen2: Lessons for Training LLMs on Programming and Natural Languages

    Authors: Erik Nijkamp, Hiroaki Hayashi, Caiming Xiong, Silvio Savarese, Yingbo Zhou

    Abstract: Large language models (LLMs) have demonstrated remarkable abilities in representation learning for program synthesis and understanding tasks. The quality of the learned representations appears to be dictated by the neural scaling laws as a function of the number of model parameters and observations, while imposing upper bounds on the model performance by the amount of available data and compute, w…

    Submitted 11 July, 2023; v1 submitted 3 May, 2023; originally announced May 2023.

  3. arXiv:2302.00138  [pdf, other]

    cs.LG

    Generating High Fidelity Synthetic Data via Coreset selection and Entropic Regularization

    Authors: Omead Pooladzandi, Pasha Khosravi, Erik Nijkamp, Baharan Mirzasoleiman

    Abstract: Generative models have the ability to synthesize data points drawn from the data distribution, however, not all generated samples are high quality. In this paper, we propose using a combination of coresets selection methods and "entropic regularization" to select the highest fidelity samples. We leverage an Energy-Based Model which resembles a variational auto-encoder with an inference and gener…

    Submitted 31 January, 2023; originally announced February 2023.

    Comments: NeurIPS 2022 Workshop on Synthetic Data for Empowering ML Research

  4. arXiv:2210.16486  [pdf, other]

    cs.CV cs.LG stat.ML

    Learning Probabilistic Models from Generator Latent Spaces with Hat EBM

    Authors: Mitch Hill, Erik Nijkamp, Jonathan Mitchell, Bo Pang, Song-Chun Zhu

    Abstract: This work proposes a method for using any generator network as the foundation of an Energy-Based Model (EBM). Our formulation posits that observed images are the sum of unobserved latent variables passed through the generator network and a residual random variable that spans the gap between the generator output and the image manifold. One can then define an EBM that includes the generator as part…

    Submitted 12 January, 2023; v1 submitted 28 October, 2022; originally announced October 2022.

    Comments: NeurIPS 2022 camera ready

  5. arXiv:2207.10739  [pdf, ps, other]

    cs.LG cs.SE

    BigIssue: A Realistic Bug Localization Benchmark

    Authors: Paul Kassianik, Erik Nijkamp, Bo Pang, Yingbo Zhou, Caiming Xiong

    Abstract: As machine learning tools progress, the inevitable question arises: How can machine learning help us write better code? With significant progress being achieved in natural language processing with models like GPT-3 and Bert, the applications of natural language processing techniques to code are starting to be explored. Most of the research has been focused on automatic program repair (APR), and wh…

    Submitted 4 May, 2023; v1 submitted 21 July, 2022; originally announced July 2022.

  6. arXiv:2206.13517  [pdf, other]

    cs.LG q-bio.QM

    ProGen2: Exploring the Boundaries of Protein Language Models

    Authors: Erik Nijkamp, Jeffrey Ruffolo, Eli N. Weinstein, Nikhil Naik, Ali Madani

    Abstract: Attention-based models trained on protein sequences have demonstrated incredible success at classification and generation tasks relevant for artificial intelligence-driven protein design. However, we lack a sufficient understanding of how very large-scale models and data play a role in effective protein model development. We introduce a suite of protein language models, named ProGen2, that are sca…

    Submitted 27 June, 2022; originally announced June 2022.

  7. arXiv:2203.13474  [pdf, other]

    cs.LG cs.CL cs.PL

    CodeGen: An Open Large Language Model for Code with Multi-Turn Program Synthesis

    Authors: Erik Nijkamp, Bo Pang, Hiroaki Hayashi, Lifu Tu, Huan Wang, Yingbo Zhou, Silvio Savarese, Caiming Xiong

    Abstract: Program synthesis strives to generate a computer program as a solution to a given problem specification, expressed with input-output examples or natural language descriptions. The prevalence of large language models advances the state-of-the-art for program synthesis, though limited training resources and data impede open access to such models. To democratize this, we train and release a family of…

    Submitted 27 February, 2023; v1 submitted 25 March, 2022; originally announced March 2022.

  8. arXiv:2203.07586  [pdf, other]

    cs.CL

    Long Document Summarization with Top-down and Bottom-up Inference

    Authors: Bo Pang, Erik Nijkamp, Wojciech Kryściński, Silvio Savarese, Yingbo Zhou, Caiming Xiong

    Abstract: Text summarization aims to condense long documents and retain key information. Critical to the success of a summarization model is the faithful inference of latent representations of words or tokens in the source documents. Most recent models infer the latent representations with a transformer encoder, which is purely bottom-up. Also, self-attention-based inference models face the challenge of qua…

    Submitted 14 March, 2022; originally announced March 2022.

    Comments: 21 pages

  9. arXiv:2106.02513  [pdf, other]

    cs.LG

    Generative Text Modeling through Short Run Inference

    Authors: Bo Pang, Erik Nijkamp, Tian Han, Ying Nian Wu

    Abstract: Latent variable models for text, when trained successfully, accurately model the data distribution and capture global semantic and syntactic features of sentences. The prominent approach to train such models is variational autoencoders (VAE). It is nevertheless challenging to train and often results in a trivial local optimum where the latent variable is ignored and its posterior collapses into th…

    Submitted 8 June, 2021; v1 submitted 27 May, 2021; originally announced June 2021.

    Comments: 10 pages

  10. arXiv:2010.09359  [pdf, ps, other]

    cs.LG

    Semi-supervised Learning by Latent Space Energy-Based Model of Symbol-Vector Coupling

    Authors: Bo Pang, Erik Nijkamp, Jiali Cui, Tian Han, Ying Nian Wu

    Abstract: This paper proposes a latent space energy-based prior model for semi-supervised learning. The model stands on a generator network that maps a latent vector to the observed example. The energy term of the prior model couples the latent vector and a symbolic one-hot vector, so that classification can be based on the latent vector inferred from the observed example. In our learning method, the symbol…

    Submitted 19 October, 2020; originally announced October 2020.

    Comments: work in progress

  11. arXiv:2006.08205  [pdf, other]

    stat.ML cs.LG

    Learning Latent Space Energy-Based Prior Model

    Authors: Bo Pang, Tian Han, Erik Nijkamp, Song-Chun Zhu, Ying Nian Wu

    Abstract: We propose to learn energy-based model (EBM) in the latent space of a generator model, so that the EBM serves as a prior model that stands on the top-down network of the generator model. Both the latent space EBM and the top-down network can be learned jointly by maximum likelihood, which involves short-run MCMC sampling from both the prior and posterior distributions of the latent vector. Due to…

    Submitted 29 October, 2020; v1 submitted 15 June, 2020; originally announced June 2020.

    Comments: NeurIPS 2020 Camera-Ready

  12. arXiv:2006.06897  [pdf, other]

    stat.ML cs.LG

    MCMC Should Mix: Learning Energy-Based Model with Neural Transport Latent Space MCMC

    Authors: Erik Nijkamp, Ruiqi Gao, Pavel Sountsov, Srinivas Vasudevan, Bo Pang, Song-Chun Zhu, Ying Nian Wu

    Abstract: Learning energy-based model (EBM) requires MCMC sampling of the learned model as an inner loop of the learning algorithm. However, MCMC sampling of EBMs in high-dimensional data space is generally not mixing, because the energy function, which is usually parametrized by a deep network, is highly multi-modal in the data space. This is a serious handicap for both theory and practice of EBMs. In this…

    Submitted 16 March, 2022; v1 submitted 11 June, 2020; originally announced June 2020.

  13. arXiv:2006.06059  [pdf, other]

    cs.CV cs.LG

    Joint Training of Variational Auto-Encoder and Latent Energy-Based Model

    Authors: Tian Han, Erik Nijkamp, Linqi Zhou, Bo Pang, Song-Chun Zhu, Ying Nian Wu

    Abstract: This paper proposes a joint training method to learn both the variational auto-encoder (VAE) and the latent energy-based model (EBM). The joint training of VAE and latent EBM are based on an objective function that consists of three Kullback-Leibler divergences between three joint distributions on the latent vector and the image, and the objective function is of an elegant symmetric and anti-symme…

    Submitted 10 June, 2020; originally announced June 2020.

  14. arXiv:1912.01909  [pdf, other]

    stat.ML cs.LG

    Learning Multi-layer Latent Variable Model via Variational Optimization of Short Run MCMC for Approximate Inference

    Authors: Erik Nijkamp, Bo Pang, Tian Han, Linqi Zhou, Song-Chun Zhu, Ying Nian Wu

    Abstract: This paper studies the fundamental problem of learning deep generative models that consist of multiple layers of latent variables organized in top-down architectures. Such models have high expressivity and allow for learning hierarchical representations. Learning such a generative model requires inferring the latent variables for each training example based on the posterior distribution of these l…

    Submitted 17 July, 2020; v1 submitted 4 December, 2019; originally announced December 2019.

  15. arXiv:1912.00589  [pdf, other]

    stat.ML cs.CV cs.LG

    Flow Contrastive Estimation of Energy-Based Models

    Authors: Ruiqi Gao, Erik Nijkamp, Diederik P. Kingma, Zhen Xu, Andrew M. Dai, Ying Nian Wu

    Abstract: This paper studies a training method to jointly estimate an energy-based model and a flow-based model, in which the two models are iteratively updated based on a shared adversarial value function. This joint training method has the following traits. (1) The update of the energy-based model is based on noise contrastive estimation, with the flow model serving as a strong noise distribution. (2) The…

    Submitted 1 April, 2020; v1 submitted 2 December, 2019; originally announced December 2019.

  16. arXiv:1911.11374  [pdf, other]

    stat.ML cs.LG

    Representation Learning: A Statistical Perspective

    Authors: Jianwen Xie, Ruiqi Gao, Erik Nijkamp, Song-Chun Zhu, Ying Nian Wu

    Abstract: Learning representations of data is an important problem in statistics and machine learning. While the origin of learning representations can be traced back to factor analysis and multidimensional scaling in statistics, it has become a central theme in deep learning with important applications in computer vision and computational neuroscience. In this article, we review recent advances in learning…

    Submitted 26 November, 2019; originally announced November 2019.

    Journal ref: Annual Review of Statistics and Its Application 2020

  17. arXiv:1905.02898  [pdf, other]

    stat.ML cs.LG

    A Generative Model for Sampling High-Performance and Diverse Weights for Neural Networks

    Authors: Lior Deutsch, Erik Nijkamp, Yu Yang

    Abstract: Recent work on mode connectivity in the loss landscape of deep neural networks has demonstrated that the locus of (sub-)optimal weight vectors lies on continuous paths. In this work, we train a neural network that serves as a hypernetwork, mapping a latent vector into high-performance (low-loss) weight vectors, generalizing recent findings of mode connectivity to higher dimensional manifolds. We f…

    Submitted 7 May, 2019; originally announced May 2019.

    Comments: arXiv admin note: substantial text overlap with arXiv:1801.01952

  18. arXiv:1904.09770  [pdf, other]

    stat.ML cs.LG

    Learning Non-Convergent Non-Persistent Short-Run MCMC Toward Energy-Based Model

    Authors: Erik Nijkamp, Mitch Hill, Song-Chun Zhu, Ying Nian Wu

    Abstract: This paper studies a curious phenomenon in learning energy-based model (EBM) using MCMC. In each learning iteration, we generate synthesized examples by running a non-convergent, non-mixing, and non-persistent short-run MCMC toward the current model, always starting from the same initial distribution such as uniform noise distribution, and always running a fixed number of MCMC steps. After generat…

    Submitted 25 November, 2019; v1 submitted 22 April, 2019; originally announced April 2019.

  19. arXiv:1903.12370  [pdf, other]

    stat.ML cs.CV cs.LG

    On the Anatomy of MCMC-Based Maximum Likelihood Learning of Energy-Based Models

    Authors: Erik Nijkamp, Mitch Hill, Tian Han, Song-Chun Zhu, Ying Nian Wu

    Abstract: This study investigates the effects of Markov chain Monte Carlo (MCMC) sampling in unsupervised Maximum Likelihood (ML) learning. Our attention is restricted to the family of unnormalized probability densities for which the negative log density (or energy function) is a ConvNet. We find that many of the techniques used to stabilize training in previous studies are not necessary. ML learning with a…

    Submitted 27 November, 2019; v1 submitted 29 March, 2019; originally announced March 2019.

    Comments: Code available at: https://github.com/point0bar1/ebm-anatomy

    Journal ref: AAAI 2020

  20. arXiv:1812.10907  [pdf, other]

    stat.ML cs.CV cs.LG

    Divergence Triangle for Joint Training of Generator Model, Energy-based Model, and Inference Model

    Authors: Tian Han, Erik Nijkamp, Xiaolin Fang, Mitch Hill, Song-Chun Zhu, Ying Nian Wu

    Abstract: This paper proposes the divergence triangle as a framework for joint training of generator model, energy-based model and inference model. The divergence triangle is a compact and symmetric (anti-symmetric) objective function that seamlessly integrates variational learning, adversarial learning, wake-sleep algorithm, and contrastive divergence in a unified probabilistic formulation. This unificatio…

    Submitted 31 January, 2019; v1 submitted 28 December, 2018; originally announced December 2018.

  21. arXiv:1803.01043  [pdf, other]

    stat.ML cs.LG

    Building a Telescope to Look Into High-Dimensional Image Spaces

    Authors: Mitch Hill, Erik Nijkamp, Song-Chun Zhu

    Abstract: An image pattern can be represented by a probability distribution whose density is concentrated on different low-dimensional subspaces in the high-dimensional image space. Such probability densities have an astronomical number of local modes corresponding to typical pattern appearances. Related groups of modes can join to form macroscopic image basins that represent pattern concepts. Recent works…

    Submitted 2 March, 2018; originally announced March 2018.