Showing 1–21 of 21 results for author: Yu, A W

  1. arXiv:2310.01798  [pdf, other]

    cs.CL cs.AI

    Large Language Models Cannot Self-Correct Reasoning Yet

    Authors: Jie Huang, Xinyun Chen, Swaroop Mishra, Huaixiu Steven Zheng, Adams Wei Yu, Xinying Song, Denny Zhou

    Abstract: Large Language Models (LLMs) have emerged as a groundbreaking technology with their unparalleled text generation capabilities across various applications. Nevertheless, concerns persist regarding the accuracy and appropriateness of their generated content. A contemporary methodology, self-correction, has been proposed as a remedy to these issues. Building upon this premise, this paper critically e…

    Submitted 14 March, 2024; v1 submitted 3 October, 2023; originally announced October 2023.

    Comments: ICLR 2024

  2. arXiv:2305.10429  [pdf, other]

    cs.CL cs.LG

    DoReMi: Optimizing Data Mixtures Speeds Up Language Model Pretraining

    Authors: Sang Michael Xie, Hieu Pham, Xuanyi Dong, Nan Du, Hanxiao Liu, Yifeng Lu, Percy Liang, Quoc V. Le, Tengyu Ma, Adams Wei Yu

    Abstract: The mixture proportions of pretraining data domains (e.g., Wikipedia, books, web text) greatly affect language model (LM) performance. In this paper, we propose Domain Reweighting with Minimax Optimization (DoReMi), which first trains a small proxy model using group distributionally robust optimization (Group DRO) over domains to produce domain weights (mixture proportions) without knowledge of do…

    Submitted 20 November, 2023; v1 submitted 17 May, 2023; originally announced May 2023.

    Comments: NeurIPS 2023

  3. arXiv:2207.06010  [pdf, other]

    cs.LG q-bio.BM

    Does GNN Pretraining Help Molecular Representation?

    Authors: Ruoxi Sun, Hanjun Dai, Adams Wei Yu

    Abstract: Extracting informative representations of molecules using Graph neural networks (GNNs) is crucial in AI-driven drug discovery. Recently, the graph research community has been trying to replicate the success of self-supervised pretraining in natural language processing, with several successes claimed. However, we find the benefit brought by self-supervised pretraining on small molecular data can be…

    Submitted 2 November, 2022; v1 submitted 13 July, 2022; originally announced July 2022.

  4. arXiv:2203.08195  [pdf, other]

    cs.CV

    DeepFusion: Lidar-Camera Deep Fusion for Multi-Modal 3D Object Detection

    Authors: Yingwei Li, Adams Wei Yu, Tianjian Meng, Ben Caine, Jiquan Ngiam, Daiyi Peng, Junyang Shen, Bo Wu, Yifeng Lu, Denny Zhou, Quoc V. Le, Alan Yuille, Mingxing Tan

    Abstract: Lidars and cameras are critical sensors that provide complementary information for 3D detection in autonomous driving. While prevalent multi-modal methods simply decorate raw lidar point clouds with camera features and feed them directly to existing 3D detection models, our study shows that fusing camera features with deep lidar features instead of raw points, can lead to better performance. Howev…

    Submitted 15 March, 2022; originally announced March 2022.

    Comments: CVPR 2022. 1st rank 3D detection method on Waymo Challenge Leaderboard: https://waymo.com/open/challenges/entry/?timestamp=1647356360224524&challenge=DETECTION_3D&emailId=5451f123-a0ea

  5. arXiv:2112.06905  [pdf, other]

    cs.CL

    GLaM: Efficient Scaling of Language Models with Mixture-of-Experts

    Authors: Nan Du, Yanping Huang, Andrew M. Dai, Simon Tong, Dmitry Lepikhin, Yuanzhong Xu, Maxim Krikun, Yanqi Zhou, Adams Wei Yu, Orhan Firat, Barret Zoph, Liam Fedus, Maarten Bosma, Zongwei Zhou, Tao Wang, Yu Emma Wang, Kellie Webster, Marie Pellat, Kevin Robinson, Kathleen Meier-Hellstern, Toju Duke, Lucas Dixon, Kun Zhang, Quoc V Le, Yonghui Wu, et al. (2 additional authors not shown)

    Abstract: Scaling language models with more data, compute and parameters has driven significant progress in natural language processing. For example, thanks to scaling, GPT-3 was able to achieve strong results on in-context learning tasks. However, training these large dense models requires significant amounts of computing resources. In this paper, we propose and develop a family of language models named GL…

    Submitted 1 August, 2022; v1 submitted 13 December, 2021; originally announced December 2021.

    Comments: Accepted to ICML 2022

  6. arXiv:2111.10050  [pdf, other]

    cs.LG cs.CL cs.CV

    Combined Scaling for Zero-shot Transfer Learning

    Authors: Hieu Pham, Zihang Dai, Golnaz Ghiasi, Kenji Kawaguchi, Hanxiao Liu, Adams Wei Yu, Jiahui Yu, Yi-Ting Chen, Minh-Thang Luong, Yonghui Wu, Mingxing Tan, Quoc V. Le

    Abstract: We present a combined scaling method - named BASIC - that achieves 85.7% top-1 accuracy on the ImageNet ILSVRC-2012 validation set without learning from any labeled ImageNet example. This accuracy surpasses best published similar models - CLIP and ALIGN - by 9.3%. Our BASIC model also shows significant improvements in robustness benchmarks. For instance, on 5 test sets with natural distribution sh…

    Submitted 12 April, 2023; v1 submitted 19 November, 2021; originally announced November 2021.

  7. arXiv:2109.09193  [pdf, other]

    cs.CL cs.LG

    Towards Zero-Label Language Learning

    Authors: Zirui Wang, Adams Wei Yu, Orhan Firat, Yuan Cao

    Abstract: This paper explores zero-label learning in Natural Language Processing (NLP), whereby no human-annotated data is used anywhere during training and models are trained purely on synthetic data. At the core of our framework is a novel approach for better leveraging the powerful pretrained language models. Specifically, inspired by the recent success of few-shot inference on GPT-3, we present a traini…

    Submitted 19 September, 2021; originally announced September 2021.

  8. arXiv:2109.01652  [pdf, other]

    cs.CL

    Finetuned Language Models Are Zero-Shot Learners

    Authors: Jason Wei, Maarten Bosma, Vincent Y. Zhao, Kelvin Guu, Adams Wei Yu, Brian Lester, Nan Du, Andrew M. Dai, Quoc V. Le

    Abstract: This paper explores a simple method for improving the zero-shot learning abilities of language models. We show that instruction tuning -- finetuning language models on a collection of tasks described via instructions -- substantially improves zero-shot performance on unseen tasks. We take a 137B parameter pretrained language model and instruction-tune it on over 60 NLP tasks verbalized via natur…

    Submitted 8 February, 2022; v1 submitted 3 September, 2021; originally announced September 2021.

    Comments: Version 5. Find list of changes in Appendix F (page 35)

  9. arXiv:2108.10904  [pdf, other]

    cs.CV cs.CL cs.LG

    SimVLM: Simple Visual Language Model Pretraining with Weak Supervision

    Authors: Zirui Wang, Jiahui Yu, Adams Wei Yu, Zihang Dai, Yulia Tsvetkov, Yuan Cao

    Abstract: With recent progress in joint modeling of visual and textual representations, Vision-Language Pretraining (VLP) has achieved impressive performance on many multimodal downstream tasks. However, the requirement for expensive annotations including clean image captions and regional labels limits the scalability of existing approaches, and complicates the pretraining procedure with the introduction of…

    Submitted 15 May, 2022; v1 submitted 24 August, 2021; originally announced August 2021.

    Comments: Published at ICLR 2022

  10. arXiv:2008.06662  [pdf, other]

    cs.LG cs.AI stat.ML

    Compositional Generalization via Neural-Symbolic Stack Machines

    Authors: Xinyun Chen, Chen Liang, Adams Wei Yu, Dawn Song, Denny Zhou

    Abstract: Despite achieving tremendous success, existing deep learning models have exposed limitations in compositional generalization, the capability to learn compositional rules and apply them to unseen cases in a systematic manner. To tackle this issue, we propose the Neural-Symbolic Stack Machine (NeSS). It contains a neural network to generate traces, which are then executed by a symbolic stack machine…

    Submitted 22 October, 2020; v1 submitted 15 August, 2020; originally announced August 2020.

    Comments: Published in NeurIPS 2020

  11. arXiv:2006.03656  [pdf, other]

    cs.CV

    AutoHAS: Efficient Hyperparameter and Architecture Search

    Authors: Xuanyi Dong, Mingxing Tan, Adams Wei Yu, Daiyi Peng, Bogdan Gabrys, Quoc V. Le

    Abstract: Efficient hyperparameter or architecture search methods have shown remarkable results, but each of them is only applicable to searching for either hyperparameters (HPs) or architectures. In this work, we propose a unified pipeline, AutoHAS, to efficiently search for both architectures and hyperparameters. AutoHAS learns to alternately update the shared network weights and a reinforcement learning…

    Submitted 7 April, 2021; v1 submitted 5 June, 2020; originally announced June 2020.

    Comments: Accepted to 2nd Workshop on Neural Architecture Search at ICLR 2021

  12. arXiv:1804.09541  [pdf, other]

    cs.CL cs.AI cs.LG

    QANet: Combining Local Convolution with Global Self-Attention for Reading Comprehension

    Authors: Adams Wei Yu, David Dohan, Minh-Thang Luong, Rui Zhao, Kai Chen, Mohammad Norouzi, Quoc V. Le

    Abstract: Current end-to-end machine reading and question answering (Q&A) models are primarily based on recurrent neural networks (RNNs) with attention. Despite their success, these models are often slow for both training and inference due to the sequential nature of RNNs. We propose a new Q&A architecture called QANet, which does not require recurrent networks: Its encoder consists exclusively of convolu…

    Submitted 23 April, 2018; originally announced April 2018.

    Comments: Published as full paper in ICLR 2018

  13. arXiv:1803.03919  [pdf, other]

    stat.ML cs.LG stat.ME

    Detecting Nonlinear Causality in Multivariate Time Series with Sparse Additive Models

    Authors: Yingxiang Yang, Adams Wei Yu, Zhaoran Wang, Tuo Zhao

    Abstract: We propose a nonparametric method for detecting nonlinear causal relationship within a set of multidimensional discrete time series, by using sparse additive models (SpAMs). We show that, when the input to the SpAM is a $β$-mixing time series, the model can be fitted by first approximating each unknown function with a linear combination of a set of B-spline bases, and then solving a group-lasso-ty…

    Submitted 26 April, 2018; v1 submitted 11 March, 2018; originally announced March 2018.

  14. arXiv:1709.06079  [pdf, other]

    cs.LG

    Orthogonal Weight Normalization: Solution to Optimization over Multiple Dependent Stiefel Manifolds in Deep Neural Networks

    Authors: Lei Huang, Xianglong Liu, Bo Lang, Adams Wei Yu, Yongliang Wang, Bo Li

    Abstract: Orthogonal matrix has shown advantages in training Recurrent Neural Networks (RNNs), but such matrix is limited to be square for the hidden-to-hidden transformation in RNNs. In this paper, we generalize such square orthogonal matrix to orthogonal rectangular matrix and formulating this problem in feed-forward Neural Networks (FNNs) as Optimization over Multiple Dependent Stiefel Manifolds (OMDSM).…

    Submitted 21 November, 2017; v1 submitted 16 September, 2017; originally announced September 2017.

    Comments: 20 pages, Accepted by AAAI 2018

  15. arXiv:1707.04822  [pdf, other]

    cs.LG cs.AI

    Block-Normalized Gradient Method: An Empirical Study for Training Deep Neural Network

    Authors: Adams Wei Yu, Lei Huang, Qihang Lin, Ruslan Salakhutdinov, Jaime Carbonell

    Abstract: In this paper, we propose a generic and simple strategy for utilizing stochastic gradient information in optimization. The technique essentially contains two consecutive steps in each iteration: 1) computing and normalizing each block (layer) of the mini-batch stochastic gradient; 2) selecting appropriate step size to update the decision variable (parameter) towards the negative of the block-norma…

    Submitted 23 April, 2018; v1 submitted 16 July, 2017; originally announced July 2017.

  16. arXiv:1704.06877  [pdf, other]

    cs.CL cs.LG

    Learning to Skim Text

    Authors: Adams Wei Yu, Hongrae Lee, Quoc V. Le

    Abstract: Recurrent Neural Networks are showing much promise in many sub-areas of natural language processing, ranging from document classification to machine translation to automatic question answering. Despite their promise, many recurrent models have to read the whole text word by word, making it slow to handle long documents. For example, it is difficult to use a recurrent network to read a book and ans…

    Submitted 29 April, 2017; v1 submitted 22 April, 2017; originally announced April 2017.

  17. arXiv:1602.07046  [pdf, ps, other]

    stat.ML cs.LG math.NA

    An Improved Gap-Dependency Analysis of the Noisy Power Method

    Authors: Maria Florina Balcan, Simon S. Du, Yining Wang, Adams Wei Yu

    Abstract: We consider the noisy power method algorithm, which has wide applications in machine learning and statistics, especially those related to principal component analysis (PCA) under resource (communication, memory or privacy) constraints. Existing analysis of the noisy power method shows an unsatisfactory dependency over the "consecutive" spectral gap $(σ_k-σ_{k+1})$ of an input data matrix, which co…

    Submitted 23 February, 2016; originally announced February 2016.

  18. arXiv:1601.02068  [pdf, other]

    stat.ML cs.LG math.ST

    On Computationally Tractable Selection of Experiments in Measurement-Constrained Regression Models

    Authors: Yining Wang, Adams Wei Yu, Aarti Singh

    Abstract: We derive computationally tractable methods to select a small subset of experiment settings from a large pool of given design points. The primary focus is on linear regression models, while the technique extends to generalized linear models and Delta's method (estimating functions of linear regression models) as well. The algorithms are based on a continuous relaxation of an otherwise intractable…

    Submitted 20 December, 2017; v1 submitted 8 January, 2016; originally announced January 2016.

    Comments: 41 pages. Accepted for publication in Journal of Machine Learning Research

  19. arXiv:1508.05003  [pdf, other]

    stat.ML cs.LG math.OC

    AdaDelay: Delay Adaptive Distributed Stochastic Convex Optimization

    Authors: Suvrit Sra, Adams Wei Yu, Mu Li, Alexander J. Smola

    Abstract: We study distributed stochastic convex optimization under the delayed gradient model where the server nodes perform parameter updates, while the worker nodes compute stochastic gradients. We discuss, analyze, and experiment with a setup motivated by the behavior of real-world distributed computation networks, where the machines are differently slow at different time. Therefore, we allow the parame…

    Submitted 20 August, 2015; originally announced August 2015.

    Comments: 19 pages

  20. arXiv:1508.03390  [pdf, other]

    cs.LG stat.ML

    Doubly Stochastic Primal-Dual Coordinate Method for Bilinear Saddle-Point Problem

    Authors: Adams Wei Yu, Qihang Lin, Tianbao Yang

    Abstract: We propose a doubly stochastic primal-dual coordinate optimization algorithm for empirical risk minimization, which can be formulated as a bilinear saddle-point problem. In each iteration, our method randomly samples a block of coordinates of the primal and dual solutions to update. The linear convergence of our method could be established in terms of 1) the distance from the current iterate to th…

    Submitted 12 April, 2017; v1 submitted 13 August, 2015; originally announced August 2015.

  21. arXiv:1206.4638  [pdf]

    cs.LG stat.ML

    Efficient Euclidean Projections onto the Intersection of Norm Balls

    Authors: Adams Wei Yu, Hao Su, Li Fei-Fei

    Abstract: Using sparse-inducing norms to learn robust models has received increasing attention from many fields for its attractive properties. Projection-based methods have been widely applied to learning tasks constrained by such norms. As a key building block of these methods, an efficient operator for Euclidean projection onto the intersection of $\ell_1$ and $\ell_{1,q}$ norm balls…

    Submitted 18 June, 2012; originally announced June 2012.

    Comments: ICML2012