Showing 1–25 of 25 results for author: Jun, H

  1. arXiv:2404.01954  [pdf, other]

    cs.CL cs.AI

    HyperCLOVA X Technical Report

    Authors: Kang Min Yoo, Jaegeun Han, Sookyo In, Heewon Jeon, Jisu Jeong, Jaewook Kang, Hyunwook Kim, Kyung-Min Kim, Munhyong Kim, Sungju Kim, Donghyun Kwak, Hanock Kwak, Se Jung Kwon, Bado Lee, Dongsoo Lee, Gichang Lee, Jooho Lee, Baeseong Park, Seongjin Shin, Joonsang Yu, Seolki Baek, Sumin Byeon, Eungsup Cho, Dooseok Choe, Jeesung Han, et al. (371 additional authors not shown)

    Abstract: We introduce HyperCLOVA X, a family of large language models (LLMs) tailored to the Korean language and culture, along with competitive capabilities in English, math, and coding. HyperCLOVA X was trained on a balanced mix of Korean, English, and code data, followed by instruction-tuning with high-quality human-annotated datasets while abiding by strict safety guidelines reflecting our commitment t…

    Submitted 13 April, 2024; v1 submitted 2 April, 2024; originally announced April 2024.

    Comments: 44 pages; updated authors list and fixed author names

  2. HIDA: A Hierarchical Dataflow Compiler for High-Level Synthesis

    Authors: Hanchen Ye, Hyegang Jun, Deming Chen

    Abstract: Dataflow architectures are growing in popularity due to their potential to mitigate the challenges posed by the memory wall inherent to the Von Neumann architecture. At the same time, high-level synthesis (HLS) has demonstrated its efficacy as a design methodology for generating efficient dataflow architectures within a short development cycle. However, existing HLS tools rely on developers to exp…

    Submitted 1 November, 2023; originally announced November 2023.

    Comments: ASPLOS'24

  3. arXiv:2305.09252  [pdf, other]

    cs.HC

    Social Wormholes: Exploring Preferences and Opportunities for Distributed and Physically-Grounded Social Connections

    Authors: Joanne Leong, Yuanyang Teng, Xingyu "Bruce" Liu, Hanseul Jun, Sven Kratz, Yu Jiang Tham, Andrés Monroy-Hernández, Brian A. Smith, Rajan Vaish

    Abstract: Ubiquitous computing encapsulates the idea for technology to be interwoven into the fabric of everyday life. As computing blends into everyday physical artifacts, powerful opportunities open up for social connection. Prior connected media objects span a broad spectrum of design combinations. Such diversity suggests that people have varying needs and preferences for staying connected to one another…

    Submitted 16 May, 2023; originally announced May 2023.

    Comments: To appear in the Proceedings of the ACM on Human-Computer Interaction, CSCW2, November 2023 issue. To be presented at CSCW 2023. 29 pages

  4. arXiv:2305.02463  [pdf, other]

    cs.CV cs.LG

    Shap-E: Generating Conditional 3D Implicit Functions

    Authors: Heewoo Jun, Alex Nichol

    Abstract: We present Shap-E, a conditional generative model for 3D assets. Unlike recent work on 3D generative models which produce a single output representation, Shap-E directly generates the parameters of implicit functions that can be rendered as both textured meshes and neural radiance fields. We train Shap-E in two stages: first, we train an encoder that deterministically maps 3D assets into the param…

    Submitted 3 May, 2023; originally announced May 2023.

    Comments: 23 pages, 13 figures

  5. arXiv:2303.11916  [pdf, other]

    cs.CV cs.IR

    CompoDiff: Versatile Composed Image Retrieval With Latent Diffusion

    Authors: Geonmo Gu, Sanghyuk Chun, Wonjae Kim, HeeJae Jun, Yoohoon Kang, Sangdoo Yun

    Abstract: This paper proposes a novel diffusion-based model, CompoDiff, for solving zero-shot Composed Image Retrieval (ZS-CIR) with latent diffusion. This paper also introduces a new synthetic dataset, named SynthTriplets18M, with 18.8 million reference images, conditions, and corresponding target image triplets to train CIR models. CompoDiff and SynthTriplets18M tackle the shortages of the previous CIR ap…

    Submitted 16 July, 2024; v1 submitted 21 March, 2023; originally announced March 2023.

    Comments: TMLR camera-ready; First two authors contributed equally; TMLR Expert Certification; 30 pages, 5.9MB

  6. arXiv:2303.08774  [pdf, other]

    cs.CL cs.AI

    GPT-4 Technical Report

    Authors: OpenAI, Josh Achiam, Steven Adler, Sandhini Agarwal, Lama Ahmad, Ilge Akkaya, Florencia Leoni Aleman, Diogo Almeida, Janko Altenschmidt, Sam Altman, Shyamal Anadkat, Red Avila, Igor Babuschkin, Suchir Balaji, Valerie Balcom, Paul Baltescu, Haiming Bao, Mohammad Bavarian, Jeff Belgum, Irwan Bello, Jake Berdine, Gabriel Bernadett-Shapiro, Christopher Berner, Lenny Bogdonoff, Oleg Boiko, et al. (256 additional authors not shown)

    Abstract: We report the development of GPT-4, a large-scale, multimodal model which can accept image and text inputs and produce text outputs. While less capable than humans in many real-world scenarios, GPT-4 exhibits human-level performance on various professional and academic benchmarks, including passing a simulated bar exam with a score around the top 10% of test takers. GPT-4 is a Transformer-based mo…

    Submitted 4 March, 2024; v1 submitted 15 March, 2023; originally announced March 2023.

    Comments: 100 pages; updated authors list; fixed author names and added citation

  7. arXiv:2212.08751  [pdf, other]

    cs.CV cs.LG

    Point-E: A System for Generating 3D Point Clouds from Complex Prompts

    Authors: Alex Nichol, Heewoo Jun, Prafulla Dhariwal, Pamela Mishkin, Mark Chen

    Abstract: While recent work on text-conditional 3D object generation has shown promising results, the state-of-the-art methods typically require multiple GPU-hours to produce a single sample. This is in stark contrast to state-of-the-art generative image models, which produce samples in a number of seconds or minutes. In this paper, we explore an alternative method for 3D object generation which produces 3D…

    Submitted 16 December, 2022; originally announced December 2022.

    Comments: 8 pages, 11 figures

  8. arXiv:2207.14255  [pdf, other]

    cs.CL

    Efficient Training of Language Models to Fill in the Middle

    Authors: Mohammad Bavarian, Heewoo Jun, Nikolas Tezak, John Schulman, Christine McLeavey, Jerry Tworek, Mark Chen

    Abstract: We show that autoregressive language models can learn to infill text after we apply a straightforward transformation to the dataset, which simply moves a span of text from the middle of a document to its end. While this data augmentation has garnered much interest in recent years, we provide extensive evidence that training models with a large fraction of data transformed in this way does not harm…

    Submitted 28 July, 2022; originally announced July 2022.

  9. arXiv:2207.04095  [pdf, other]

    cs.HC

    An Evaluation Study of 2D and 3D Teleconferencing for Remote Physical Therapy

    Authors: Hanseul Jun, Husam Shaik, Dyan DeVeaux, Michael Lewek, Henry Fuchs, Jeremy Bailenson

    Abstract: The present research investigates the effectiveness of using a telepresence system compared to a video conferencing system and the effectiveness of using two cameras compared to one camera for remote physical therapy. We used Telegie as our telepresence system, which allowed users to see an environment captured with RGBD cameras in 3D through a VR headset. Since both telepresence and the inclusion…

    Submitted 3 March, 2023; v1 submitted 8 July, 2022; originally announced July 2022.

    Comments: 39 pages, 15 figures

  10. arXiv:2202.03677  [pdf, other]

    cs.CV

    A Novel Image Descriptor with Aggregated Semantic Skeleton Representation for Long-term Visual Place Recognition

    Authors: Nie Jiwei, Feng Joe-Mei, Xue Dingyu, Pan Feng, Liu Wei, Hu Jun, Cheng Shuai

    Abstract: In a Simultaneous Localization and Mapping (SLAM) system, a loop-closure can eliminate accumulated errors, which is accomplished by Visual Place Recognition (VPR), a task that retrieves the current scene from a set of pre-stored sequential images through matching specific scene-descriptors. In urban scenes, the appearance variation caused by seasons and illumination has brought great challenges to…

    Submitted 8 February, 2022; originally announced February 2022.

  11. arXiv:2110.14168  [pdf, other]

    cs.LG cs.CL

    Training Verifiers to Solve Math Word Problems

    Authors: Karl Cobbe, Vineet Kosaraju, Mohammad Bavarian, Mark Chen, Heewoo Jun, Lukasz Kaiser, Matthias Plappert, Jerry Tworek, Jacob Hilton, Reiichiro Nakano, Christopher Hesse, John Schulman

    Abstract: State-of-the-art language models can match human performance on many tasks, but they still struggle to robustly perform multi-step mathematical reasoning. To diagnose the failures of current models and support research, we introduce GSM8K, a dataset of 8.5K high quality linguistically diverse grade school math word problems. We find that even the largest transformer models fail to achieve high tes…

    Submitted 17 November, 2021; v1 submitted 27 October, 2021; originally announced October 2021.

  12. arXiv:2107.03374  [pdf, other]

    cs.LG

    Evaluating Large Language Models Trained on Code

    Authors: Mark Chen, Jerry Tworek, Heewoo Jun, Qiming Yuan, Henrique Ponde de Oliveira Pinto, Jared Kaplan, Harri Edwards, Yuri Burda, Nicholas Joseph, Greg Brockman, Alex Ray, Raul Puri, Gretchen Krueger, Michael Petrov, Heidy Khlaaf, Girish Sastry, Pamela Mishkin, Brooke Chan, Scott Gray, Nick Ryder, Mikhail Pavlov, Alethea Power, Lukasz Kaiser, Mohammad Bavarian, Clemens Winter, et al. (33 additional authors not shown)

    Abstract: We introduce Codex, a GPT language model fine-tuned on publicly available code from GitHub, and study its Python code-writing capabilities. A distinct production version of Codex powers GitHub Copilot. On HumanEval, a new evaluation set we release to measure functional correctness for synthesizing programs from docstrings, our model solves 28.8% of the problems, while GPT-3 solves 0% and GPT-J sol…

    Submitted 14 July, 2021; v1 submitted 7 July, 2021; originally announced July 2021.

    Comments: corrected typos, added references, added authors, added acknowledgements

  13. arXiv:2012.08977  [pdf, other]

    cs.RO cs.LG

    Visually Grounding Language Instruction for History-Dependent Manipulation

    Authors: Hyemin Ahn, Obin Kwon, Kyoungdo Kim, Jaeyeon Jeong, Howoong Jun, Hongjung Lee, Dongheui Lee, Songhwai Oh

    Abstract: This paper emphasizes the importance of a robot's ability to refer to its task history, especially when it executes a series of pick-and-place manipulations by following language instructions given one by one. The advantage of referring to the manipulation history can be categorized into two folds: (1) the language instructions omitting details but using expressions referring to the past can be in…

    Submitted 14 March, 2022; v1 submitted 16 December, 2020; originally announced December 2020.

    Comments: 8 pages, 5 figures

  14. arXiv:2010.14701  [pdf, other]

    cs.LG cs.CL cs.CV

    Scaling Laws for Autoregressive Generative Modeling

    Authors: Tom Henighan, Jared Kaplan, Mor Katz, Mark Chen, Christopher Hesse, Jacob Jackson, Heewoo Jun, Tom B. Brown, Prafulla Dhariwal, Scott Gray, Chris Hallacy, Benjamin Mann, Alec Radford, Aditya Ramesh, Nick Ryder, Daniel M. Ziegler, John Schulman, Dario Amodei, Sam McCandlish

    Abstract: We identify empirical scaling laws for the cross-entropy loss in four domains: generative image modeling, video modeling, multimodal image$\leftrightarrow$text models, and mathematical problem solving. In all cases autoregressive Transformers smoothly improve in performance as model size and compute budgets increase, following a power-law plus constant scaling law. The optimal model size also depe…

    Submitted 5 November, 2020; v1 submitted 27 October, 2020; originally announced October 2020.

    Comments: 20+17 pages, 33 figures; added appendix with additional language results

  15. arXiv:2005.12739  [pdf, other]

    cs.CV cs.IR cs.LG

    An Effective Pipeline for a Real-world Clothes Retrieval System

    Authors: Yang-Ho Ji, HeeJae Jun, Insik Kim, Jongtack Kim, Youngjoon Kim, Byungsoo Ko, Hyong-Keun Kook, Jingeun Lee, Sangwon Lee, Sanghyuk Park

    Abstract: In this paper, we propose an effective pipeline for clothes retrieval system which has sturdiness on large-scale real-world fashion data. Our proposed method consists of three components: detection, retrieval, and post-processing. We firstly conduct a detection task for precise retrieval on target clothes, then retrieve the corresponding items with the metric learning-based model. To improve the r…

    Submitted 26 May, 2020; originally announced May 2020.

    Comments: 2nd place solution on DeepFashion2 clothes retrieval challenge in CVPR2020 workshop (CVFAD)

  16. arXiv:2005.00341  [pdf, other]

    eess.AS cs.LG cs.SD stat.ML

    Jukebox: A Generative Model for Music

    Authors: Prafulla Dhariwal, Heewoo Jun, Christine Payne, Jong Wook Kim, Alec Radford, Ilya Sutskever

    Abstract: We introduce Jukebox, a model that generates music with singing in the raw audio domain. We tackle the long context of raw audio using a multi-scale VQ-VAE to compress it to discrete codes, and modeling those using autoregressive Transformers. We show that the combined model at scale can generate high-fidelity and diverse songs with coherence up to multiple minutes. We can condition on artist and…

    Submitted 30 April, 2020; originally announced May 2020.

  17. arXiv:1907.11854  [pdf, other]

    cs.CV cs.IR cs.LG

    A Benchmark on Tricks for Large-scale Image Retrieval

    Authors: Byungsoo Ko, Minchul Shin, Geonmo Gu, HeeJae Jun, Tae Kwan Lee, Youngjoon Kim

    Abstract: Many studies have been performed on metric learning, which has become a key ingredient in top-performing methods of instance-level image retrieval. Meanwhile, less attention has been paid to pre-processing and post-processing tricks that can significantly boost performance. Furthermore, we found that most previous studies used small scale datasets to simplify processing. Because the behavior of a…

    Submitted 23 April, 2020; v1 submitted 27 July, 2019; originally announced July 2019.

  18. arXiv:1903.10663  [pdf, other]

    cs.CV cs.IR cs.LG

    Combination of Multiple Global Descriptors for Image Retrieval

    Authors: HeeJae Jun, Byungsoo Ko, Youngjoon Kim, Insik Kim, Jongtack Kim

    Abstract: Recent studies in image retrieval task have shown that ensembling different models and combining multiple global descriptors lead to performance improvement. However, training different models for the ensemble is not only difficult but also inefficient with respect to time and memory. In this paper, we propose a novel framework that exploits multiple global descriptors to get an ensemble effect wh…

    Submitted 23 April, 2020; v1 submitted 25 March, 2019; originally announced March 2019.

  19. arXiv:1810.10045  [pdf, other]

    cs.CL

    Language Modeling at Scale

    Authors: Mostofa Patwary, Milind Chabbi, Heewoo Jun, Jiaji Huang, Gregory Diamos, Kenneth Church

    Abstract: We show how Zipf's Law can be used to scale up language modeling (LM) to take advantage of more training data and more GPUs. LM plays a key role in many important natural language applications such as speech recognition and machine translation. Scaling up LM is important since it is widely accepted by the community that there is no data like more data. Eventually, we would like to train on terabyt…

    Submitted 23 October, 2018; originally announced October 2018.

  20. arXiv:1808.06719  [pdf, other]

    cs.SD cs.LG eess.AS

    Fast Spectrogram Inversion using Multi-head Convolutional Neural Networks

    Authors: Sercan O. Arik, Heewoo Jun, Gregory Diamos

    Abstract: We propose the multi-head convolutional neural network (MCNN) architecture for waveform synthesis from spectrograms. Nonlinear interpolation in MCNN is employed with transposed convolution layers in parallel heads. MCNN achieves more than an order of magnitude higher compute intensity than commonly-used iterative algorithms like Griffin-Lim, yielding efficient utilization for modern multi-core pro…

    Submitted 5 November, 2018; v1 submitted 20 August, 2018; originally announced August 2018.

  21. arXiv:1712.00409  [pdf, other]

    cs.LG stat.ML

    Deep Learning Scaling is Predictable, Empirically

    Authors: Joel Hestness, Sharan Narang, Newsha Ardalani, Gregory Diamos, Heewoo Jun, Hassan Kianinejad, Md. Mostofa Ali Patwary, Yang Yang, Yanqi Zhou

    Abstract: Deep learning (DL) creates impactful advances following a virtuous recipe: model architecture search, creating large training data sets, and scaling computation. It is widely believed that growing training sets and models should improve accuracy and result in better products. As DL application domains grow, we would like a deeper understanding of the relationships between training set size, comput…

    Submitted 1 December, 2017; originally announced December 2017.

    Comments: 19 pages, 11 figures

  22. arXiv:1711.01567  [pdf, other]

    cs.CL cs.LG

    Robust Speech Recognition Using Generative Adversarial Networks

    Authors: Anuroop Sriram, Heewoo Jun, Yashesh Gaur, Sanjeev Satheesh

    Abstract: This paper describes a general, scalable, end-to-end framework that uses the generative adversarial network (GAN) objective to enable robust speech recognition. Encoders trained with the proposed approach enjoy improved invariance by learning to map noisy audio to the same embedding space as that of clean audio. Unlike previous methods, the new framework does not rely on domain expertise or simpli…

    Submitted 5 November, 2017; originally announced November 2017.

  23. arXiv:1708.06426  [pdf, other]

    cs.CL

    Cold Fusion: Training Seq2Seq Models Together with Language Models

    Authors: Anuroop Sriram, Heewoo Jun, Sanjeev Satheesh, Adam Coates

    Abstract: Sequence-to-sequence (Seq2Seq) models with attention have excelled at tasks which involve generating natural language sentences such as machine translation, image captioning and speech recognition. Performance has further been improved by leveraging unlabeled data, often in the form of a language model. In this work, we present the Cold Fusion method, which leverages a pre-trained language model d…

    Submitted 21 August, 2017; originally announced August 2017.

  24. arXiv:1705.04400  [pdf, other]

    cs.CL

    Reducing Bias in Production Speech Models

    Authors: Eric Battenberg, Rewon Child, Adam Coates, Christopher Fougner, Yashesh Gaur, Jiaji Huang, Heewoo Jun, Ajay Kannan, Markus Kliegl, Atul Kumar, Hairong Liu, Vinay Rao, Sanjeev Satheesh, David Seetapun, Anuroop Sriram, Zhenyao Zhu

    Abstract: Replacing hand-engineered pipelines with end-to-end deep learning systems has enabled strong results in applications like speech and object recognition. However, the causality and latency constraints of production systems put end-to-end speech models back into the underfitting regime and expose biases in the model that we show cannot be overcome by "scaling up", i.e., training bigger models on mor…

    Submitted 11 May, 2017; originally announced May 2017.

  25. Hadoop Mapreduce Performance Enhancement Using In-node Combiners

    Authors: Woo-Hyun Lee, Hee-Gook Jun, Hyoung-Joo Kim

    Abstract: While advanced analysis of large dataset is in high demand, data sizes have surpassed capabilities of conventional software and hardware. Hadoop framework distributes large datasets over multiple commodity servers and performs parallel computations. We discuss the I/O bottlenecks of Hadoop framework and propose methods for enhancing I/O performance. A proven approach is to cache data to maximize m…

    Submitted 16 November, 2015; originally announced November 2015.

    Comments: International Journal of Computer Science & Information Technology, 2015