Showing 1–50 of 134 results for author: Ahn, J

  1. arXiv:2407.11368  [pdf]

    cs.CL

    Ancient Korean Archive Translation: Comparison Analysis on Statistical phrase alignment, LLM in-context learning, and inter-methodological approach

    Authors: Sojung Lucia Kim, Taehong Jang, Joonmo Ahn

    Abstract: This study aims to compare three methods for translating ancient texts with sparse corpora: (1) the traditional statistical translation method of phrase alignment, (2) in-context LLM learning, and (3) proposed inter methodological approach - statistical machine translation method using sentence piece tokens derived from unified set of source-target corpus. The performance of the proposed approach…

    Submitted 16 July, 2024; originally announced July 2024.

    Comments: ACL2024 submitted

  2. arXiv:2407.11017  [pdf, other]

    cs.CL cs.AI cs.LG

    Direct-Inverse Prompting: Analyzing LLMs' Discriminative Capacity in Self-Improving Generation

    Authors: Jihyun Janice Ahn, Ryo Kamoi, Lu Cheng, Rui Zhang, Wenpeng Yin

    Abstract: Mainstream LLM research has primarily focused on enhancing their generative capabilities. However, even the most advanced LLMs experience uncertainty in their outputs, often producing varied results on different runs or when faced with minor changes in input, despite no substantial change in content. Given multiple responses from the same LLM to the same input, we advocate leveraging the LLMs' dis…

    Submitted 26 June, 2024; originally announced July 2024.

    Comments: 4 pages, 3 tables

  3. arXiv:2407.10558  [pdf, other]

    cs.CV cs.LG

    ConTEXTure: Consistent Multiview Images to Texture

    Authors: Jaehoon Ahn, Sumin Cho, Harim Jung, Kibeom Hong, Seonghoon Ban, Moon-Ryul Jung

    Abstract: We introduce ConTEXTure, a generative network designed to create a texture map/atlas for a given 3D mesh using images from multiple viewpoints. The process begins with generating a front-view image from a text prompt, such as 'Napoleon, front view', describing the 3D mesh. Additional images from different viewpoints are derived from this front-view image and camera poses relative to it. ConTEXTure…

    Submitted 15 July, 2024; originally announced July 2024.

    Comments: 11 pages, 7 figures

  4. arXiv:2406.15709  [pdf, other]

    cs.CR

    I Experienced More than 10 DeFi Scams: On DeFi Users' Perception of Security Breaches and Countermeasures

    Authors: Mingyi Liu, Jun Ho Huh, HyungSeok Han, Jaehyuk Lee, Jihae Ahn, Frank Li, Hyoungshick Kim, Taesoo Kim

    Abstract: Decentralized Finance (DeFi) offers a whole new investment experience and has quickly emerged as an enticing alternative to Centralized Finance (CeFi). Rapidly growing market size and active users, however, have also made DeFi a lucrative target for scams and hacks, with 1.95 billion USD lost in 2023. Unfortunately, no prior research thoroughly investigates DeFi users' security risk awareness leve…

    Submitted 21 June, 2024; originally announced June 2024.

    Comments: In Proceedings of the 33rd USENIX Security Symposium, Philadelphia, PA, USA, Aug. 2024

  5. arXiv:2406.12233  [pdf, other]

    cs.AI cs.CL cs.CV

    SyncVSR: Data-Efficient Visual Speech Recognition with End-to-End Crossmodal Audio Token Synchronization

    Authors: Young Jin Ahn, Jungwoo Park, Sangha Park, Jonghyun Choi, Kee-Eung Kim

    Abstract: Visual Speech Recognition (VSR) stands at the intersection of computer vision and speech recognition, aiming to interpret spoken content from visual cues. A prominent challenge in VSR is the presence of homophenes-visually similar lip gestures that represent different phonemes. Prior approaches have sought to distinguish fine-grained visemes by aligning visual and auditory semantics, but often fel…

    Submitted 17 June, 2024; originally announced June 2024.

  6. arXiv:2406.05963  [pdf, other]

    cs.CV cs.AI

    Solution for SMART-101 Challenge of CVPR Multi-modal Algorithmic Reasoning Task 2024

    Authors: Jinwoo Ahn, Junhyeok Park, Min-Jun Kim, Kang-Hyeon Kim, So-Yeong Sohn, Yun-Ji Lee, Du-Seong Chang, Yu-Jung Heo, Eun-Sol Kim

    Abstract: In this paper, the solution of HYU MLLAB KT Team to the Multimodal Algorithmic Reasoning Task: SMART-101 CVPR 2024 Challenge is presented. Beyond conventional visual question-answering problems, the SMART-101 challenge aims to achieve human-level multimodal understanding by tackling complex visio-linguistic puzzles designed for children in the 6-8 age group. To solve this problem, we suggest two m…

    Submitted 9 June, 2024; originally announced June 2024.

  7. arXiv:2406.05602  [pdf, other]

    cs.CV cs.CL

    Can Prompt Modifiers Control Bias? A Comparative Analysis of Text-to-Image Generative Models

    Authors: Philip Wootaek Shin, Jihyun Janice Ahn, Wenpeng Yin, Jack Sampson, Vijaykrishnan Narayanan

    Abstract: It has been shown that many generative models inherit and amplify societal biases. To date, there is no uniform/systematic agreed standard to control/adjust for these biases. This study examines the presence and manipulation of societal biases in leading text-to-image models: Stable Diffusion, DALL-E 3, and Adobe Firefly. Through a comprehensive analysis combining base prompts with modifiers and t…

    Submitted 8 June, 2024; originally announced June 2024.

  8. arXiv:2405.18027  [pdf, other]

    cs.CL

    TimeChara: Evaluating Point-in-Time Character Hallucination of Role-Playing Large Language Models

    Authors: Jaewoo Ahn, Taehyun Lee, Junyoung Lim, Jin-Hwa Kim, Sangdoo Yun, Hwaran Lee, Gunhee Kim

    Abstract: While Large Language Models (LLMs) can serve as agents to simulate human behaviors (i.e., role-playing agents), we emphasize the importance of point-in-time role-playing. This situates characters at specific moments in the narrative progression for three main reasons: (i) enhancing users' narrative immersion, (ii) avoiding spoilers, and (iii) fostering engagement in fandom role-playing. To accurat…

    Submitted 28 May, 2024; originally announced May 2024.

    Comments: ACL 2024 Findings. Code and dataset are released at https://ahnjaewoo.github.io/timechara

  9. arXiv:2405.10272  [pdf, other]

    cs.CV cs.AI cs.SD eess.AS eess.IV

    Faces that Speak: Jointly Synthesising Talking Face and Speech from Text

    Authors: Youngjoon Jang, Ji-Hoon Kim, Junseok Ahn, Doyeop Kwak, Hong-Sun Yang, Yoon-Cheol Ju, Il-Hwan Kim, Byeong-Yeol Kim, Joon Son Chung

    Abstract: The goal of this work is to simultaneously generate natural talking faces and speech outputs from text. We achieve this by integrating Talking Face Generation (TFG) and Text-to-Speech (TTS) systems into a unified framework. We address the main challenges of each task: (1) generating a range of head poses representative of real-world scenarios, and (2) ensuring voice consistency despite variations…

    Submitted 16 May, 2024; originally announced May 2024.

    Comments: CVPR 2024

  10. arXiv:2405.02499  [pdf, other]

    cs.CR cs.AR

    DRAMScope: Uncovering DRAM Microarchitecture and Characteristics by Issuing Memory Commands

    Authors: Hwayong Nam, Seungmin Baek, Minbok Wi, Michael Jaemin Kim, Jaehyun Park, Chihun Song, Nam Sung Kim, Jung Ho Ahn

    Abstract: The demand for precise information on DRAM microarchitectures and error characteristics has surged, driven by the need to explore processing in memory, enhance reliability, and mitigate security vulnerability. Nonetheless, DRAM manufacturers have disclosed only a limited amount of information, making it difficult to find specific information on their DRAM microarchitectures. This paper addresses t…

    Submitted 3 May, 2024; originally announced May 2024.

    Comments: To appear at the 51st IEEE/ACM International Symposium on Computer Architecture (ISCA)

  11. arXiv:2404.14687  [pdf, other]

    cs.MM cs.AI cs.CL cs.CV

    Pegasus-v1 Technical Report

    Authors: Raehyuk Jung, Hyojun Go, Jaehyuk Yi, Jiho Jang, Daniel Kim, Jay Suh, Aiden Lee, Cooper Han, Jae Lee, Jeff Kim, Jin-Young Kim, Junwan Kim, Kyle Park, Lucas Lee, Mars Ha, Minjoon Seo, Abraham Jo, Ed Park, Hassan Kianinejad, SJ Kim, Tony Moon, Wade Jeong, Andrei Popescu, Esther Kim, EK Yoon, et al. (19 additional authors not shown)

    Abstract: This technical report introduces Pegasus-1, a multimodal language model specialized in video content understanding and interaction through natural language. Pegasus-1 is designed to address the unique challenges posed by video data, such as interpreting spatiotemporal information, to offer nuanced video content comprehension across various lengths. This technical report overviews Pegasus-1's archi…

    Submitted 22 April, 2024; originally announced April 2024.

  12. arXiv:2404.03602  [pdf, other]

    cs.CL

    Evaluating LLMs at Detecting Errors in LLM Responses

    Authors: Ryo Kamoi, Sarkar Snigdha Sarathi Das, Renze Lou, Jihyun Janice Ahn, Yilun Zhao, Xiaoxin Lu, Nan Zhang, Yusen Zhang, Ranran Haoran Zhang, Sujeeth Reddy Vummanthala, Salika Dave, Shaobo Qin, Arman Cohan, Wenpeng Yin, Rui Zhang

    Abstract: With Large Language Models (LLMs) being widely used across various tasks, detecting errors in their responses is increasingly crucial. However, little research has been conducted on error detection of LLM responses. Collecting error annotations on LLM responses is challenging due to the subjective nature of many NLP tasks, and thus previous research focuses on tasks of little practical value (e.g.…

    Submitted 4 April, 2024; originally announced April 2024.

    Comments: Benchmark and code: https://github.com/psunlpgroup/ReaLMistake

  13. arXiv:2404.02155  [pdf, other]

    cs.CV

    Alpha Invariance: On Inverse Scaling Between Distance and Volume Density in Neural Radiance Fields

    Authors: Joshua Ahn, Haochen Wang, Raymond A. Yeh, Greg Shakhnarovich

    Abstract: Scale-ambiguity in 3D scene dimensions leads to magnitude-ambiguity of volumetric densities in neural radiance fields, i.e., the densities double when scene size is halved, and vice versa. We call this property alpha invariance. For NeRFs to better maintain alpha invariance, we recommend 1) parameterizing both distance and volume densities in log space, and 2) a discretization-agnostic initializat…

    Submitted 16 April, 2024; v1 submitted 2 April, 2024; originally announced April 2024.

    Comments: CVPR 2024. project page https://pals.ttic.edu/p/alpha-invariance

  14. arXiv:2404.01954  [pdf, other]

    cs.CL cs.AI

    HyperCLOVA X Technical Report

    Authors: Kang Min Yoo, Jaegeun Han, Sookyo In, Heewon Jeon, Jisu Jeong, Jaewook Kang, Hyunwook Kim, Kyung-Min Kim, Munhyong Kim, Sungju Kim, Donghyun Kwak, Hanock Kwak, Se Jung Kwon, Bado Lee, Dongsoo Lee, Gichang Lee, Jooho Lee, Baeseong Park, Seongjin Shin, Joonsang Yu, Seolki Baek, Sumin Byeon, Eungsup Cho, Dooseok Choe, Jeesung Han, et al. (371 additional authors not shown)

    Abstract: We introduce HyperCLOVA X, a family of large language models (LLMs) tailored to the Korean language and culture, along with competitive capabilities in English, math, and coding. HyperCLOVA X was trained on a balanced mix of Korean, English, and code data, followed by instruction-tuning with high-quality human-annotated datasets while abiding by strict safety guidelines reflecting our commitment t…

    Submitted 13 April, 2024; v1 submitted 2 April, 2024; originally announced April 2024.

    Comments: 44 pages; updated authors list and fixed author names

  15. arXiv:2403.20109  [pdf, ps, other]

    cs.LG cs.AI q-bio.BM

    Mol-AIR: Molecular Reinforcement Learning with Adaptive Intrinsic Rewards for Goal-directed Molecular Generation

    Authors: Jinyeong Park, Jaegyoon Ahn, Jonghwan Choi, Jibum Kim

    Abstract: Optimizing techniques for discovering molecular structures with desired properties is crucial in artificial intelligence(AI)-based drug discovery. Combining deep generative models with reinforcement learning has emerged as an effective strategy for generating molecules with specific properties. Despite its potential, this approach is ineffective in exploring the vast chemical space and optimizing…

    Submitted 29 March, 2024; originally announced March 2024.

  16. arXiv:2403.14963  [pdf, other]

    cs.CR

    Enabling Physical Localization of Uncooperative Cellular Devices

    Authors: Taekkyung Oh, Sangwook Bae, Junho Ahn, Yonghwa Lee, Dinh-Tuan Hoang, Min Suk Kang, Nils Ole Tippenhauer, Yongdae Kim

    Abstract: In cellular networks, it can become necessary for authorities to physically locate user devices for tracking criminals or illegal devices. While cellular operators can provide authorities with cell information the device is camping on, fine-grained localization is still required. Therefore, the authorized agents trace the device by monitoring its uplink signals. However, tracking the uplink signal…

    Submitted 25 March, 2024; v1 submitted 22 March, 2024; originally announced March 2024.

  17. arXiv:2403.05591  [pdf, other]

    cs.HC cs.LG

    Data-Driven Ergonomic Risk Assessment of Complex Hand-intensive Manufacturing Processes

    Authors: Anand Krishnan, Xingjian Yang, Utsav Seth, Jonathan M. Jeyachandran, Jonathan Y. Ahn, Richard Gardner, Samuel F. Pedigo, Adriana Blom-Schieber, Ashis G. Banerjee, Krithika Manohar

    Abstract: Hand-intensive manufacturing processes, such as composite layup and textile draping, require significant human dexterity to accommodate task complexity. These strenuous hand motions often lead to musculoskeletal disorders and rehabilitation surgeries. We develop a data-driven ergonomic risk assessment system with a special focus on hand and finger activity to better identify and address ergonomic…

    Submitted 5 March, 2024; originally announced March 2024.

    Comments: 26 pages, 7 figures

  18. arXiv:2403.05530  [pdf, other]

    cs.CL cs.AI

    Gemini 1.5: Unlocking multimodal understanding across millions of tokens of context

    Authors: Gemini Team, Petko Georgiev, Ving Ian Lei, Ryan Burnell, Libin Bai, Anmol Gulati, Garrett Tanzer, Damien Vincent, Zhufeng Pan, Shibo Wang, Soroosh Mariooryad, Yifan Ding, Xinyang Geng, Fred Alcober, Roy Frostig, Mark Omernick, Lexi Walker, Cosmin Paduraru, Christina Sorokin, Andrea Tacchetti, Colin Gaffney, Samira Daruki, Olcan Sercinoglu, Zach Gleicher, Juliette Love, et al. (1092 additional authors not shown)

    Abstract: In this report, we introduce the Gemini 1.5 family of models, representing the next generation of highly compute-efficient multimodal models capable of recalling and reasoning over fine-grained information from millions of tokens of context, including multiple long documents and hours of video and audio. The family includes two new models: (1) an updated Gemini 1.5 Pro, which exceeds the February…

    Submitted 14 June, 2024; v1 submitted 8 March, 2024; originally announced March 2024.

  19. arXiv:2402.02648  [pdf, other]

    cs.CL cs.AI

    Recursive Chain-of-Feedback Prevents Performance Degradation from Redundant Prompting

    Authors: Jinwoo Ahn, Kyuseung Shin

    Abstract: Large Language Models (LLMs) frequently struggle with complex reasoning tasks, failing to construct logically sound steps towards the solution. In response to this behavior, users often try prompting the LLMs repeatedly in hopes of reaching a better response. This paper studies such repetitive behavior and its effect by defining a novel setting, Chain-of-Feedback (CoF). The setting takes questions…

    Submitted 1 March, 2024; v1 submitted 4 February, 2024; originally announced February 2024.

    Comments: Still Ongoing Work; 8 Pages; 2 Figures

  20. arXiv:2402.02447  [pdf, other]

    cs.LG cs.CL

    Breaking MLPerf Training: A Case Study on Optimizing BERT

    Authors: Yongdeok Kim, Jaehyung Ahn, Myeongwoo Kim, Changin Choi, Heejae Kim, Narankhuu Tuvshinjargal, Seungwon Lee, Yanzi Zhang, Yuan Pei, Xiongzhan Linghu, Jingkun Ma, Lin Chen, Yuehua Dai, Sungjoo Yoo

    Abstract: Speeding up the large-scale distributed training is challenging in that it requires improving various components of training including load balancing, communication, optimizers, etc. We present novel approaches for fast large-scale training of BERT model which individually ameliorates each component thereby leading to a new level of BERT training performance. Load balancing is imperative in distri…

    Submitted 4 February, 2024; originally announced February 2024.

    Comments: Total 15 pages (Appendix 3 pages)

  21. arXiv:2402.00157  [pdf, other]

    cs.CL

    Large Language Models for Mathematical Reasoning: Progresses and Challenges

    Authors: Janice Ahn, Rishu Verma, Renze Lou, Di Liu, Rui Zhang, Wenpeng Yin

    Abstract: Mathematical reasoning serves as a cornerstone for assessing the fundamental cognitive capabilities of human intelligence. In recent times, there has been a notable surge in the development of Large Language Models (LLMs) geared towards the automated resolution of mathematical problems. However, the landscape of mathematical problem types is vast and varied, with LLM-oriented techniques undergoing…

    Submitted 5 April, 2024; v1 submitted 31 January, 2024; originally announced February 2024.

    Comments: EACL 2024 Student Research Workshop, 8 pages

  22. arXiv:2312.15288  [pdf, other]

    cs.CV stat.ML

    Understanding normalization in contrastive representation learning and out-of-distribution detection

    Authors: Tai Le-Gia, Jaehyun Ahn

    Abstract: Contrastive representation learning has emerged as an outstanding approach for anomaly detection. In this work, we explore the $\ell_2$-norm of contrastive features and its applications in out-of-distribution detection. We propose a simple method based on contrastive learning, which incorporates out-of-distribution data by discriminating against normal samples in the contrastive layer space. Our a…

    Submitted 8 April, 2024; v1 submitted 23 December, 2023; originally announced December 2023.

  23. arXiv:2312.12488  [pdf, other]

    cs.LG cs.CR cs.CV

    Foreseeing Reconstruction Quality of Gradient Inversion: An Optimization Perspective

    Authors: HyeongGwon Hong, Yooshin Cho, Hanbyel Cho, Jaesung Ahn, Junmo Kim

    Abstract: Gradient inversion attacks can leak data privacy when clients share weight updates with the server in federated learning (FL). Existing studies mainly use L2 or cosine distance as the loss function for gradient matching in the attack. Our empirical investigation shows that the vulnerability ranking varies with the loss function used. Gradient norm, which is commonly used as a vulnerability proxy f…

    Submitted 19 December, 2023; originally announced December 2023.

    Comments: To appear in AAAI 2024

  24. arXiv:2312.11881  [pdf, other]

    cs.CL cs.AI

    Punctuation restoration Model and Spacing Model for Korean Ancient Document

    Authors: Taehong Jang, Joonmo Ahn, Sojung Lucia Kim

    Abstract: In Korean ancient documents, there is no spacing or punctuation, and they are written in classical Chinese characters. This makes it challenging for modern individuals and translation models to accurately interpret and translate them. While China has models predicting punctuation and spacing, applying them directly to Korean texts is problematic due to data differences. Therefore, we developed the…

    Submitted 19 December, 2023; originally announced December 2023.

    Comments: 5 Pages, 2 Figures

  25. arXiv:2312.11805  [pdf, other]

    cs.CL cs.AI cs.CV

    Gemini: A Family of Highly Capable Multimodal Models

    Authors: Gemini Team, Rohan Anil, Sebastian Borgeaud, Jean-Baptiste Alayrac, Jiahui Yu, Radu Soricut, Johan Schalkwyk, Andrew M. Dai, Anja Hauth, Katie Millican, David Silver, Melvin Johnson, Ioannis Antonoglou, Julian Schrittwieser, Amelia Glaese, Jilin Chen, Emily Pitler, Timothy Lillicrap, Angeliki Lazaridou, Orhan Firat, James Molloy, Michael Isard, Paul R. Barham, Tom Hennigan, Benjamin Lee, et al. (1325 additional authors not shown)

    Abstract: This report introduces a new family of multimodal models, Gemini, that exhibit remarkable capabilities across image, audio, video, and text understanding. The Gemini family consists of Ultra, Pro, and Nano sizes, suitable for applications ranging from complex reasoning tasks to on-device memory-constrained use-cases. Evaluation on a broad range of benchmarks shows that our most-capable Gemini Ultr…

    Submitted 17 June, 2024; v1 submitted 18 December, 2023; originally announced December 2023.

  26. arXiv:2312.04356  [pdf, other]

    cs.CR cs.LG

    NeuJeans: Private Neural Network Inference with Joint Optimization of Convolution and Bootstrapping

    Authors: Jae Hyung Ju, Jaiyoung Park, Jongmin Kim, Donghwan Kim, Jung Ho Ahn

    Abstract: Fully homomorphic encryption (FHE) is a promising cryptographic primitive for realizing private neural network inference (PI) services by allowing a client to fully offload the inference task to a cloud server while keeping the client data oblivious to the server. This work proposes NeuJeans, an FHE-based solution for the PI of deep convolutional neural networks (CNNs). NeuJeans tackles the critic…

    Submitted 7 December, 2023; originally announced December 2023.

    Comments: 16 pages, 9 figures

  27. arXiv:2312.02436  [pdf, other]

    cs.CL cs.AI

    MUFFIN: Curating Multi-Faceted Instructions for Improving Instruction-Following

    Authors: Renze Lou, Kai Zhang, Jian Xie, Yuxuan Sun, Janice Ahn, Hanzi Xu, Yu Su, Wenpeng Yin

    Abstract: In the realm of large language models (LLMs), enhancing instruction-following capability often involves curating expansive training data. This is achieved through two primary schemes: i) Scaling-Inputs: Amplifying (input, output) pairs per task instruction, aiming for better instruction adherence. ii) Scaling Input-Free Tasks: Enlarging tasks, each composed of an (instruction, output) pair (withou…

    Submitted 14 March, 2024; v1 submitted 4 December, 2023; originally announced December 2023.

    Comments: ICLR 2024. Data, model, and code are available at: https://renzelou.github.io/Muffin/

  28. arXiv:2310.16530  [pdf, other]

    cs.CR cs.AR

    Toward Practical Privacy-Preserving Convolutional Neural Networks Exploiting Fully Homomorphic Encryption

    Authors: Jaiyoung Park, Donghwan Kim, Jongmin Kim, Sangpyo Kim, Wonkyung Jung, Jung Hee Cheon, Jung Ho Ahn

    Abstract: Incorporating fully homomorphic encryption (FHE) into the inference process of a convolutional neural network (CNN) draws enormous attention as a viable approach for achieving private inference (PI). FHE allows delegating the entire computation process to the server while ensuring the confidentiality of sensitive client-side data. However, practical FHE implementation of a CNN faces significant hu…

    Submitted 25 October, 2023; originally announced October 2023.

    Comments: 3 pages, 1 figure, appears at DISCC 2023 (2nd Workshop on Data Integrity and Secure Cloud Computing, in conjunction with the 56th International Symposium on Microarchitecture (MICRO 2023))

  29. arXiv:2310.14579  [pdf, other]

    cs.LG cs.AI

    FedSplitX: Federated Split Learning for Computationally-Constrained Heterogeneous Clients

    Authors: Jiyun Shin, Jinhyun Ahn, Honggu Kang, Joonhyuk Kang

    Abstract: Foundation models (FMs) have demonstrated remarkable performance in machine learning but demand extensive training data and computational resources. Federated learning (FL) addresses the challenges posed by FMs, especially related to data privacy and computational burdens. However, FL on FMs faces challenges in situations with heterogeneous clients possessing varying computing capabilities, as cli…

    Submitted 23 October, 2023; originally announced October 2023.

  30. arXiv:2309.12304  [pdf, other]

    cs.CV

    SlowFast Network for Continuous Sign Language Recognition

    Authors: Junseok Ahn, Youngjoon Jang, Joon Son Chung

    Abstract: The objective of this work is the effective extraction of spatial and dynamic features for Continuous Sign Language Recognition (CSLR). To accomplish this, we utilise a two-pathway SlowFast network, where each pathway operates at distinct temporal resolutions to separately capture spatial (hand shapes, facial expressions) and dynamic (movements) information. In addition, we introduce two distinct…

    Submitted 21 September, 2023; originally announced September 2023.

  31. arXiv:2308.15413  [pdf, other]

    cs.CV eess.IV

    WrappingNet: Mesh Autoencoder via Deep Sphere Deformation

    Authors: Eric Lei, Muhammad Asad Lodhi, Jiahao Pang, Junghyun Ahn, Dong Tian

    Abstract: There have been recent efforts to learn more meaningful representations via fixed length codewords from mesh data, since a mesh serves as a complete model of underlying 3D shape compared to a point cloud. However, the mesh connectivity presents new difficulties when constructing a deep learning pipeline for meshes. Previous mesh unsupervised learning approaches typically assume category-specific t…

    Submitted 29 August, 2023; originally announced August 2023.

  32. arXiv:2308.07491  [pdf, other]

    cs.RO cs.GR cs.LG

    Adaptive Tracking of a Single-Rigid-Body Character in Various Environments

    Authors: Taesoo Kwon, Taehong Gu, Jaewon Ahn, Yoonsang Lee

    Abstract: Since the introduction of DeepMimic [Peng et al. 2018], subsequent research has focused on expanding the repertoire of simulated motions across various scenarios. In this study, we propose an alternative approach for this goal, a deep reinforcement learning method based on the simulation of a single-rigid-body character. Using the centroidal dynamics model (CDM) to express the full-body character…

    Submitted 28 January, 2024; v1 submitted 14 August, 2023; originally announced August 2023.

    Comments: SIGGRAPH Asia 2023 Conference Papers

    Journal ref: SA '23: SIGGRAPH Asia 2023 Conference Papers, December 2023, Article No.: 118, Pages 1-11

  33. arXiv:2308.06964  [pdf]

    eess.IV cs.CV

    How inter-rater variability relates to aleatoric and epistemic uncertainty: a case study with deep learning-based paraspinal muscle segmentation

    Authors: Parinaz Roshanzamir, Hassan Rivaz, Joshua Ahn, Hamza Mirza, Neda Naghdi, Meagan Anstruther, Michele C. Battié, Maryse Fortin, Yiming Xiao

    Abstract: Recent developments in deep learning (DL) techniques have led to great performance improvement in medical image segmentation tasks, especially with the latest Transformer model and its variants. While labels from fusing multi-rater manual segmentations are often employed as ideal ground truths in DL model training, inter-rater variability due to factors such as training bias, image noise, and extr…

    Submitted 14 August, 2023; originally announced August 2023.

    Comments: Accepted in UNSURE MICCAI 2023

    MSC Class: I.2.1; I.4.6

  34. arXiv:2308.04890  [pdf, other]

    cs.AR cs.CR

    CiFHER: A Chiplet-Based FHE Accelerator with a Resizable Structure

    Authors: Sangpyo Kim, Jongmin Kim, Jaeyoung Choi, Jung Ho Ahn

    Abstract: Fully homomorphic encryption (FHE) is in the spotlight as a definitive solution for privacy, but the high computational overhead of FHE poses a challenge to its practical adoption. Although prior studies have attempted to design ASIC accelerators to mitigate the overhead, their designs require excessive chip resources (e.g., areas) to contain and process massive data for FHE operations. We propose…

    Submitted 31 March, 2024; v1 submitted 9 August, 2023; originally announced August 2023.

    Comments: 12 pages, 10 figures, to appear in 2024 International Symposium on Secure and Private Execution Environment Design (SEED)

  35. arXiv:2307.06294  [pdf, other]

    cs.AR cs.ET cs.NI

    Corona: System Implications of Emerging Nanophotonic Technology

    Authors: Dana Vantrease, Robert Schreiber, Matteo Monchiero, Moray McLaren, Norman P. Jouppi, Marco Fiorentino, Al Davis, Nathan Binkert, Raymond G. Beausoleil, Jung Ho Ahn

    Abstract: We expect that many-core microprocessors will push performance per chip from the 10 gigaflop to the 10 teraflop range in the coming decade. To support this increased performance, memory and inter-core bandwidths will also have to scale by orders of magnitude. Pin limitations, the energy cost of electrical signaling, and the non-scalability of chip-length global wires are significant bandwidth impe…

    Submitted 12 July, 2023; originally announced July 2023.

    Comments: This edition is recompiled from proceedings of ISCA-35 (the 35th International Symposium on Computer Architecture, June 21 - 25, 2008, Beijing, China) and has minor formatting differences. 13 pages; 11 figures

  36. arXiv:2306.17651  [pdf, other]

    cs.CV

    Implicit 3D Human Mesh Recovery using Consistency with Pose and Shape from Unseen-view

    Authors: Hanbyel Cho, Yooshin Cho, Jaesung Ahn, Junmo Kim

    Abstract: From an image of a person, we can easily infer the natural 3D pose and shape of the person even if ambiguity exists. This is because we have a mental model that allows us to imagine a person's appearance at different viewing directions from a given image and utilize the consistency between them for inference. However, existing human mesh recovery methods only consider the direction in which the im…

    Submitted 2 July, 2023; v1 submitted 30 June, 2023; originally announced June 2023.

    Comments: Accepted to CVPR 2023 (poster)

  37. arXiv:2306.15688  [pdf, ps, other]

    cs.AR cs.NI

    RETROSPECTIVE: Corona: System Implications of Emerging Nanophotonic Technology

    Authors: Dana Vantrease, Robert Schreiber, Matteo Monchiero, Moray McLaren, Norman P. Jouppi, Marco Fiorentino, Al Davis, Nathan Binkert, Raymond G. Beausoleil, Jung Ho Ahn

    Abstract: The 2008 Corona effort was inspired by a pressing need for more of everything, as demanded by the salient problems of the day. Dennard scaling was no longer in effect. A lot of computer architecture research was in the doldrums. Papers often showed incremental subsystem performance improvements, but at incommensurate cost and complexity. The many-core era was moving rapidly, and the approach with…

    Submitted 23 June, 2023; originally announced June 2023.

    Comments: 2 pages. Proceedings of ISCA-50: 50 years of the International Symposia on Computer Architecture (selected papers) June 17-21 Orlando, Florida

  38. arXiv:2306.15577  [pdf, ps, other]

    cs.AR cs.DC

    Retrospective: A Scalable Processing-in-Memory Accelerator for Parallel Graph Processing

    Authors: Junwhan Ahn, Sungpack Hong, Sungjoo Yoo, Onur Mutlu, Kiyoung Choi

    Abstract: Our ISCA 2015 paper provides a new programmable processing-in-memory (PIM) architecture and system design that can accelerate key data-intensive applications, with a focus on graph processing workloads. Our major idea was to completely rethink the system, including the programming model, data partitioning mechanisms, system support, instruction set architecture, along with near-memory execution un…

    Submitted 27 June, 2023; originally announced June 2023.

    Comments: Selected to the 50th Anniversary of ISCA (ACM/IEEE International Symposium on Computer Architecture), Commemorative Issue, 2023

  39. arXiv:2306.14592  [pdf, other]

    cs.CL cs.DL

    Transfer Learning across Several Centuries: Machine and Historian Integrated Method to Decipher Royal Secretary's Diary

    Authors: Sojung Lucia Kim, Taehong Jang, Joonmo Ahn, Hyungil Lee, Jaehyuk Lee

    Abstract: A named entity recognition and classification plays the first and foremost important role in capturing semantics in data and anchoring in translation as well as downstream study for history. However, NER in historical text has faced challenges such as scarcity of annotated corpus, multilanguage variety, various noise, and different convention far different from the contemporary language model. Thi…

    Submitted 26 June, 2023; originally announced June 2023.

    Comments: 7 pages, 9 figures

  40. arXiv:2306.05334  [pdf, ps, other]

    math.CO cs.DM

    Twin-width of subdivisions of multigraphs

    Authors: Jungho Ahn, Debsoumya Chakraborti, Kevin Hendrey, Sang-il Oum

    Abstract: For each $d\leq3$, we construct a finite set $F_d$ of multigraphs such that for each graph $H$ of girth at least $5$ obtained from a multigraph $G$ by subdividing each edge at least two times, $H$ has twin-width at most $d$ if and only if $G$ has no minor in $F_d$. This answers a question of Bergé, Bonnet, and Déprés asking for the structure of graphs $G$ such that each long subdivision of $G$ has…

    Submitted 4 June, 2024; v1 submitted 8 June, 2023; originally announced June 2023.

    Comments: 46 pages, 8 figures, 1 table

    MSC Class: 05C35; 05C75

  41. X-ray: Discovering DRAM Internal Structure and Error Characteristics by Issuing Memory Commands

    Authors: Hwayong Nam, Seungmin Baek, Minbok Wi, Michael Jaemin Kim, Jaehyun Park, Chihun Song, Nam Sung Kim, Jung Ho Ahn

    Abstract: The demand for accurate information about the internal structure and characteristics of dynamic random-access memory (DRAM) has been on the rise. Recent studies have explored the structure and characteristics of DRAM to improve processing in memory, enhance reliability, and mitigate a vulnerability known as rowhammer. However, DRAM manufacturers only disclose limited information through official d…

    Submitted 12 August, 2023; v1 submitted 5 June, 2023; originally announced June 2023.

    Comments: 4 pages, 7 figures, accepted at IEEE Computer Architecture Letters

  42. arXiv:2305.17388  [pdf, other]

    cs.CL cs.CV

    MPCHAT: Towards Multimodal Persona-Grounded Conversation

    Authors: Jaewoo Ahn, Yeda Song, Sangdoo Yun, Gunhee Kim

    Abstract: In order to build self-consistent personalized dialogue agents, previous research has mostly focused on textual persona that delivers personal facts or personalities. However, to fully describe the multi-faceted nature of persona, image modality can help better reveal the speaker's personal characteristics and experiences in episodic memory (Rubin et al., 2003; Conway, 2009). In this work, we exte…

    Submitted 27 May, 2023; originally announced May 2023.

    Comments: Accepted at ACL 2023

  43. arXiv:2305.15060  [pdf, other]

    cs.CL

    Who Wrote this Code? Watermarking for Code Generation

    Authors: Taehyun Lee, Seokhee Hong, Jaewoo Ahn, Ilgee Hong, Hwaran Lee, Sangdoo Yun, Jamin Shin, Gunhee Kim

    Abstract: Since the remarkable generation performance of large language models raised ethical and legal concerns, approaches to detect machine-generated text by embedding watermarks are being developed. However, we discover that the existing works fail to function appropriately in code generation tasks due to the task's nature of having low entropy. Extending a logit-modifying watermark method, we propose S…

    Submitted 3 July, 2024; v1 submitted 24 May, 2023; originally announced May 2023.

    Comments: To be presented at ACL 2024

  44. arXiv:2305.10403  [pdf, other]

    cs.CL cs.AI

    PaLM 2 Technical Report

    Authors: Rohan Anil, Andrew M. Dai, Orhan Firat, Melvin Johnson, Dmitry Lepikhin, Alexandre Passos, Siamak Shakeri, Emanuel Taropa, Paige Bailey, Zhifeng Chen, Eric Chu, Jonathan H. Clark, Laurent El Shafey, Yanping Huang, Kathy Meier-Hellstern, Gaurav Mishra, Erica Moreira, Mark Omernick, Kevin Robinson, Sebastian Ruder, Yi Tay, Kefan Xiao, Yuanzhong Xu, Yujing Zhang, Gustavo Hernandez Abrego, et al. (103 additional authors not shown)

    Abstract: We introduce PaLM 2, a new state-of-the-art language model that has better multilingual and reasoning capabilities and is more compute-efficient than its predecessor PaLM. PaLM 2 is a Transformer-based model trained using a mixture of objectives. Through extensive evaluations on English and multilingual language, and reasoning tasks, we demonstrate that PaLM 2 has significantly improved quality on…

    Submitted 13 September, 2023; v1 submitted 17 May, 2023; originally announced May 2023.

  45. arXiv:2305.01905  [pdf, other]

    cs.CV

    Localization using Multi-Focal Spatial Attention for Masked Face Recognition

    Authors: Yooshin Cho, Hanbyel Cho, Hyeong Gwon Hong, Jaesung Ahn, Dongmin Cho, JungWoo Chang, Junmo Kim

    Abstract: Since the beginning of world-wide COVID-19 pandemic, facial masks have been recommended to limit the spread of the disease. However, these masks hide certain facial attributes. Hence, it has become difficult for existing face recognition systems to perform identity verification on masked faces. In this context, it is necessary to develop masked Face Recognition (MFR) for contactless biometric reco…

    Submitted 7 September, 2023; v1 submitted 3 May, 2023; originally announced May 2023.

    Comments: Accepted at FG 2023 - InterID Workshop

  46. Demystifying CXL Memory with Genuine CXL-Ready Systems and Devices

    Authors: Yan Sun, Yifan Yuan, Zeduo Yu, Reese Kuper, Chihun Song, Jinghan Huang, Houxiang Ji, Siddharth Agarwal, Jiaqi Lou, Ipoom Jeong, Ren Wang, Jung Ho Ahn, Tianyin Xu, Nam Sung Kim

    Abstract: The ever-growing demands for memory with larger capacity and higher bandwidth have driven recent innovations on memory expansion and disaggregation technologies based on Compute eXpress Link (CXL). Especially, CXL-based memory expansion technology has recently gained notable attention for its ability not only to economically expand memory capacity and bandwidth but also to decouple memory technolo…

    Submitted 4 October, 2023; v1 submitted 27 March, 2023; originally announced March 2023.

    Comments: This paper has been accepted by MICRO'23. Please refer to the https://doi.org/10.1145/3613424.3614256 for the official version of this paper

    ACM Class: C.4; D.4; C.0

  47. arXiv:2303.01963  [pdf]

    cs.LG cs.RO math.OC

    Multi-Start Team Orienteering Problem for UAS Mission Re-Planning with Data-Efficient Deep Reinforcement Learning

    Authors: Dong Ho Lee, Jaemyung Ahn

    Abstract: In this paper, we study the Multi-Start Team Orienteering Problem (MSTOP), a mission re-planning problem where vehicles are initially located away from the depot and have different amounts of fuel. We consider/assume the goal of multiple vehicles is to travel to maximize the sum of collected profits under resource (e.g., time, fuel) consumption constraints. Such re-planning problems occur in a wid…

    Submitted 2 March, 2023; originally announced March 2023.

    Comments: 48 pages, 18 figures, 7 tables

  48. arXiv:2302.14233  [pdf, other]

    cs.CL cs.AI cs.LG

    Goal Driven Discovery of Distributional Differences via Language Descriptions

    Authors: Ruiqi Zhong, Peter Zhang, Steve Li, Jinwoo Ahn, Dan Klein, Jacob Steinhardt

    Abstract: Mining large corpora can generate useful discoveries but is time-consuming for humans. We formulate a new task, D5, that automatically discovers differences between two large corpora in a goal-driven way. The task input is a problem comprising a research goal "$\textit{comparing the side effects of drug A and drug B}$" and a corpus pair (two large collections of patients' self-reported reactions a…

    Submitted 24 October, 2023; v1 submitted 27 February, 2023; originally announced February 2023.

  49. HyPHEN: A Hybrid Packing Method and Optimizations for Homomorphic Encryption-Based Neural Networks

    Authors: Donghwan Kim, Jaiyoung Park, Jongmin Kim, Sangpyo Kim, Jung Ho Ahn

    Abstract: Convolutional neural network (CNN) inference using fully homomorphic encryption (FHE) is a promising private inference (PI) solution due to the capability of FHE that enables offloading the whole computation process to the server while protecting the privacy of sensitive user data. Prior FHE-based CNN (HCNN) work has demonstrated the feasibility of constructing deep neural network architectures su…

    Submitted 8 December, 2023; v1 submitted 5 February, 2023; originally announced February 2023.

    Comments: 15 pages, 12 figures

  50. arXiv:2301.06375  [pdf, other]

    cs.MM cs.AI cs.CL cs.CV cs.LG cs.SD

    OLKAVS: An Open Large-Scale Korean Audio-Visual Speech Dataset

    Authors: Jeongkyun Park, Jung-Wook Hwang, Kwanghee Choi, Seung-Hyun Lee, Jun Hwan Ahn, Rae-Hong Park, Hyung-Min Park

    Abstract: Inspired by humans comprehending speech in a multi-modal manner, various audio-visual datasets have been constructed. However, most existing datasets focus on English, induce dependencies with various prediction models during dataset preparation, and have only a small number of multi-view videos. To mitigate the limitations, we recently developed the Open Large-scale Korean Audio-Visual Speech (OL…

    Submitted 16 January, 2023; originally announced January 2023.