Showing 1–50 of 429 results for author: Lee, G

  1. arXiv:2407.11439  [pdf, other]

    cs.LG cs.AI q-bio.BM

    Repurformer: Transformers for Repurposing-Aware Molecule Generation

    Authors: Changhun Lee, Gyumin Lee

    Abstract: Generating as diverse molecules as possible with desired properties is crucial for drug discovery research, which invokes many approaches based on deep generative models today. Despite recent advancements in these models, particularly in variational autoencoders (VAEs), generative adversarial networks (GANs), Transformers, and diffusion models, a significant challenge known as the sample b…

    Submitted 16 July, 2024; originally announced July 2024.

    Comments: 12 pages, 8 figures, conference

  2. arXiv:2407.09514  [pdf]

    cond-mat.mtrl-sci cs.LG physics.app-ph

    Machine Learning Based Prediction of Proton Conductivity in Metal-Organic Frameworks

    Authors: Seunghee Han, Byeong Gwan Lee, Dae Woon Lim, Jihan Kim

    Abstract: Recently, metal-organic frameworks (MOFs) have demonstrated their potential as solid-state electrolytes in proton exchange membrane fuel cells. However, the number of MOFs reported to exhibit proton conductivity remains limited, and the mechanisms underlying this phenomenon are not fully elucidated, complicating the design of proton-conductive MOFs. In response, we developed a comprehensive databa…

    Submitted 18 June, 2024; originally announced July 2024.

  3. arXiv:2407.06682  [pdf, other]

    cs.LG cs.AI

    A Predictive Model Based on Transformer with Statistical Feature Embedding in Manufacturing Sensor Dataset

    Authors: Gyeong Taek Lee, Oh-Ran Kwon

    Abstract: In the manufacturing process, sensor data collected from equipment is crucial for building predictive models to manage processes and improve productivity. However, in the field, it is challenging to gather sufficient data to build robust models. This study proposes a novel predictive model based on the Transformer, utilizing statistical feature embedding and window positional encoding. Statistical…

    Submitted 9 July, 2024; originally announced July 2024.

  4. arXiv:2407.03741  [pdf, other]

    cs.IT

    A Unified Expression for Upper Bounds on the BLER of Spinal Codes over Fading Channels

    Authors: Aimin Li, Xiaomeng Chen, Shaohua Wu, Gary C. F. Lee, Sumei Sun

    Abstract: Performance evaluation of particular channel coding has been a significant topic in coding theory, often involving the use of bounding techniques. This paper focuses on the new family of capacity-achieving codes, Spinal codes, to provide a comprehensive analysis framework to tightly upper bound the block error rate (BLER) of Spinal codes in the finite block length (FBL) regime. First, we resort to…

    Submitted 4 July, 2024; originally announced July 2024.

  5. arXiv:2407.02681  [pdf, other]

    cs.LG eess.IV math.OC stat.ML

    Uniform Transformation: Refining Latent Representation in Variational Autoencoders

    Authors: Ye Shi, C. S. George Lee

    Abstract: Irregular distribution in latent space causes posterior collapse, misalignment between posterior and prior, and ill-sampling problem in Variational Autoencoders (VAEs). In this paper, we introduce a novel adaptable three-stage Uniform Transformation (UT) module -- Gaussian Kernel Density Estimation (G-KDE) clustering, non-parametric Gaussian Mixture (GM) Modeling, and Probability Integral Transfor…

    Submitted 2 July, 2024; originally announced July 2024.

    Comments: Accepted by 2024 IEEE 20th International Conference on Automation Science and Engineering

  6. arXiv:2407.02245  [pdf, other]

    cs.RO cs.AI

    Safe CoR: A Dual-Expert Approach to Integrating Imitation Learning and Safe Reinforcement Learning Using Constraint Rewards

    Authors: Hyeokjin Kwon, Gunmin Lee, Junseo Lee, Songhwai Oh

    Abstract: In the realm of autonomous agents, ensuring safety and reliability in complex and dynamic environments remains a paramount challenge. Safe reinforcement learning addresses these concerns by introducing safety constraints, but still faces challenges in navigating intricate environments such as complex driving situations. To overcome these challenges, we present the safe constraint reward (Safe CoR)…

    Submitted 2 July, 2024; originally announced July 2024.

    Comments: Accepted to the Proc. of the IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 2024

  7. arXiv:2407.00693  [pdf, other]

    cs.AI cs.CL cs.LG

    BAPO: Base-Anchored Preference Optimization for Personalized Alignment in Large Language Models

    Authors: Gihun Lee, Minchan Jeong, Yujin Kim, Hojung Jung, Jaehoon Oh, Sangmook Kim, Se-Young Yun

    Abstract: While learning to align Large Language Models (LLMs) with human preferences has shown remarkable success, aligning these models to meet the diverse user preferences presents further challenges in preserving previous knowledge. This paper examines the impact of personalized preference optimization on LLMs, revealing that the extent of knowledge loss varies significantly with preference heterogeneit…

    Submitted 30 June, 2024; originally announced July 2024.

    Comments: under review

  8. arXiv:2406.16341  [pdf, other]

    cs.CL

    EHRCon: Dataset for Checking Consistency between Unstructured Notes and Structured Tables in Electronic Health Records

    Authors: Yeonsu Kwon, Jiho Kim, Gyubok Lee, Seongsu Bae, Daeun Kyung, Wonchul Cha, Tom Pollard, Alistair Johnson, Edward Choi

    Abstract: Electronic Health Records (EHRs) are integral for storing comprehensive patient medical records, combining structured data (e.g., medications) with detailed clinical notes (e.g., physician notes). These elements are essential for straightforward data retrieval and provide deep, contextual insights into patient care. However, they often suffer from discrepancies due to unintuitive EHR system design…

    Submitted 24 June, 2024; originally announced June 2024.

  9. arXiv:2406.15723  [pdf, other]

    cs.CL cs.AI cs.SD eess.AS

    Acoustic Feature Mixup for Balanced Multi-aspect Pronunciation Assessment

    Authors: Heejin Do, Wonjun Lee, Gary Geunbae Lee

    Abstract: In automated pronunciation assessment, recent emphasis progressively lies on evaluating multiple aspects to provide enriched feedback. However, acquiring multi-aspect-score labeled data for non-native language learners' speech poses challenges; moreover, it often leads to score-imbalanced distributions. In this paper, we propose two Acoustic Feature Mixup strategies, linearly and non-linearly inte…

    Submitted 21 June, 2024; originally announced June 2024.

    Comments: Interspeech 2024

  10. arXiv:2406.13935  [pdf, other]

    eess.AS cs.AI cs.SD

    CONMOD: Controllable Neural Frame-based Modulation Effects

    Authors: Gyubin Lee, Hounsu Kim, Junwon Lee, Juhan Nam

    Abstract: Deep learning models have seen widespread use in modelling LFO-driven audio effects, such as phaser and flanger. Although existing neural architectures exhibit high-quality emulation of individual effects, they do not possess the capability to manipulate the output via control parameters. To address this issue, we introduce Controllable Neural Frame-based Modulation Effects (CONMOD), a single blac…

    Submitted 19 June, 2024; originally announced June 2024.

  11. arXiv:2406.11311  [pdf, other]

    cs.CV

    Syn-to-Real Unsupervised Domain Adaptation for Indoor 3D Object Detection

    Authors: Yunsong Wang, Na Zhao, Gim Hee Lee

    Abstract: The use of synthetic data in indoor 3D object detection offers the potential of greatly reducing the manual labor involved in 3D annotations and training effective zero-shot detectors. However, the complicated domain shifts across syn-to-real indoor datasets remain underexplored. In this paper, we propose a novel Object-wise Hierarchical Domain Alignment (OHDA) framework for syn-to-real unsupervi…

    Submitted 17 June, 2024; originally announced June 2024.

  12. arXiv:2406.11283  [pdf, other]

    cs.CV

    Enhancing Generalizability of Representation Learning for Data-Efficient 3D Scene Understanding

    Authors: Yunsong Wang, Na Zhao, Gim Hee Lee

    Abstract: The field of self-supervised 3D representation learning has emerged as a promising solution to alleviate the challenge presented by the scarcity of extensive, well-annotated datasets. However, it continues to be hindered by the lack of diverse, large-scale, real-world 3D scene datasets for source data. To address this shortfall, we propose Generalizable Representation Learning (GRL), where we devi…

    Submitted 17 June, 2024; originally announced June 2024.

  13. arXiv:2406.10867  [pdf, other]

    cs.LG q-bio.BM

    Geometric-informed GFlowNets for Structure-Based Drug Design

    Authors: Grayson Lee, Tony Shen, Martin Ester

    Abstract: The rising cost of drug discovery and the current speed at which drugs are discovered underscore the need for more efficient structure-based drug design (SBDD) methods. We employ Generative Flow Networks (GFlowNets) to effectively explore the vast combinatorial space of drug-like molecules, which traditional virtual screening methods fail to cover. We introduce a novel modification to the G…

    Submitted 16 June, 2024; originally announced June 2024.

    Comments: Accepted at MoML 2024 as Spotlight

  14. arXiv:2406.07800  [pdf, other]

    cs.LG cs.DC

    Regularizing and Aggregating Clients with Class Distribution for Personalized Federated Learning

    Authors: Gyuejeong Lee, Daeyoung Choi

    Abstract: Personalized federated learning (PFL) enables customized models for clients with varying data distributions. However, existing PFL methods often incur high computational and communication costs, limiting their practical application. This paper proposes a novel PFL method, Class-wise Federated Averaging (cwFedAVG), that performs Federated Averaging (FedAVG) class-wise, creating multiple global mode…

    Submitted 11 June, 2024; originally announced June 2024.

  15. arXiv:2406.07231  [pdf, other]

    cs.CL

    Decipherment-Aware Multilingual Learning in Jointly Trained Language Models

    Authors: Grandee Lee

    Abstract: The principle that governs unsupervised multilingual learning (UCL) in jointly trained language models (mBERT as a popular example) is still being debated. Many find it surprising that one can achieve UCL with multiple monolingual corpora. In this work, we anchor UCL in the context of language decipherment and show that the joint training methodology is a decipherment process pivotal for UCL. In a…

    Submitted 11 June, 2024; originally announced June 2024.

  16. arXiv:2406.06650  [pdf, other]

    eess.IV cs.CV

    Predicting the risk of early-stage breast cancer recurrence using H&E-stained tissue images

    Authors: Geongyu Lee, Joonho Lee, Tae-Yeong Kwak, Sun Woo Kim, Youngmee Kwon, Chungyeul Kim, Hyeyoon Chang

    Abstract: Accurate prediction of the likelihood of recurrence is important in the selection of postoperative treatment for patients with early-stage breast cancer. In this study, we investigated whether deep learning algorithms can predict patients' risk of recurrence by analyzing the pathology images of their cancer histology. A total of 125 hematoxylin and eosin stained breast cancer whole slide images la…

    Submitted 10 June, 2024; originally announced June 2024.

    Comments: 12 pages, 7 figures

  17. arXiv:2406.06050  [pdf, other]

    cs.CV

    Generalizable Human Gaussians from Single-View Image

    Authors: Jinnan Chen, Chen Li, Jianfeng Zhang, Hanlin Chen, Buzhen Huang, Gim Hee Lee

    Abstract: In this work, we tackle the task of learning generalizable 3D human Gaussians from a single image. The main challenge for this task is to recover detailed geometry and appearance, especially for the unobserved regions. To this end, we propose single-view generalizable Human Gaussian model (HGM), a diffusion-guided framework for 3D human modeling from a single image. We design a diffusion-based coa…

    Submitted 10 June, 2024; originally announced June 2024.

  18. arXiv:2406.05774  [pdf, other]

    cs.CV

    VCR-GauS: View Consistent Depth-Normal Regularizer for Gaussian Surface Reconstruction

    Authors: Hanlin Chen, Fangyin Wei, Chen Li, Tianxin Huang, Yunsong Wang, Gim Hee Lee

    Abstract: Although 3D Gaussian Splatting has been widely studied because of its realistic and efficient novel-view synthesis, it is still challenging to extract a high-quality surface from the point-based representation. Previous works improve the surface by incorporating geometric priors from the off-the-shelf normal estimator. However, there are two main limitations: 1) Supervising normal rendered from 3D…

    Submitted 9 June, 2024; originally announced June 2024.

  19. arXiv:2406.04625  [pdf, other]

    cs.CL cs.AI

    Key-Element-Informed sLLM Tuning for Document Summarization

    Authors: Sangwon Ryu, Heejin Do, Yunsu Kim, Gary Geunbae Lee, Jungseul Ok

    Abstract: Remarkable advances in large language models (LLMs) have enabled high-quality text summarization. However, this capability is currently accessible only through LLMs of substantial size or proprietary LLMs with usage fees. In response, smaller-scale LLMs (sLLMs) of easy accessibility and low costs have been extensively studied, yet they often suffer from missing key information and entities, i.e.,…

    Submitted 25 June, 2024; v1 submitted 7 June, 2024; originally announced June 2024.

    Comments: Interspeech 2024

  20. arXiv:2406.02893  [pdf, other]

    cs.CL

    Language Model Can Do Knowledge Tracing: Simple but Effective Method to Integrate Language Model and Knowledge Tracing Task

    Authors: Unggi Lee, Jiyeong Bae, Dohee Kim, Sookbun Lee, Jaekwon Park, Taekyung Ahn, Gunho Lee, Damji Stratton, Hyeoncheol Kim

    Abstract: Knowledge Tracing (KT) is a critical task in online learning for modeling student knowledge over time. Despite the success of deep learning-based KT models, which rely on sequences of numbers as data, most existing approaches fail to leverage the rich semantic information in the text of questions and concepts. This paper proposes Language model-based Knowledge Tracing (LKT), a novel framework that…

    Submitted 9 June, 2024; v1 submitted 4 June, 2024; originally announced June 2024.

    Comments: 11 pages, 5 figures, 3 tables

  21. arXiv:2406.00303  [pdf, other]

    cs.CL cs.AI

    Multi-Dimensional Optimization for Text Summarization via Reinforcement Learning

    Authors: Sangwon Ryu, Heejin Do, Yunsu Kim, Gary Geunbae Lee, Jungseul Ok

    Abstract: The evaluation of summary quality encompasses diverse dimensions such as consistency, coherence, relevance, and fluency. However, existing summarization methods often target a specific dimension, facing challenges in generating well-balanced summaries across multiple dimensions. In this paper, we propose multi-objective reinforcement learning tailored to generate balanced summaries across all four…

    Submitted 1 June, 2024; originally announced June 2024.

    Comments: ACL 2024

  22. arXiv:2406.00019  [pdf, other]

    cs.CL cs.AI cs.DB cs.IR

    EHR-SeqSQL : A Sequential Text-to-SQL Dataset For Interactively Exploring Electronic Health Records

    Authors: Jaehee Ryu, Seonhee Cho, Gyubok Lee, Edward Choi

    Abstract: In this paper, we introduce EHR-SeqSQL, a novel sequential text-to-SQL dataset for Electronic Health Record (EHR) databases. EHR-SeqSQL is designed to address critical yet underexplored aspects in text-to-SQL parsing: interactivity, compositionality, and efficiency. To the best of our knowledge, EHR-SeqSQL is not only the largest but also the first medical text-to-SQL dataset benchmark to include…

    Submitted 23 May, 2024; originally announced June 2024.

    Comments: ACL 2024 (Findings)

  23. arXiv:2405.19046  [pdf, other]

    cs.IR

    Continual Collaborative Distillation for Recommender System

    Authors: Gyuseok Lee, SeongKu Kang, Wonbin Kweon, Hwanjo Yu

    Abstract: Knowledge distillation (KD) has emerged as a promising technique for addressing the computational challenges associated with deploying large-scale recommender systems. KD transfers the knowledge of a massive teacher system to a compact student model, to reduce the huge computational burdens for inference while retaining high accuracy. The existing KD studies primarily focus on one-time distillatio…

    Submitted 25 June, 2024; v1 submitted 29 May, 2024; originally announced May 2024.

    Comments: Accepted by KDD 2024 research track. 9 main pages + 1 appendix page, 5 figures

  24. arXiv:2405.17958  [pdf, other]

    cs.CV

    FreeSplat: Generalizable 3D Gaussian Splatting Towards Free-View Synthesis of Indoor Scenes

    Authors: Yunsong Wang, Tianxin Huang, Hanlin Chen, Gim Hee Lee

    Abstract: Empowering 3D Gaussian Splatting with generalization ability is appealing. However, existing generalizable 3D Gaussian Splatting methods are largely confined to narrow-range interpolation between stereo images due to their heavy backbones, thus lacking the ability to accurately localize 3D Gaussian and support free-view synthesis across wide view range. In this paper, we present a novel framework…

    Submitted 9 June, 2024; v1 submitted 28 May, 2024; originally announced May 2024.

  25. arXiv:2405.13943  [pdf, other]

    cs.CV

    DoGaussian: Distributed-Oriented Gaussian Splatting for Large-Scale 3D Reconstruction Via Gaussian Consensus

    Authors: Yu Chen, Gim Hee Lee

    Abstract: The recent advances in 3D Gaussian Splatting (3DGS) show promising results on the novel view synthesis (NVS) task. With its superior rendering performance and high-fidelity rendering quality, 3DGS is surpassing its previous NeRF counterparts. The most recent 3DGS method focuses either on improving the instability of rendering efficiency or reducing the model size. On the other hand, the training…

    Submitted 22 May, 2024; originally announced May 2024.

  26. arXiv:2405.12900  [pdf, other]

    cs.CL cs.AI

    Adversarial DPO: Harnessing Harmful Data for Reducing Toxicity with Minimal Impact on Coherence and Evasiveness in Dialogue Agents

    Authors: San Kim, Gary Geunbae Lee

    Abstract: Recent advancements in open-domain dialogue systems have been propelled by the emergence of high-quality large language models (LLMs) and various effective training methodologies. Nevertheless, the presence of toxicity within these models presents a significant challenge that can potentially diminish the user experience. In this study, we introduce an innovative training algorithm, an improvement…

    Submitted 21 May, 2024; originally announced May 2024.

    Comments: 15 pages, 7 figures, accepted to NAACL findings 2024

    ACM Class: I.2.7

  27. arXiv:2405.07520  [pdf, ps, other]

    cs.CV

    Dehazing Remote Sensing and UAV Imagery: A Review of Deep Learning, Prior-based, and Hybrid Approaches

    Authors: Gao Yu Lee, Jinkuan Chen, Tanmoy Dam, Md Meftahul Ferdaus, Daniel Puiu Poenar, Vu N Duong

    Abstract: High-quality images are crucial in remote sensing and UAV applications, but atmospheric haze can severely degrade image quality, making image dehazing a critical research area. Since the introduction of deep convolutional neural networks, numerous approaches have been proposed, and even more have emerged with the development of vision transformers and contrastive/few-shot learning. Simultaneously,…

    Submitted 13 May, 2024; originally announced May 2024.

    Comments: Submitted to journal and under review, once the paper is accepted, the copyright will be transferred to the corresponding journal

  28. arXiv:2405.07163  [pdf, other]

    physics.ed-ph cs.AI

    Realizing Visual Question Answering for Education: GPT-4V as a Multimodal AI

    Authors: Gyeong-Geon Lee, Xiaoming Zhai

    Abstract: Educational scholars have analyzed various image data acquired from teaching and learning situations, such as photos that show classroom dynamics, students' drawings with regard to the learning content, textbook illustrations, etc. Unquestionably, most qualitative analysis of and explanation on image data have been conducted by human researchers, without machine-based automation. It was partiall…

    Submitted 12 May, 2024; originally announced May 2024.

  29. arXiv:2405.06673  [pdf, other]

    cs.CL cs.AI

    Overview of the EHRSQL 2024 Shared Task on Reliable Text-to-SQL Modeling on Electronic Health Records

    Authors: Gyubok Lee, Sunjun Kweon, Seongsu Bae, Edward Choi

    Abstract: Electronic Health Records (EHRs) are relational databases that store the entire medical histories of patients within hospitals. They record numerous aspects of patients' medical care, from hospital admission and diagnosis to treatment and discharge. While EHRs are vital sources of clinical data, exploring them beyond a predefined set of queries requires skills in query languages like SQL. To make…

    Submitted 23 May, 2024; v1 submitted 4 May, 2024; originally announced May 2024.

    Comments: The 6th Clinical Natural Language Processing Workshop at NAACL 2024; Minor Change from Camera-Ready

  30. arXiv:2405.02042  [pdf, other]

    cs.IT

    Sampling to Achieve the Goal: An Age-aware Remote Markov Decision Process

    Authors: Aimin Li, Shaohua Wu, Gary C. F. Lee, Xiaomeng Cheng, Sumei Sun

    Abstract: Age of Information (AoI) has been recognized as an important metric to measure the freshness of information. Central to this consensus is that minimizing AoI can enhance the freshness of information, thereby facilitating the accuracy of subsequent decision-making processes. However, to date the direct causal relationship that links AoI to the utility of the decision-making process is unexplored. T…

    Submitted 11 May, 2024; v1 submitted 3 May, 2024; originally announced May 2024.

    Comments: 12 pages, 4 figures

  31. arXiv:2405.01884  [pdf, other]

    cs.CL

    Beyond Single-Event Extraction: Towards Efficient Document-Level Multi-Event Argument Extraction

    Authors: Wanlong Liu, Li Zhou, Dingyi Zeng, Yichen Xiao, Shaohuan Cheng, Chen Zhang, Grandee Lee, Malu Zhang, Wenyu Chen

    Abstract: Recent mainstream event argument extraction methods process each event in isolation, resulting in inefficient inference and ignoring the correlations among multiple events. To address these limitations, here we propose a multiple-event argument extraction model DEEIA (Dependency-guided Encoding and Event-specific Information Aggregation), capable of extracting arguments from all events within a do…

    Submitted 16 June, 2024; v1 submitted 3 May, 2024; originally announced May 2024.

    Comments: Accepted to Findings of ACL 2024

  32. arXiv:2405.01588  [pdf, other]

    cs.CL cs.AI

    Towards Unbiased Evaluation of Detecting Unanswerable Questions in EHRSQL

    Authors: Yongjin Yang, Sihyeon Kim, SangMook Kim, Gyubok Lee, Se-Young Yun, Edward Choi

    Abstract: Incorporating unanswerable questions into EHR QA systems is crucial for testing the trustworthiness of a system, as providing non-existent responses can mislead doctors in their diagnoses. The EHRSQL dataset stands out as a promising benchmark because it is the only dataset that incorporates unanswerable questions in the EHR QA system alongside practical questions. However, in this work, we identi…

    Submitted 28 April, 2024; originally announced May 2024.

    Comments: DPFM Workshop, ICLR 2024

  33. arXiv:2404.14329  [pdf, other]

    cs.CV

    X-Ray: A Sequential 3D Representation For Generation

    Authors: Tao Hu, Wenhang Ge, Yuyang Zhao, Gim Hee Lee

    Abstract: We introduce X-Ray, a novel 3D sequential representation inspired by the penetrability of x-ray scans. X-Ray transforms a 3D object into a series of surface frames at different layers, making it suitable for generating 3D models from images. Our method utilizes ray casting from the camera center to capture geometric and textured details, including depth, normal, and color, across all intersected s…

    Submitted 1 June, 2024; v1 submitted 22 April, 2024; originally announced April 2024.

  34. arXiv:2404.11291  [pdf, other]

    cs.CV

    Closely Interactive Human Reconstruction with Proxemics and Physics-Guided Adaption

    Authors: Buzhen Huang, Chen Li, Chongyang Xu, Liang Pan, Yangang Wang, Gim Hee Lee

    Abstract: Existing multi-person human reconstruction approaches mainly focus on recovering accurate poses or avoiding penetration, but overlook the modeling of close interactions. In this work, we tackle the task of reconstructing closely interactive humans from a monocular video. The main challenge of this task comes from insufficient visual information caused by depth ambiguity and severe inter-person occ…

    Submitted 17 April, 2024; originally announced April 2024.

    Comments: CVPR2024

  35. arXiv:2404.06814  [pdf, other]

    cs.CV

    Zero-shot Point Cloud Completion Via 2D Priors

    Authors: Tianxin Huang, Zhiwen Yan, Yuyang Zhao, Gim Hee Lee

    Abstract: 3D point cloud completion is designed to recover complete shapes from partially observed point clouds. Conventional completion methods typically depend on extensive point cloud data for training, with their effectiveness often constrained to object categories similar to those seen during training. In contrast, we propose a zero-shot framework aimed at completing partially observed point clouds a…

    Submitted 10 April, 2024; originally announced April 2024.

  36. arXiv:2404.02592  [pdf]

    cs.CL cs.SD eess.AS

    Leveraging the Interplay Between Syntactic and Acoustic Cues for Optimizing Korean TTS Pause Formation

    Authors: Yejin Jeon, Yunsu Kim, Gary Geunbae Lee

    Abstract: Contemporary neural speech synthesis models have indeed demonstrated remarkable proficiency in synthetic speech generation as they have attained a level of quality comparable to that of human-produced speech. Nevertheless, it is important to note that these achievements have predominantly been verified within the context of high-resource languages such as English. Furthermore, the Tacotron and Fas…

    Submitted 3 April, 2024; originally announced April 2024.

    Comments: Accepted to LREC-COLING 2024

  37. arXiv:2404.02157  [pdf, other]

    cs.CV cs.AI

    Segment Any 3D Object with Language

    Authors: Seungjun Lee, Yuyang Zhao, Gim Hee Lee

    Abstract: In this paper, we investigate Open-Vocabulary 3D Instance Segmentation (OV-3DIS) with free-form language instructions. Earlier works that rely on only annotated base categories for training suffer from limited generalization to unseen novel categories. Recent works mitigate poor generalizability to novel categories by generating class-agnostic masks or projecting generalized masks from 2D to 3D, b…

    Submitted 2 April, 2024; originally announced April 2024.

    Comments: Project Page: https://cvrp-sole.github.io

  38. arXiv:2404.01954  [pdf, other]

    cs.CL cs.AI

    HyperCLOVA X Technical Report

    Authors: Kang Min Yoo, Jaegeun Han, Sookyo In, Heewon Jeon, Jisu Jeong, Jaewook Kang, Hyunwook Kim, Kyung-Min Kim, Munhyong Kim, Sungju Kim, Donghyun Kwak, Hanock Kwak, Se Jung Kwon, Bado Lee, Dongsoo Lee, Gichang Lee, Jooho Lee, Baeseong Park, Seongjin Shin, Joonsang Yu, Seolki Baek, Sumin Byeon, Eungsup Cho, Dooseok Choe, Jeesung Han, et al. (371 additional authors not shown)

    Abstract: We introduce HyperCLOVA X, a family of large language models (LLMs) tailored to the Korean language and culture, along with competitive capabilities in English, math, and coding. HyperCLOVA X was trained on a balanced mix of Korean, English, and code data, followed by instruction-tuning with high-quality human-annotated datasets while abiding by strict safety guidelines reflecting our commitment t…

    Submitted 13 April, 2024; v1 submitted 2 April, 2024; originally announced April 2024.

    Comments: 44 pages; updated authors list and fixed author names

  39. arXiv:2404.01842  [pdf, other]

    cs.CV

    Semi-Supervised Domain Adaptation for Wildfire Detection

    Authors: JooYoung Jang, Youngseo Cha, Jisu Kim, SooHyung Lee, Geonu Lee, Minkook Cho, Young Hwang, Nojun Kwak

    Abstract: Recently, both the frequency and intensity of wildfires have increased worldwide, primarily due to climate change. In this paper, we propose a novel protocol for wildfire detection, leveraging semi-supervised Domain Adaptation for object detection, accompanied by a corresponding dataset designed for use by both academics and industries. Our dataset encompasses 30 times more diverse labeled scenes…

    Submitted 2 April, 2024; originally announced April 2024.

    Comments: 16 pages, 5 figures, 22 tables

  40. arXiv:2404.00931  [pdf, other]

    cs.CV

    GOV-NeSF: Generalizable Open-Vocabulary Neural Semantic Fields

    Authors: Yunsong Wang, Hanlin Chen, Gim Hee Lee

    Abstract: Recent advancements in vision-language foundation models have significantly enhanced open-vocabulary 3D scene understanding. However, the generalizability of existing methods is constrained due to their framework designs and their reliance on 3D data. We address this limitation by introducing Generalizable Open-Vocabulary Neural Semantic Fields (GOV-NeSF), a novel approach offering a generalizable…

    Submitted 1 April, 2024; originally announced April 2024.

  41. arXiv:2404.00874  [pdf, other]

    cs.CV

    DiSR-NeRF: Diffusion-Guided View-Consistent Super-Resolution NeRF

    Authors: Jie Long Lee, Chen Li, Gim Hee Lee

    Abstract: We present DiSR-NeRF, a diffusion-guided framework for view-consistent super-resolution (SR) NeRF. Unlike prior works, we circumvent the requirement for high-resolution (HR) reference images by leveraging existing powerful 2D super-resolution models. Nonetheless, independent SR 2D images are often inconsistent across different views. We thus propose Iterative 3D Synchronization (I3DS) to mitigate…

    Submitted 31 March, 2024; originally announced April 2024.

  42. arXiv:2404.00571  [pdf, other]

    cs.CL

    Explainable Multi-hop Question Generation: An End-to-End Approach without Intermediate Question Labeling

    Authors: Seonjeong Hwang, Yunsu Kim, Gary Geunbae Lee

    Abstract: In response to the increasing use of interactive artificial intelligence, the demand for the capacity to handle complex questions has increased. Multi-hop question generation aims to generate complex questions that require multi-step reasoning over several documents. Previous studies have predominantly utilized end-to-end models, wherein questions are decoded based on the representation of contex…

    Submitted 31 March, 2024; originally announced April 2024.

    Comments: LREC-Coling 2024

  43. arXiv:2403.17611  [pdf, other]

    cs.CL cs.AI

    Denoising Table-Text Retrieval for Open-Domain Question Answering

    Authors: Deokhyung Kang, Baikjin Jung, Yunsu Kim, Gary Geunbae Lee

    Abstract: In table-text open-domain question answering, a retriever system retrieves relevant evidence from tables and text to answer questions. Previous studies in table-text open-domain question answering have two common challenges: firstly, their retrievers can be affected by false-positive labels in training datasets; secondly, they may struggle to provide appropriate evidence for questions that require…

    Submitted 26 March, 2024; originally announced March 2024.

    Comments: Accepted to LREC-COLING 2024

  44. arXiv:2403.15879  [pdf, other]

    cs.AI

    TrustSQL: Benchmarking Text-to-SQL Reliability with Penalty-Based Scoring

    Authors: Gyubok Lee, Woosog Chay, Seonhee Cho, Edward Choi

    Abstract: Text-to-SQL enables users to interact with databases using natural language, simplifying the retrieval and synthesis of information. Despite the remarkable success of large language models (LLMs) in translating natural language questions into SQL queries, widespread deployment remains limited due to two primary challenges. First, the effective use of text-to-SQL models depends on users' understand…

    Submitted 2 July, 2024; v1 submitted 23 March, 2024; originally announced March 2024.

    Comments: under review

  45. arXiv:2403.14111  [pdf, other]

    cs.CR cs.LG

    HETAL: Efficient Privacy-preserving Transfer Learning with Homomorphic Encryption

    Authors: Seewoo Lee, Garam Lee, Jung Woo Kim, Junbum Shin, Mun-Kyu Lee

    Abstract: Transfer learning is a de facto standard method for efficiently training machine learning models for data-scarce problems by adding and fine-tuning new classification layers to a model pre-trained on large datasets. Although numerous previous studies proposed to use homomorphic encryption to resolve the data privacy issue in transfer learning in the machine learning as a service setting, most of t…

    Submitted 20 March, 2024; originally announced March 2024.

    Comments: ICML 2023, Appendix D includes some updates after official publication

    Journal ref: PMLR 202:19010-19035, 2023

  46. arXiv:2403.11324  [pdf, other]

    cs.CV

    GeoGaussian: Geometry-aware Gaussian Splatting for Scene Rendering

    Authors: Yanyan Li, Chenyu Lyu, Yan Di, Guangyao Zhai, Gim Hee Lee, Federico Tombari

    Abstract: During the Gaussian Splatting optimization process, the scene's geometry can gradually deteriorate if its structure is not deliberately preserved, especially in non-textured regions such as walls, ceilings, and furniture surfaces. This degradation significantly affects the rendering quality of novel views that deviate significantly from the viewpoints in the training data. To mitigate this issue,…

    Submitted 17 March, 2024; originally announced March 2024.

  47. arXiv:2403.10119  [pdf, other]

    cs.CV

    URS-NeRF: Unordered Rolling Shutter Bundle Adjustment for Neural Radiance Fields

    Authors: Bo Xu, Ziao Liu, Mengqi Guo, Jiancheng Li, Gim Hee Lee

    Abstract: We propose a novel rolling shutter bundle adjustment method for neural radiance fields (NeRF), which utilizes unordered rolling shutter (RS) images to obtain the implicit 3D representation. Existing NeRF methods suffer from low-quality images and inaccurate initial camera poses due to the RS effect in the image, whereas the previous method that incorporates the RS into NeRF requires strict se…

    Submitted 24 March, 2024; v1 submitted 15 March, 2024; originally announced March 2024.

  48. arXiv:2403.08332  [pdf, other]

    cs.CL cs.AI

    Autoregressive Score Generation for Multi-trait Essay Scoring

    Authors: Heejin Do, Yunsu Kim, Gary Geunbae Lee

    Abstract: Recently, encoder-only pre-trained models such as BERT have been successfully applied in automated essay scoring (AES) to predict a single overall score. However, studies have yet to explore these models in multi-trait AES, possibly due to the inefficiency of replicating BERT-based models for each trait. Breaking away from the existing sole use of encoder, we propose an autoregressive prediction o…

    Submitted 13 March, 2024; originally announced March 2024.

    Comments: Accepted at EACL2024 Findings

  49. arXiv:2403.04111  [pdf]

    cs.SD eess.AS

    Multi-Level Attention Aggregation for Language-Agnostic Speaker Replication

    Authors: Yejin Jeon, Gary Geunbae Lee

    Abstract: This paper explores the task of language-agnostic speaker replication, a novel endeavor that seeks to replicate a speaker's voice irrespective of the language they are speaking. Towards this end, we introduce a multi-level attention aggregation approach that systematically probes and amplifies various speaker-specific attributes in a hierarchical manner. Through rigorous evaluations across a wide…

    Submitted 3 April, 2024; v1 submitted 6 March, 2024; originally announced March 2024.

    Comments: Accepted to EACL Main 2024

  50. arXiv:2402.12974  [pdf, other]

    cs.CV

    Visual Style Prompting with Swapping Self-Attention

    Authors: Jaeseok Jeong, Junho Kim, Yunjey Choi, Gayoung Lee, Youngjung Uh

    Abstract: In the evolving domain of text-to-image generation, diffusion models have emerged as powerful tools in content creation. Despite their remarkable capability, existing models still face challenges in achieving controlled generation with a consistent style, requiring costly fine-tuning or often inadequately transferring the visual elements due to content leakage. To address these challenges, we prop…

    Submitted 21 February, 2024; v1 submitted 20 February, 2024; originally announced February 2024.