ESM+: Modern Insights into Perspective on Text-to-SQL Evaluation in the Age of Large Language Models

Authors: Benjamin Ascoli, Ram Kandikonda, Jinho D. Choi

Abstract: The task of Text-to-SQL enables anyone to retrieve information from SQL databases using natural language. Despite several challenges, recent models have made remarkable advancements in this task using large language models (LLMs). Interestingly, we find that LLM-based models without fine-tuning exhibit distinct natures compared to their fine-tuned counterparts, leading to inadequacies in current evaluation metrics to accurately convey their performance. Thus, we analyze the two primary metrics, Test Suite Execution Accuracy (EXE) and Exact Set Matching Accuracy (ESM), to examine their robustness for this task and address shortcomings. We compare the performance of 9 LLM-based models using EXE, the original ESM, and our improved ESM (called ESM+). Our results show that EXE and ESM have high false positive and negative rates of 11.3% and 13.9%, while ESM+ gives those of 0.1% and 2.6% respectively, providing a significantly more stable evaluation. We release the ESM+ script as open-source for the community to contribute, while enjoying a more reliable assessment of Text-to-SQL.

Submitted 9 July, 2024; originally announced July 2024.

arXiv:2406.19634 [pdf, other]

CLOi-Mapper: Consistent, Lightweight, Robust, and Incremental Mapper With Embedded Systems for Commercial Robot Services

Authors: DongKi Noh, Hyungtae Lim, Gyuho Eoh, Duckyu Choi, Jeongsik Choi, Hyunjun Lim, SeungMin Baek, Hyun Myung

Abstract: In commercial autonomous service robots with several form factors, simultaneous localization and mapping (SLAM) is an essential technology for providing proper services such as cleaning and guidance. Such robots require SLAM algorithms suitable for specific applications and environments. Hence, several SLAM frameworks have been proposed to address various requirements in the past decade. However, we have encountered challenges in implementing recent innovative frameworks when handling service robots with low-end processors and insufficient sensor data, such as low-resolution 2D LiDAR sensors. Specifically, regarding commercial robots, consistent performance in different hardware configurations and environments is more crucial than the performance dedicated to specific sensors or environments. Therefore, we propose a) a multi-stage approach for global pose estimation in embedded systems; b) a graph generation method with zero constraints for synchronized sensors; and c) a robust and memory-efficient method for long-term pose-graph optimization. As verified in in-home and large-scale indoor environments, the proposed method yields consistent global pose estimation for services in commercial fields. Furthermore, the proposed method exhibits potential commercial viability considering the consistent performance verified via mass production and long-term (> 5 years) operation.

Submitted 27 June, 2024; originally announced June 2024.

Journal ref: IEEE Robotics and Automation Letters, 2024

arXiv:2406.14546 [pdf, other]

Connecting the Dots: LLMs can Infer and Verbalize Latent Structure from Disparate Training Data

Authors: Johannes Treutlein, Dami Choi, Jan Betley, Cem Anil, Samuel Marks, Roger Baker Grosse, Owain Evans

Abstract: One way to address safety risks from large language models (LLMs) is to censor dangerous knowledge from their training data. While this removes the explicit information, implicit information can remain scattered across various training documents. Could an LLM infer the censored knowledge by piecing together these implicit hints? As a step towards answering this question, we study inductive out-of-context reasoning (OOCR), a type of generalization in which LLMs infer latent information from evidence distributed across training documents and apply it to downstream tasks without in-context learning. Using a suite of five tasks, we demonstrate that frontier LLMs can perform inductive OOCR. In one experiment we finetune an LLM on a corpus consisting only of distances between an unknown city and other known cities. Remarkably, without in-context examples or Chain of Thought, the LLM can verbalize that the unknown city is Paris and use this fact to answer downstream questions. Further experiments show that LLMs trained only on individual coin flip outcomes can verbalize whether the coin is biased, and those trained only on pairs $(x,f(x))$ can articulate a definition of $f$ and compute inverses. While OOCR succeeds in a range of cases, we also show that it is unreliable, particularly for smaller LLMs learning complex structures. Overall, the ability of LLMs to "connect the dots" without explicit in-context learning poses a potential obstacle to monitoring and controlling the knowledge acquired by LLMs.

Submitted 20 June, 2024; originally announced June 2024.

arXiv:2406.09138 [pdf, other]

Leveraging Explicit Reasoning for Inference Integration in Commonsense-Augmented Dialogue Models

Authors: Sarah E. Finch, Jinho D. Choi

Abstract: Open-domain dialogue systems need to grasp social commonsense to understand and respond effectively to human users. Commonsense-augmented dialogue models have been proposed that aim to infer commonsense knowledge from dialogue contexts in order to improve response quality. However, existing approaches to commonsense-augmented dialogue rely on implicit reasoning to integrate commonsense inferences during response generation. In this study, we explore the impact of explicit reasoning against implicit reasoning over commonsense for dialogue response generation. Our findings demonstrate that separating commonsense reasoning into explicit steps for generating, selecting, and integrating commonsense into responses leads to better dialogue interactions, improving naturalness, engagement, specificity, and overall quality. Subsequent analyses of these findings unveil insights into the effectiveness of various types of commonsense in generating responses and the particular response traits enhanced through explicit reasoning for commonsense integration. Our work advances research in open-domain dialogue by achieving a new state-of-the-art in commonsense-augmented response generation.

Submitted 13 June, 2024; originally announced June 2024.

arXiv:2406.07800 [pdf, other]

Regularizing and Aggregating Clients with Class Distribution for Personalized Federated Learning

Authors: Gyuejeong Lee, Daeyoung Choi

Abstract: Personalized federated learning (PFL) enables customized models for clients with varying data distributions. However, existing PFL methods often incur high computational and communication costs, limiting their practical application. This paper proposes a novel PFL method, Class-wise Federated Averaging (cwFedAVG), that performs Federated Averaging (FedAVG) class-wise, creating multiple global models per class on the server. Each local model integrates these global models weighted by its estimated local class distribution, derived from the L2-norms of deep network weights, avoiding privacy violations. Afterward, each global model does the same with local models using the same method. We also design a new Weight Distribution Regularizer (WDR) to further enhance the accuracy of estimating a local class distribution by minimizing the Euclidean distance between the class distribution and the weight norms' distribution. Experimental results demonstrate that cwFedAVG matches or outperforms several existing PFL methods. Notably, cwFedAVG is conceptually simple yet computationally efficient as it mitigates the need for extensive calculation to collaborate between clients by leveraging shared global models. Visualizations provide insights into how cwFedAVG enables local model specialization on respective class distributions while global models capture class-relevant information across clients.

Submitted 11 June, 2024; originally announced June 2024.

arXiv:2406.03705 [pdf, other]

Coherent control of a triangular exchange-only spin qubit

Authors: Edwin Acuna, Joseph D. Broz, Kaushal Shyamsundar, Antonio B. Mei, Colin P. Feeney, Valerie Smetanka, Tiffany Davis, Kangmu Lee, Maxwell D. Choi, Brydon Boyd, June Suh, Wonill D. Ha, Cameron Jennings, Andrew S. Pan, Daniel S. Sanchez, Matthew D. Reed, Jason R. Petta

Abstract: We demonstrate coherent control of a three-electron exchange-only spin qubit with the quantum dots arranged in a close-packed triangular geometry. The device is tuned to confine one electron in each quantum dot, as evidenced by pairwise charge stability diagrams. Time-domain control of the exchange coupling is demonstrated and qubit performance is characterized using blind randomized benchmarking, with an average single-qubit gate fidelity F = 99.84%. The compact triangular device geometry can be readily scaled to larger two-dimensional quantum dot arrays with high connectivity.

Submitted 5 June, 2024; originally announced June 2024.

arXiv:2406.03663 [pdf]

A Hybrid Deep Learning Classification of Perimetric Glaucoma Using Peripapillary Nerve Fiber Layer Reflectance and Other OCT Parameters from Three Anatomy Regions

Authors: Ou Tan, David S. Greenfield, Brian A. Francis, Rohit Varma, Joel S. Schuman, David Huang, Dongseok Choi

Abstract: Precis: A hybrid deep-learning model combines NFL reflectance and other OCT parameters to improve glaucoma diagnosis. Objective: To investigate if a deep learning model could be used to combine nerve fiber layer (NFL) reflectance and other OCT parameters for glaucoma diagnosis. Patients and Methods: This is a prospective observational study of 106 normal subjects and 164 perimetric glaucoma (PG) patients. Peripapillary NFL reflectance map, NFL thickness map, optic head analysis of disc, and macular ganglion cell complex thickness were obtained using spectral domain OCT. A hybrid deep learning model combined a fully connected network (FCN) and a convolution neural network (CNN) to develop and combine those OCT maps and parameters to distinguish normal and PG eyes. Two deep learning models were compared based on whether the NFL reflectance map was used as part of the input or not. Results: The hybrid deep learning model with reflectance achieved 0.909 sensitivity at 99% specificity and 0.926 at 95%. The overall accuracy was 0.948 with 0.893 sensitivity and 1.000 specificity, and the AROC was 0.979, which is significantly better than the logistic regression models (p < 0.001). The second best model is the hybrid deep learning model without reflectance, which also had significantly higher AROC than logistic regression models (p < 0.001). The logistic regression model with reflectance had slightly higher AROC or sensitivity than the other logistic regression model without reflectance (p = 0.024). Conclusions: The hybrid deep learning model significantly improved the diagnostic accuracy, with or without NFL reflectance. A hybrid deep learning model combining reflectance/NFL thickness/GCC thickness/ONH parameters may be a practical model for glaucoma screening purposes.

Submitted 5 June, 2024; originally announced June 2024.

Comments: 12 pages

arXiv:2406.00170 [pdf]

Focal Loss Analysis of Peripapillary Nerve Fiber Layer Reflectance for Glaucoma Diagnosis

Authors: Ou Tan, Dongseok Choi, Aiyin Chen, David S. Greenfield, Brian A. Francis, Rohit Varma, Joel S. Schuman, David Huang, Advanced Imaging for Glaucoma Study Group

Abstract: Purpose: To evaluate nerve fiber layer (NFL) reflectance for glaucoma diagnosis using a large dataset. Methods: Participants were imaged with 4.9 mm ONH scans using spectral-domain optical coherence tomography (OCT). The NFL reflectance map was reconstructed from 13 concentric rings of the optic nerve head (ONH) scan, then processed by an azimuthal filter to reduce directional reflectance bias due to variation of beam incidence angle. The peripapillary thickness and reflectance maps were both divided into 96 superpixels. Low-reflectance and low-thickness superpixels were defined as values below the 5th percentile normative reference for that location. Focal reflectance loss was measured by summing loss, relative to the normal reference average, in low-reflectance superpixels. Focal thickness loss was calculated in a similar fashion. The area under the receiver operating characteristic curve (AROC) was used to assess diagnostic accuracy. Results: Fifty-three normal, 196 pre-perimetric, 132 early perimetric, and 59 moderate and advanced perimetric glaucoma participants were included from the Advanced Imaging for Glaucoma Study. Sixty-seven percent of glaucomatous reflectance maps showed characteristic contiguous wedge or diffuse defects. Focal NFL reflectance loss had significantly higher diagnostic accuracy than the best NFL thickness parameters (both map-based and profile-based): AROC 0.80 v. 0.75 (p<0.004) for distinguishing glaucoma eyes from healthy control eyes. The diagnostic sensitivity was also significantly higher at both 99% and 95% specificity operating points. Conclusions: Focal NFL reflectance loss improved glaucoma diagnostic accuracy compared to the standard NFL thickness parameters.

Submitted 31 May, 2024; originally announced June 2024.

Comments: 18 pages. arXiv admin note: text overlap with arXiv:2006.13522

arXiv:2406.00168 [pdf]

Reliability for Nerve Fiber Layer Reflectance Using Spectral Domain Optical Coherence Tomography

Authors: Kabir Hossain, Ou Tan, Po-Han Yeh, Jie Wang, Elizabeth White, Dongseok Choi, David Huang

Abstract: Purpose: To assess the reliability of nerve fiber layer (NFL) reflectance measured with spectral-domain optical coherence tomography (OCT). Methods: The study utilized OCT to scan participants with a cubic 6x6 mm disc scan. NFL reflectance was normalized by the average of bands below the NFL and summarized. We selected several reference bands, including the pigment epithelium complex (PPEC), the band between the NFL and Bruch's membrane (Post-NFL), and the brightest 50% of pixels in the Post-NFL band (Post-NFL-Bright). We also included the NFL attenuation coefficient (AC), which is equivalent to NFL reflectance normalized by all pixels below the NFL. An experiment was designed to test the NFL reflectance against different levels of attenuation using a neutral density filter (NDF). We also evaluated the within-visit and between-visit repeatability using a clinical dataset with normal and glaucoma eyes. Results: The experiment enrolled 20 healthy participants. The clinical dataset included 22 normal and 55 glaucoma eyes with at least two visits from the functional and structural OCT (FSOCT) study. The experiment showed that NFL reflectance normalized by PPEC Max and Post-NFL-Bright had the lowest dependence on NDF levels, with slopes of -0.77 and -1.34 dB/optical density, respectively. The clinical data showed that the NFL reflectance metrics normalized by Post-NFL-Bright or Post-NFL-Mean had a trend of better repeatability and reproducibility than others, but the trend was not significant. All metrics demonstrated similar diagnostic accuracy (0.82-0.87), but Post-NFL-Bright provided the best result. Conclusions: The NFL reflectance normalized by the maximum in PPEC had the least dependence on global attenuation, followed by Post-NFL-Bright, PPEC/Mean, Post-NFL-Mean and NFL/AC. However, NFL reflectance normalized by Post-NFL-Bright gave better results in the two datasets.

Submitted 31 May, 2024; originally announced June 2024.

Comments: 13 pages

arXiv:2405.15229 [pdf, other]

Multi-Orbital Interactions and Spin Polarization in Single Rare-Earth Adatoms

Authors: Massine Kelai, Stefano Reale, Roberto Robles, Jaehyun Lee, Divya Jyoti, Philippe Ohresser, Edwige Otero, Fadi Choueikani, Fabrice Scheurer, Nicolás Lorente, Deung-Jang Choi, Aparajita Singha, Fabio Donati

Abstract: Surface-adsorbed rare-earth nanostructures are ideal platforms to investigate the interplay between intra-atomic interactions and multi-orbital spin configurations. However, addressing these properties has posed severe experimental and theoretical challenges. Here, we use the orbital selectivity offered by X-ray absorption spectroscopy to quantify the Coulomb integrals of Nd atoms on conductive surfaces, as well as the variation of individual orbital occupation upon cluster nucleation. Using X-ray magnetic circular dichroism we identify magnetic moments of the order of few tens of $μ_{\rm{B}}$ at the $5d$ orbitals and their magnetic coupling with the $4f$ spins. Our results validate orbital-resolved X-ray spectroscopy as a reliable method for quantifying complex multi-orbital interactions in surface-adsorbed lanthanides.

Submitted 24 May, 2024; originally announced May 2024.

arXiv:2405.12856 [pdf, other]

LLM Processes: Numerical Predictive Distributions Conditioned on Natural Language

Authors: James Requeima, John Bronskill, Dami Choi, Richard E. Turner, David Duvenaud

Abstract: Machine learning practitioners often face significant challenges in formally integrating their prior knowledge and beliefs into predictive models, limiting the potential for nuanced and context-aware analyses. Moreover, the expertise needed to integrate this prior knowledge into probabilistic modeling typically limits the application of these models to specialists. Our goal is to build a regression model that can process numerical data and make probabilistic predictions at arbitrary locations, guided by natural language text which describes a user's prior knowledge. Large Language Models (LLMs) provide a useful starting point for designing such a tool since they 1) provide an interface where users can incorporate expert insights in natural language and 2) provide an opportunity for leveraging latent problem-relevant knowledge encoded in LLMs that users may not have themselves. We start by exploring strategies for eliciting explicit, coherent numerical predictive distributions from LLMs. We examine these joint predictive distributions, which we call LLM Processes, over arbitrarily-many quantities in settings such as forecasting, multi-dimensional regression, black-box optimization, and image modeling. We investigate the practical details of prompting to elicit coherent predictive distributions, and demonstrate their effectiveness at regression. Finally, we demonstrate the ability to usefully incorporate text into numerical predictions, improving predictive performance and giving quantitative structure that reflects qualitative descriptions. This lets us begin to explore the rich, grounded hypothesis space that LLMs implicitly encode.

Submitted 25 May, 2024; v1 submitted 21 May, 2024; originally announced May 2024.

arXiv:2405.12468 [pdf, other]

Diverse and Effective Synthetic Data Generation for Adaptable Zero-Shot Dialogue State Tracking

Authors: James D. Finch, Jinho D. Choi

Abstract: We demonstrate substantial performance gains in zero-shot dialogue state tracking (DST) by enhancing training data diversity through synthetic data generation. Existing DST datasets are severely limited in the number of application domains and slot types they cover due to the high costs of data collection, restricting their adaptability to new domains. This work addresses this challenge with a novel, fully automatic data generation approach that creates synthetic zero-shot DST datasets. Distinguished from previous methods, our approach can generate dialogues across a massive range of application domains, complete with silver-standard dialogue state annotations and slot descriptions. This technique is used to create the D0T dataset for training zero-shot DST models, encompassing an unprecedented 1,000+ domains. Experiments on the MultiWOZ benchmark show that training models on diverse synthetic data improves Joint Goal Accuracy by 6.7%, achieving results competitive with models 13.5 times larger than ours.

Submitted 13 June, 2024; v1 submitted 20 May, 2024; originally announced May 2024.

arXiv:2405.11178 [pdf, other]

Automating PTSD Diagnostics in Clinical Interviews: Leveraging Large Language Models for Trauma Assessments

Authors: Sichang Tu, Abigail Powers, Natalie Merrill, Negar Fani, Sierra Carter, Stephen Doogan, Jinho D. Choi

Abstract: The shortage of clinical workforce presents significant challenges in mental healthcare, limiting access to formal diagnostics and services. We aim to tackle this shortage by integrating a customized large language model (LLM) into the workflow, thus promoting equity in mental healthcare for the general population. Although LLMs have showcased their capability in clinical decision-making, their adaptation to severe conditions like Post-traumatic Stress Disorder (PTSD) remains largely unexplored. Therefore, we collect 411 clinician-administered diagnostic interviews and devise a novel approach to obtain high-quality data. Moreover, we build a comprehensive framework to automate PTSD diagnostic assessments based on interview contents by leveraging two state-of-the-art LLMs, GPT-4 and Llama-2, with potential for broader clinical diagnoses. Our results illustrate strong promise for LLMs, tested on our dataset, to aid clinicians in diagnostic validation. To the best of our knowledge, this is the first AI system that fully automates assessments for mental illness based on clinician-administered interviews.

Submitted 18 May, 2024; originally announced May 2024.

arXiv:2405.04497 [pdf, other]

Unveiling Disparities in Web Task Handling Between Human and Web Agent

Authors: Kihoon Son, Jinhyeon Kwon, DaEun Choi, Tae Soo Kim, Young-Ho Kim, Sangdoo Yun, Juho Kim

Abstract: With the advancement of Large-Language Models (LLMs) and Large Vision-Language Models (LVMs), agents have shown significant capabilities in various tasks, such as data analysis, gaming, or code generation. Recently, there has been a surge in research on web agents, capable of performing tasks within the web environment. However, the web poses unforeseeable scenarios, challenging the generalizability of these agents. This study investigates the disparities between human and web agents' performance in web tasks (e.g., information search) by concentrating on planning, action, and reflection aspects during task execution. We conducted a web task study with a think-aloud protocol, revealing distinct cognitive actions and operations on websites employed by humans. Comparative examination of existing agent structures and human behavior with thought processes highlighted differences in knowledge updating and ambiguity handling when performing the task. Humans demonstrated a propensity for exploring and modifying plans based on additional information and investigating reasons for failure. These findings offer insights into designing planning, reflection, and information discovery modules for web agents and designing the capturing method for implicit human knowledge in a web task.

Submitted 8 May, 2024; v1 submitted 7 May, 2024; originally announced May 2024.

arXiv:2405.00523 [pdf, other]

CookingSense: A Culinary Knowledgebase with Multidisciplinary Assertions

Authors: Donghee Choi, Mogan Gim, Donghyeon Park, Mujeen Sung, Hyunjae Kim, Jaewoo Kang, Jihun Choi

Abstract: This paper introduces CookingSense, a descriptive collection of knowledge assertions in the culinary domain extracted from various sources, including web data, scientific papers, and recipes, from which knowledge covering a broad range of aspects is acquired. CookingSense is constructed through a series of dictionary-based filtering and language model-based semantic filtering techniques, which results in a rich knowledgebase of multidisciplinary food-related assertions. Additionally, we present FoodBench, a novel benchmark to evaluate culinary decision support systems. From evaluations with FoodBench, we empirically prove that CookingSense improves the performance of retrieval augmented language models. We also validate the quality and variety of assertions in CookingSense through qualitative analysis.

Submitted 1 May, 2024; originally announced May 2024.

Comments: LREC-COLING 2024 Accepted

arXiv:2404.14392 [pdf, other]

Direct observation of Floquet-Bloch states in monolayer graphene

Authors: Dongsung Choi, Masataka Mogi, Umberto De Giovannini, Doron Azoury, Baiqing Lv, Yifan Su, Hannes Hübener, Angel Rubio, Nuh Gedik

Abstract: Floquet engineering is a novel method of manipulating quantum phases of matter via periodic driving [1, 2]. It has successfully been utilized in different platforms ranging from photonic systems [3] to optical lattice of ultracold atoms [4, 5]. In solids, light can be used as the periodic drive via coherent light-matter interaction. This leads to hybridization of Bloch electrons with photons resulting in replica bands known as Floquet-Bloch states. After the direct observation of Floquet-Bloch states in a topological insulator [6], their manifestations have been seen in a number of other experiments [7-14]. By engineering the electronic band structure using Floquet-Bloch states, various exotic phase transitions have been predicted [15-22] to occur. To realize these phases, it is necessary to better understand the nature of Floquet-Bloch states in different materials. However, direct energy and momentum resolved observation of these states is still limited to only few material systems [6, 10, 14, 23, 24]. Here, we report direct observation of Floquet-Bloch states in monolayer epitaxial graphene which was the first proposed material platform [15] for Floquet engineering. By using time- and angle-resolved photoemission spectroscopy (trARPES) with mid-infrared (mid-IR) pump excitation, we detected replicas of the Dirac cone. Pump polarization dependence of these replica bands unequivocally shows that they originate from the scattering between Floquet-Bloch states and photon-dressed free-electron-like photoemission final states, called Volkov states. Beyond graphene, our method can potentially be used to directly observe Floquet-Bloch states in other systems paving the way for Floquet engineering in a wide range of quantum materials.

Submitted 22 April, 2024; originally announced April 2024.

arXiv:2404.10860 [pdf, ps, other]

Line bundles on Contractions of $\overline{\rm{M}}_{0,n}$ via Conformal Block Divisors

Authors: Daebeom Choi

Abstract: The moduli space of stable curves of genus $g$ with $n$ marked points, $\overline{\rm{M}}_{g,n}$, is a central object in algebraic geometry, and plays a crucial role in $2$-dimensional conformal field theory. In this paper, we apply the sheaf of coinvariants and conformal block divisors to study the geometry of $\overline{\rm{M}}_{0,n}$. The main theorem characterizes the line bundles on certain contractions of $\overline{\rm{M}}_{0,n}$ via F-curves, using Fakhruddin's basis for the Picard group given by conformal block divisors. This reveals a distinguished property of Knudsen's construction of $\overline{\rm{M}}_{0,n}$. As a notable consequence of this property, using the global generation of sheaves of coinvariants by Fakhruddin, we refine Knudsen's construction by describing every possible contraction of $\overline{\rm{M}}_{0,n}$ over it. An application of this refinement is given, and its potential for generalization is discussed.

Submitted 16 April, 2024; originally announced April 2024.

Comments: 28 pages, Comments are welcome!

MSC Class: 14H10; 17B69 (primary); 81R10 (secondary)

arXiv:2404.09182 [pdf, other]

doi 10.1103/PhysRevLett.132.206401

Coexistence of interacting charge density waves in a layered semiconductor

Authors: B. Q. Lv, Alfred Zong, Dong Wu, Zhengwei Nie, Yifan Su, Dongsung Choi, Batyr Ilyas, Bryan T. Fichera, Jiarui Li, Edoardo Baldini, Masataka Mogi, Y. -B. Huang, Hoi Chun Po, Sheng Meng, Yao Wang, N. L. Wang, Nuh Gedik

Abstract: Coexisting orders are key features of strongly correlated materials and underlie many intriguing phenomena from unconventional superconductivity to topological orders. Here, we report the coexistence of two interacting charge-density-wave (CDW) orders in EuTe4, a layered crystal that has drawn considerable attention owing to its anomalous thermal hysteresis and a semiconducting CDW state despite the absence of perfect FS nesting. By accessing unoccupied conduction bands with time- and angle-resolved photoemission measurements, we find that mono- and bi-layers of Te in the unit cell host different CDWs that are associated with distinct energy gaps. The two gaps display dichotomous evolutions following photoexcitation, where the larger bilayer CDW gap exhibits less renormalization and faster recovery. Surprisingly, the CDW in the Te monolayer displays an additional momentum-dependent gap renormalization that cannot be captured by density-functional theory calculations. This phenomenon is attributed to interlayer interactions between the two CDW orders, which account for the semiconducting nature of the equilibrium state. Our findings not only offer microscopic insights into the correlated ground state of EuTe4 but also provide a general non-equilibrium approach to understand coexisting, layer-dependent orders in a complex system.

Submitted 14 April, 2024; originally announced April 2024.

Comments: To appear in PRL

Journal ref: Physical Review Letters 132, 206401 (2024)

arXiv:2404.06764 [pdf]

A mid-infrared Brillouin laser using ultra-high-Q on-chip resonators

Authors: Kiyoung Ko, Daewon Suk, Dohyeong Kim, Soobong Park, Betul Sen, Dae-Gon Kim, Yingying Wang, Shixun Dai, Xunsi Wang, Rongping Wang, Byung Jae Chun, Kwang-Hoon Ko, Peter T. Rakich, Duk-Yong Choi, Hansuek Lee

Abstract: Ultra-high-Q optical resonators have facilitated recent advancements in on-chip photonics by effectively harnessing nonlinear phenomena providing useful functionalities. While these breakthroughs, primarily focused on the near-infrared region, have extended interest to longer wavelengths holding importance for monitoring and manipulating molecules, the absence of ultra-high-Q resonators in this region remains a significant challenge. Here, we have developed on-chip microresonators with a remarkable Q-factor of 38 million, surpassing previous mid-infrared records by over 30 times. Employing innovative fabrication techniques, including the spontaneous formation of light-guiding geometries during material deposition, resonators with internal multilayer structures have been seamlessly created and passivated with chalcogenide glasses within a single chamber. Major loss factors, especially airborne-chemical absorption, were thoroughly investigated and mitigated by extensive optimization of resonator geometries and fabrication procedures. This allowed us to access the fundamental loss performance offered by doubly purified chalcogenide glass sources, as demonstrated in their fiber form. Exploiting this ultra-high-Q resonator, we successfully demonstrated Brillouin lasing on a chip for the first time in the mid-infrared, with a threshold power of 91.9 μW and a theoretical Schawlow-Townes linewidth of 83.45 Hz, far surpassing carrier phase noise. Our results showcase the effective integration of cavity-enhanced optical nonlinearities into on-chip mid-infrared photonics.

Submitted 10 April, 2024; originally announced April 2024.

Comments: 10 pages, 5 figures in main script, and 1 figure in methods

arXiv:2404.06621 [pdf, other]

What is Your Favorite Gender, MLM? Gender Bias Evaluation in Multilingual Masked Language Models

Authors: Jeongrok Yu, Seong Ug Kim, Jacob Choi, Jinho D. Choi

Abstract: Bias is a disproportionate prejudice in favor of one side against another. Due to the success of transformer-based Masked Language Models (MLMs) and their impact on many NLP tasks, a systematic evaluation of bias in these models is needed more than ever. While many studies have evaluated gender bias in English MLMs, only a few works have been conducted for the task in other languages. This paper proposes a multilingual approach to estimate gender bias in MLMs from 5 languages: Chinese, English, German, Portuguese, and Spanish. Unlike previous work, our approach does not depend on parallel corpora coupled with English to detect gender bias in other languages using multilingual lexicons. Moreover, a novel model-based method is presented to generate sentence pairs for a more robust analysis of gender bias, compared to the traditional lexicon-based method. For each language, both the lexicon-based and model-based methods are applied to create two datasets respectively, which are used to evaluate gender bias in an MLM specifically trained for that language using one existing and 3 new scoring metrics. Our results show that the previous approach is data-sensitive and not stable as it does not remove contextual dependencies irrelevant to gender. In fact, the results often flip when different scoring metrics are used on the same dataset, suggesting that gender bias should be studied on a large dataset using multiple evaluation metrics for best practice.

Submitted 9 April, 2024; originally announced April 2024.

arXiv:2404.01954 [pdf, other]

HyperCLOVA X Technical Report

Authors: Kang Min Yoo, Jaegeun Han, Sookyo In, Heewon Jeon, Jisu Jeong, Jaewook Kang, Hyunwook Kim, Kyung-Min Kim, Munhyong Kim, Sungju Kim, Donghyun Kwak, Hanock Kwak, Se Jung Kwon, Bado Lee, Dongsoo Lee, Gichang Lee, Jooho Lee, Baeseong Park, Seongjin Shin, Joonsang Yu, Seolki Baek, Sumin Byeon, Eungsup Cho, Dooseok Choe, Jeesung Han, et al. (371 additional authors not shown)

Abstract: We introduce HyperCLOVA X, a family of large language models (LLMs) tailored to the Korean language and culture, along with competitive capabilities in English, math, and coding. HyperCLOVA X was trained on a balanced mix of Korean, English, and code data, followed by instruction-tuning with high-quality human-annotated datasets while abiding by strict safety guidelines reflecting our commitment to responsible AI. The model is evaluated across various benchmarks, including comprehensive reasoning, knowledge, commonsense, factuality, coding, math, chatting, instruction-following, and harmlessness, in both Korean and English. HyperCLOVA X exhibits strong reasoning capabilities in Korean backed by a deep understanding of the language and cultural nuances. Further analysis of the inherent bilingual nature and its extension to multilingualism highlights the model's cross-lingual proficiency and strong generalization ability to untargeted languages, including machine translation between several language pairs and cross-lingual inference tasks. We believe that HyperCLOVA X can provide helpful guidance for regions or countries in developing their sovereign LLMs.

Submitted 13 April, 2024; v1 submitted 2 April, 2024; originally announced April 2024.

Comments: 44 pages; updated authors list and fixed author names

arXiv:2404.00676 [pdf, other]

OmniLocalRF: Omnidirectional Local Radiance Fields from Dynamic Videos

Authors: Dongyoung Choi, Hyeonjoong Jang, Min H. Kim

Abstract: Omnidirectional cameras are extensively used in various applications to provide a wide field of vision. However, they face a challenge in synthesizing novel views due to the inevitable presence of dynamic objects, including the photographer, in their wide field of view. In this paper, we introduce a new approach called Omnidirectional Local Radiance Fields (OmniLocalRF) that can render static-only scene views, removing and inpainting dynamic objects simultaneously. Our approach combines the principles of local radiance fields with the bidirectional optimization of omnidirectional rays. Our input is an omnidirectional video, and we evaluate the mutual observations of the entire angle between the previous and current frames. To reduce ghosting artifacts of dynamic objects and inpaint occlusions, we devise a multi-resolution motion mask prediction module. Unlike existing methods that primarily separate dynamic components through the temporal domain, our method uses multi-resolution neural feature planes for precise segmentation, which is more suitable for long 360-degree videos. Our experiments validate that OmniLocalRF outperforms existing methods in both qualitative and quantitative metrics, especially in scenarios with complex real-world scenes. In particular, our approach eliminates the need for manual interaction, such as drawing motion masks by hand and additional pose estimation, making it a highly effective and efficient solution.

Submitted 31 March, 2024; originally announced April 2024.

Journal ref: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2024

arXiv:2404.00376 [pdf, other]

Small Language Models Learn Enhanced Reasoning Skills from Medical Textbooks

Authors: Hyunjae Kim, Hyeon Hwang, Jiwoo Lee, Sihyeon Park, Dain Kim, Taewhoo Lee, Chanwoong Yoon, Jiwoong Sohn, Donghee Choi, Jaewoo Kang

Abstract: While recent advancements in commercial large language models (LM) have shown promising results in medical tasks, their closed-source nature poses significant privacy and security concerns, hindering their widespread use in the medical field. Despite efforts to create open-source models, their limited parameters often result in insufficient multi-step reasoning capabilities required for solving complex medical problems. To address this, we introduce Meerkat, a new family of medical AI systems ranging from 7 to 70 billion parameters. The models were trained using our new synthetic dataset consisting of high-quality chain-of-thought reasoning paths sourced from 18 medical textbooks, along with diverse instruction-following datasets. Our systems achieved remarkable accuracy across six medical benchmarks, surpassing the previous best models such as MediTron and BioMistral, and GPT-3.5 by a large margin. Notably, Meerkat-7B surpassed the passing threshold of the United States Medical Licensing Examination (USMLE) for the first time for a 7B-parameter model, while Meerkat-70B outperformed GPT-4 by an average of 1.3%. Additionally, Meerkat-70B correctly diagnosed 21 out of 38 complex clinical cases, outperforming humans' 13.8 and closely matching GPT-4's 21.8. Our systems offered more detailed free-form responses to clinical queries compared to existing small models, approaching the performance level of large commercial models. This significantly narrows the performance gap with large LMs, showcasing its effectiveness in addressing complex medical challenges.

Submitted 30 June, 2024; v1 submitted 30 March, 2024; originally announced April 2024.

Comments: Added new LLaMA-3-based models and experiments on NEJM case challenges

arXiv:2403.15714 [pdf, ps, other]

Analytic asymptotic formulas for effective parameters of planar elastic composites

Authors: Daehee Cho, Doosung Choi, Mikyoung Lim

Abstract: We investigate the effective elastic properties of periodic dilute two-phase composites consisting of a homogeneous isotropic matrix and a periodic array of rigid inclusions. We assume the rigid inclusion in a unit cell is a simply connected, bounded domain so that there exists an exterior conformal mapping corresponding to the inclusion. Recently, an analytical series solution method for the elastic problem with a rigid inclusion was developed based on the layer potential technique and the geometric function theory \cite{Mattei:2021:EAS}. In this paper, by using the series solution method, we derive expression formulas for the elastic moment tensors--the coefficients of the multipole expansion associated with an elastic inclusion--of an inclusion of arbitrary shape. These formulas for the elastic moment tensors lead us to analytic asymptotic formulas for the effective parameters of the periodic elastic composites with rigid inclusions in terms of the associated exterior conformal mapping.

Submitted 23 March, 2024; originally announced March 2024.

arXiv:2403.15713 [pdf, ps, other]

Geometric series solution for the plane elastostatic problem in the presence of a cavity

Authors: Daehee Cho, Doosung Choi, Mikyoung Lim

Abstract: This paper presents an analytic series solution method for the elastic inclusion problem in a two-dimensional unbounded isotropic medium with a cavity. Generalizing the work of Mattei and Lim \cite{Mattei:2021:EAS}, this study develops an analytic series solution method for the elastic inclusion problem to encompass a cavity problem. The central mathematical challenge tackled in this research is to deal with the conormal derivative condition. By using the complex-variable formulation for the conormal derivative, we effectively deal with the boundary condition and derive an explicit series solution for the plane elastostatic problem with a cavity of arbitrary shape subject to arbitrary far-field loading. The solution is expressed as a series expansion in terms of the given far-field loading and the exterior conformal mapping associated with the cavity.

Submitted 23 March, 2024; originally announced March 2024.

arXiv:2403.14110 [pdf, other]

Heuristic Algorithm-based Action Masking Reinforcement Learning (HAAM-RL) with Ensemble Inference Method

Authors: Kyuwon Choi, Cheolkyun Rho, Taeyoun Kim, Daewoo Choi

Abstract: This paper presents a novel reinforcement learning (RL) approach called HAAM-RL (Heuristic Algorithm-based Action Masking Reinforcement Learning) for optimizing the color batching re-sequencing problem in automobile painting processes. The existing heuristic algorithms have limitations in adequately reflecting real-world constraints and accurately predicting logistics performance. Our methodology incorporates several key techniques including a tailored Markov Decision Process (MDP) formulation, reward setting including Potential-Based Reward Shaping, action masking using heuristic algorithms (HAAM-RL), and an ensemble inference method that combines multiple RL models. The RL agent is trained and evaluated using FlexSim, a commercial 3D simulation software, integrated with our RL MLOps platform BakingSoDA. Experimental results across 30 scenarios demonstrate that HAAM-RL with an ensemble inference method achieves a 16.25% performance improvement over the conventional heuristic algorithm, with stable and consistent results. The proposed approach exhibits superior performance and generalization capability, indicating its effectiveness in optimizing complex manufacturing processes. The study also discusses future research directions, including alternative state representations, incorporating model-based RL methods, and integrating additional real-world constraints.

Submitted 20 March, 2024; originally announced March 2024.

Comments: 7 pages, 8 figures

arXiv:2403.06252 [pdf, other]

Demystifying Tacit Knowledge in Graphic Design: Characteristics, Instances, Approaches, and Guidelines

Authors: Kihoon Son, DaEun Choi, Tae Soo Kim, Juho Kim

Abstract: Despite the growing demand for professional graphic design knowledge, the tacit nature of design inhibits knowledge sharing. However, there is a limited understanding of the characteristics and instances of tacit knowledge in graphic design. In this work, we build a comprehensive set of tacit knowledge characteristics through a literature review. Through interviews with 10 professional graphic designers, we collected 123 tacit knowledge instances and labeled their characteristics. By qualitatively coding the instances, we identified the prominent elements, actions, and purposes of tacit knowledge. To identify which instances have been addressed the least, we conducted a systematic literature review of prior system support for graphic design. By understanding the reasons for the lack of support for these instances based on their characteristics, we propose design guidelines for capturing and applying tacit knowledge in design tools. This work takes a step towards understanding tacit knowledge and how this knowledge can be communicated.

Submitted 10 March, 2024; originally announced March 2024.

arXiv:2403.05530 [pdf, other]

Gemini 1.5: Unlocking multimodal understanding across millions of tokens of context

Authors: Gemini Team, Petko Georgiev, Ving Ian Lei, Ryan Burnell, Libin Bai, Anmol Gulati, Garrett Tanzer, Damien Vincent, Zhufeng Pan, Shibo Wang, Soroosh Mariooryad, Yifan Ding, Xinyang Geng, Fred Alcober, Roy Frostig, Mark Omernick, Lexi Walker, Cosmin Paduraru, Christina Sorokin, Andrea Tacchetti, Colin Gaffney, Samira Daruki, Olcan Sercinoglu, Zach Gleicher, Juliette Love, et al. (1092 additional authors not shown)

Abstract: In this report, we introduce the Gemini 1.5 family of models, representing the next generation of highly compute-efficient multimodal models capable of recalling and reasoning over fine-grained information from millions of tokens of context, including multiple long documents and hours of video and audio. The family includes two new models: (1) an updated Gemini 1.5 Pro, which exceeds the February version on the great majority of capabilities and benchmarks; (2) Gemini 1.5 Flash, a more lightweight variant designed for efficiency with minimal regression in quality. Gemini 1.5 models achieve near-perfect recall on long-context retrieval tasks across modalities, improve the state-of-the-art in long-document QA, long-video QA and long-context ASR, and match or surpass Gemini 1.0 Ultra's state-of-the-art performance across a broad set of benchmarks. Studying the limits of Gemini 1.5's long-context ability, we find continued improvement in next-token prediction and near-perfect retrieval (>99%) up to at least 10M tokens, a generational leap over existing models such as Claude 3.0 (200k) and GPT-4 Turbo (128k). Finally, we highlight real-world use cases, such as Gemini 1.5 collaborating with professionals on completing their tasks achieving 26 to 75% time savings across 10 different job categories, as well as surprising new capabilities of large language models at the frontier; when given a grammar manual for Kalamang, a language with fewer than 200 speakers worldwide, the model learns to translate English to Kalamang at a similar level to a person who learned from the same content.

Submitted 14 June, 2024; v1 submitted 8 March, 2024; originally announced March 2024.

arXiv:2403.03082 [pdf, other]

Recall-Oriented Continual Learning with Generative Adversarial Meta-Model

Authors: Haneol Kang, Dong-Wan Choi

Abstract: The stability-plasticity dilemma is a major challenge in continual learning, as it involves balancing the conflicting objectives of maintaining performance on previous tasks while learning new tasks. In this paper, we propose the recall-oriented continual learning framework to address this challenge. Inspired by the human brain's ability to separate the mechanisms responsible for stability and plasticity, our framework consists of a two-level architecture where an inference network effectively acquires new knowledge and a generative network recalls past knowledge when necessary. In particular, to maximize the stability of past knowledge, we investigate the complexity of knowledge depending on different representations, and thereby introduce a generative adversarial meta-model (GAMM) that incrementally learns task-specific parameters instead of input data samples of the task. Through our experiments, we show that our framework not only effectively learns new knowledge without any disruption but also achieves high stability of previous knowledge in both task-aware and task-agnostic learning scenarios. Our code is available at: https://github.com/bigdata-inha/recall-oriented-cl-framework.

Submitted 5 March, 2024; originally announced March 2024.

Comments: Accepted in AAAI-2024 (Oral presentation)

arXiv:2402.14340 [pdf, other]

doi 10.1016/j.imavis.2024.105110

TIE-KD: Teacher-Independent and Explainable Knowledge Distillation for Monocular Depth Estimation

Authors: Sangwon Choi, Daejune Choi, Duksu Kim

Abstract: Monocular depth estimation (MDE) is essential for numerous applications yet is impeded by the substantial computational demands of accurate deep learning models. To mitigate this, we introduce a novel Teacher-Independent Explainable Knowledge Distillation (TIE-KD) framework that streamlines the knowledge transfer from complex teacher models to compact student networks, eliminating the need for architectural similarity. The cornerstone of TIE-KD is the Depth Probability Map (DPM), an explainable feature map that interprets the teacher's output, enabling feature-based knowledge distillation solely from the teacher's response. This approach allows for efficient student learning, leveraging the strengths of feature-based distillation. Extensive evaluation on the KITTI dataset indicates that TIE-KD not only outperforms conventional response-based KD methods but also demonstrates consistent efficacy across diverse teacher and student architectures. The robustness and adaptability of TIE-KD underscore its potential for applications requiring efficient and interpretable models, affirming its practicality for real-world deployment.

Submitted 22 February, 2024; originally announced February 2024.

Comments: 13 pages, 8 figures, under review for a journal

Journal ref: Image and Vision Computing, 148 (2024), 105110

arXiv:2402.12821 [pdf, other]

Identifying Factual Inconsistencies in Summaries: Grounding Model Inference via Task Taxonomy

Authors: Liyan Xu, Zhenlin Su, Mo Yu, Jin Xu, Jinho D. Choi, Jie Zhou, Fei Liu

Abstract: Factual inconsistencies pose a significant hurdle for the faithful summarization by generative models. While a major direction to enhance inconsistency detection is to derive stronger Natural Language Inference (NLI) models, we propose an orthogonal aspect that underscores the importance of incorporating task-specific taxonomy into the inference. To this end, we consolidate key error types of inconsistent facts in summaries, and incorporate them to facilitate both the zero-shot and supervised paradigms of LLMs. Extensive experiments on ten datasets of five distinct domains suggest that, zero-shot LLM inference could benefit from the explicit solution space depicted by the error type taxonomy, and achieves state-of-the-art performance overall, surpassing specialized non-LLM baselines, as well as recent LLM baselines. We further distill models that fuse the taxonomy into parameters through our designed prompt completions and supervised training strategies, efficiently substituting state-of-the-art zero-shot inference with much larger LLMs.

Submitted 19 June, 2024; v1 submitted 20 February, 2024; originally announced February 2024.

arXiv:2402.12406 [pdf, other]

Teacher as a Lenient Expert: Teacher-Agnostic Data-Free Knowledge Distillation

Authors: Hyunjune Shin, Dong-Wan Choi

Abstract: Data-free knowledge distillation (DFKD) aims to distill pretrained knowledge to a student model with the help of a generator without using original data. In such data-free scenarios, achieving stable performance of DFKD is essential due to the unavailability of validation data. Unfortunately, this paper has discovered that existing DFKD methods are quite sensitive to different teacher models, occasionally showing catastrophic failures of distillation, even when using well-trained teacher models. Our observation is that the generator in DFKD is not always guaranteed to produce precise yet diverse samples using the existing representative strategy of minimizing both class-prior and adversarial losses. Through our empirical study, we focus on the fact that class-prior not only decreases the diversity of generated samples, but also cannot completely address the problem of generating unexpectedly low-quality samples depending on teacher models. In this paper, we propose the teacher-agnostic data-free knowledge distillation (TA-DFKD) method, with the goal of more robust and stable performance regardless of teacher models. Our basic idea is to assign the teacher model a lenient expert role for evaluating samples, rather than a strict supervisor that enforces its class-prior on the generator. Specifically, we design a sample selection approach that takes only clean samples verified by the teacher model without imposing restrictions on the power of generating diverse samples. Through extensive experiments, we show that our method successfully achieves both robustness and training stability across various teacher models, while outperforming the existing DFKD methods.
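The idea of a teacher acting as a lenient evaluator that only filters generated samples can be illustrated with a minimal sketch. The abstract does not state the selection criterion, so the confidence-threshold rule below, the function names `softmax` and `select_clean_samples`, and the default `threshold` are all assumptions made for illustration, not the TA-DFKD procedure itself.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def select_clean_samples(samples, teacher_logits, threshold=0.9):
    """Keep samples whose top teacher probability is at least `threshold`.

    Assumption: "verified by the teacher" is modeled here as a simple
    confidence threshold; the paper's actual criterion may differ.
    """
    kept = []
    for sample, logits in zip(samples, teacher_logits):
        if max(softmax(logits)) >= threshold:
            kept.append(sample)
    return kept
```

In this sketch the teacher never tells the generator which class to produce; it only discards samples it is unsure about, which is one plausible reading of the "lenient expert" role.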

Submitted 18 February, 2024; originally announced February 2024.

Comments: Accepted in AAAI-2024

arXiv:2402.11761 [pdf, ps, other]

The number of automorphic representations of $\mathrm{GL}_2$ with exceptional eigenvalues

Authors: Dohoon Choi, Min Lee, Youngmin Lee, Subong Lim

Abstract: We obtain an upper bound for the dimension of the cuspidal automorphic forms for $\mathrm{GL}_2$ over a number field, whose archimedean local representations are not tempered. More precisely, we prove the following result. Let $F$ be a number field and $\mathbb{A}_{F}$ be the ring of adeles of $F$. Let $\mathcal{O}_{F}$ be the ring of integers of $F$. Let $\mathfrak{X}_{F,\mathrm{ex}}$ be the set of irreducible cuspidal automorphic representations $\pi$ of $\mathrm{GL}_2(\mathbb{A}_{F})$ with the trivial central character such that for each archimedean place $v$ of $F$, the local representation of $\pi$ at $v$ is an unramified principal series and is not tempered. For an ideal $J$ of $\mathcal{O}_{F}$, let $\mathrm{K}_{0}(J)$ be the subgroup of $\mathrm{GL}_2(\mathbb{A}_{F})$ corresponding to $\Gamma_0(J) \subset \mathrm{SL}_2(\mathcal{O}_F)$. Let $r_1$ be the number of real embeddings of $F$ and $r_2$ be the number of conjugate pairs of complex embeddings of $F$. Using the Arthur-Selberg trace formula, we have \begin{equation*} \sum_{\pi \in \mathfrak{X}_{F,\mathrm{ex}}} \dim \pi^{\mathrm{K}_0(J)} \ll_{F} \frac{[\mathrm{SL}_2(\mathcal{O}_{F}) : \Gamma_0(J)]}{(\log (N_{F/\mathbb{Q}}(J)))^{2r_1+3r_2}} \quad \text{ as } \quad |N_{F/\mathbb{Q}}(J)|\to \infty. \end{equation*} From this result, we obtain the result on an upper bound for the number of Hecke-Maass cusp forms of weight $0$ on $\Gamma_0(N)$ which do not satisfy the Selberg eigenvalue conjecture.
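As a worked special case, the displayed bound can be written out for $F = \mathbb{Q}$, where $r_1 = 1$, $r_2 = 0$ and every ideal is $J = (N)$ with norm $N$. This is only a direct substitution into the formula above together with the standard index formula for $\Gamma_0(N)$; the exact form of the paper's corollary on Hecke-Maass cusp forms is not given in the abstract and is not claimed here.

```latex
% Specialization to F = \mathbb{Q}: r_1 = 1, r_2 = 0, so 2r_1 + 3r_2 = 2.
\begin{equation*}
  \sum_{\pi \in \mathfrak{X}_{\mathbb{Q},\mathrm{ex}}} \dim \pi^{\mathrm{K}_0(N)}
  \ll \frac{[\mathrm{SL}_2(\mathbb{Z}) : \Gamma_0(N)]}{(\log N)^{2}}
  \quad \text{as } N \to \infty,
  \qquad
  [\mathrm{SL}_2(\mathbb{Z}) : \Gamma_0(N)] = N \prod_{p \mid N} \Bigl(1 + \frac{1}{p}\Bigr).
\end{equation*}
```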

Submitted 18 February, 2024; originally announced February 2024.

MSC Class: 11F72 (Primary); 11F12 (Secondary)

arXiv:2402.01280 [pdf, other]

In-gap states induced by magnetic impurities on wide-band s-wave superconductors: self-consistent calculations

Authors: Divya Jyoti, Deung-Jang Choi, Nicolas Lorente

Abstract: The role of self-consistency in Bogoliubov-de Gennes equations is frequently underestimated in the investigation of in-gap states created by magnetic impurities in s-wave superconductors. Our research focuses on the impact of self-consistency on the in-gap states produced by magnetic structures on superconductors, specifically evaluating the density of states, the in-gap bands, and their topological attributes. Here, we show results ranging from single impurity to finite chains, and infinite ferromagnetic spin chains in wide-band s-wave superconductors. These results show that the order parameter contains important information regarding quantum phase transitions and their topological nature, underscoring the importance of self-consistency in such studies.

Submitted 2 February, 2024; originally announced February 2024.

arXiv:2402.00644 [pdf, other]

Two molecular devices for superconducting spintronics

Authors: Cristina Mier, Alex Fétida, Roberto Robles, Parmenio Boronat, Divya Jyoti, Nicolás Lorente, Laurent Limot, Deung-Jang Choi

Abstract: We create two molecular devices with superconducting junctions, using nickelocene molecules, single Fe atoms, and Pb electrodes at low temperature. We find contrasting behavior based on the coordination of the Fe atom: one device shows low-bias features in its differential conductance due to the superposition of multiple Andreev reflections (MAR) and Fe-induced in-gap states. The other reveals interference between MAR and in-gap states, showcasing the diversity achievable in atomically engineered devices with identical components.

Submitted 1 February, 2024; originally announced February 2024.

arXiv:2401.15471 [pdf, other]

ConvoSense: Overcoming Monotonous Commonsense Inferences for Conversational AI

Authors: Sarah E. Finch, Jinho D. Choi

Abstract: Mastering commonsense understanding and reasoning is a pivotal skill essential for conducting engaging conversations. While there have been several attempts to create datasets that facilitate commonsense inferences in dialogue contexts, existing datasets tend to lack in-depth details, restate information already present in the conversation, and often fail to capture the multifaceted nature of commonsense reasoning. In response to these limitations, we compile a new synthetic dataset for commonsense reasoning in dialogue contexts using GPT, ConvoSense, that boasts greater contextual novelty, offers a higher volume of inferences per example, and substantially enriches the detail conveyed by the inferences. Our dataset contains over 500,000 inferences across 12,000 dialogues with 10 popular inference types, which empowers the training of generative commonsense models for dialogue that are superior in producing plausible inferences with high novelty when compared to models trained on the previous datasets. To the best of our knowledge, ConvoSense is the first of its kind to provide such a multitude of novel inferences at such a large scale.

Submitted 27 January, 2024; originally announced January 2024.

Comments: accepted to TACL 2024; final author's version of paper; pre-MIT Press publication version

arXiv:2312.15514 [pdf, other]

Towards Reliable AI Model Deployments: Multiple Input Mixup for Out-of-Distribution Detection

Authors: Dasol Choi, Dongbin Na

Abstract: Recent remarkable success in the deep-learning industries has unprecedentedly increased the need for reliable model deployment. For example, the model should alert the user if the produced model outputs might not be reliable. Previous studies have proposed various methods to solve the Out-of-Distribution (OOD) detection problem, however, they generally require a burden of resources. In this work, we propose a novel and simple method, Multiple Input Mixup (MIM). Our method can help improve the OOD detection performance with only single epoch fine-tuning. Our method does not require training the model from scratch and can be attached to the classifier simply. Despite its simplicity, our MIM shows competitive performance. Our method can be suitable for various environments because our method only utilizes the In-Distribution (ID) samples to generate the synthesized OOD data. With extensive experiments with CIFAR10 and CIFAR100 benchmarks that have been largely adopted in out-of-distribution detection fields, we have demonstrated our MIM shows comprehensively superior performance compared to the SOTA method. Especially, our method does not need additional computation on the feature vectors compared to the previous studies. All source codes are publicly available at https://github.com/ndb796/MultipleInputMixup.

Submitted 24 December, 2023; originally announced December 2023.

Comments: Accepted to the AAAI 2024 Workshop on Deployable AI (DAI)

arXiv:2312.15449 [pdf, other]

iDet3D: Towards Efficient Interactive Object Detection for LiDAR Point Clouds

Authors: Dongmin Choi, Wonwoo Cho, Kangyeol Kim, Jaegul Choo

Abstract: Accurately annotating multiple 3D objects in LiDAR scenes is laborious and challenging. While a few previous studies have attempted to leverage semi-automatic methods for cost-effective bounding box annotation, such methods have limitations in efficiently handling numerous multi-class objects. To effectively accelerate 3D annotation pipelines, we propose iDet3D, an efficient interactive 3D object detector. Supporting a user-friendly 2D interface, which can ease the cognitive burden of exploring 3D space to provide click interactions, iDet3D enables users to annotate the entire objects in each scene with minimal interactions. Taking the sparse nature of 3D point clouds into account, we design a negative click simulation (NCS) to improve accuracy by reducing false-positive predictions. In addition, iDet3D incorporates two click propagation techniques to take full advantage of user interactions: (1) dense click guidance (DCG) for keeping user-provided information throughout the network and (2) spatial click propagation (SCP) for detecting other instances of the same class based on the user-specified objects. Through our extensive experiments, we present that our method can construct precise annotations in a few clicks, which shows the practicality as an efficient annotation tool for 3D object detection.

Submitted 24 December, 2023; originally announced December 2023.

Comments: Accepted to AAAI 2024

arXiv:2312.14129 [pdf, other]

WellFactor: Patient Profiling using Integrative Embedding of Healthcare Data

Authors: Dongjin Choi, Andy Xiang, Ozgur Ozturk, Deep Shrestha, Barry Drake, Hamid Haidarian, Faizan Javed, Haesun Park

Abstract: In the rapidly evolving healthcare industry, platforms now have access to not only traditional medical records, but also diverse data sets encompassing various patient interactions, such as those from healthcare web portals. To address this rich diversity of data, we introduce WellFactor: a method that derives patient profiles by integrating information from these sources. Central to our approach is the utilization of constrained low-rank approximation. WellFactor is optimized to handle the sparsity that is often inherent in healthcare data. Moreover, by incorporating task-specific label information, our method refines the embedding results, offering a more informed perspective on patients. One important feature of WellFactor is its ability to compute embeddings for new, previously unobserved patient data instantaneously, eliminating the need to revisit the entire data set or recomputing the embedding. Comprehensive evaluations on real-world healthcare data demonstrate WellFactor's effectiveness. It produces better results compared to other existing methods in classification performance, yields meaningful clustering of patients, and delivers consistent results in patient similarity searches and predictions.

Submitted 21 December, 2023; originally announced December 2023.

Comments: 2023 IEEE International Conference on Big Data (IEEE BigData 2023)

arXiv:2312.11949 [pdf, other]

CreativeConnect: Supporting Reference Recombination for Graphic Design Ideation with Generative AI

Authors: DaEun Choi, Sumin Hong, Jeongeon Park, John Joon Young Chung, Juho Kim

Abstract: Graphic designers often get inspiration through the recombination of references. Our formative study (N=6) reveals that graphic designers focus on conceptual keywords during this process, and want support for discovering the keywords, expanding them, and exploring diverse recombination options of them, while still having room for designers' creativity. We propose CreativeConnect, a system with generative AI pipelines that helps users discover useful elements from the reference image using keywords, recommends relevant keywords, generates diverse recombination options with user-selected keywords, and shows recombinations as sketches with text descriptions. Our user study (N=16) showed that CreativeConnect helped users discover keywords from the reference and generate multiple ideas based on them, ultimately helping users produce more design ideas with higher self-reported creativity compared to the baseline system without generative pipelines. While CreativeConnect was shown effective in ideation, we discussed how CreativeConnect can be extended to support other types of tasks in creativity support.

Submitted 6 March, 2024; v1 submitted 19 December, 2023; originally announced December 2023.

arXiv:2312.06134 [pdf, other]

Order Matters in the Presence of Dataset Imbalance for Multilingual Learning

Authors: Dami Choi, Derrick Xin, Hamid Dadkhahi, Justin Gilmer, Ankush Garg, Orhan Firat, Chih-Kuan Yeh, Andrew M. Dai, Behrooz Ghorbani

Abstract: In this paper, we empirically study the optimization dynamics of multi-task learning, particularly focusing on those that govern a collection of tasks with significant data imbalance. We present a simple yet effective method of pre-training on high-resource tasks, followed by fine-tuning on a mixture of high/low-resource tasks. We provide a thorough empirical study and analysis of this method's benefits showing that it achieves consistent improvements relative to the performance trade-off profile of standard static weighting. We analyze under what data regimes this method is applicable and show its improvements empirically in neural machine translation (NMT) and multi-lingual language modeling.
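The two-phase recipe described above (pre-train on high-resource tasks, then fine-tune on a high/low-resource mixture) can be sketched as a task-sampling schedule. The function name `two_phase_task_schedule`, the step counts, and the uniform sampling within each phase are illustrative assumptions; the abstract does not specify how tasks are weighted or how long each phase runs.

```python
import random


def two_phase_task_schedule(high_tasks, low_tasks, pretrain_steps,
                            finetune_steps, seed=0):
    """Return a list of (phase, task) pairs, one per training step.

    Phase 1 draws only from high-resource tasks; phase 2 draws from the
    union of high- and low-resource tasks. Uniform sampling within each
    phase is an assumption made for this sketch.
    """
    rng = random.Random(seed)
    mixture = list(high_tasks) + list(low_tasks)
    schedule = []
    for _ in range(pretrain_steps):
        schedule.append(("pretrain", rng.choice(high_tasks)))
    for _ in range(finetune_steps):
        schedule.append(("finetune", rng.choice(mixture)))
    return schedule
```

The contrast with static weighting is that low-resource tasks are not seen at all during the first phase, so the order in which tasks are introduced, rather than a fixed mixing ratio, determines their exposure.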

Submitted 11 December, 2023; originally announced December 2023.

arXiv:2311.11602

A Multi-In-Single-Out Network for Video Frame Interpolation without Optical Flow

Authors: Jaemin Lee, Minseok Seo, Sangwoo Lee, Hyobin Park, Dong-Geol Choi

Abstract: In general, deep learning-based video frame interpolation (VFI) methods have predominantly focused on estimating motion vectors between two input frames and warping them to the target time. While this approach has shown impressive performance for linear motion between two input frames, it exhibits limitations when dealing with occlusions and nonlinear movements. Recently, generative models have been applied to VFI to address these issues. However, as VFI is not a task focused on generating plausible images, but rather on predicting accurate intermediate frames between two given frames, performance limitations still persist. In this paper, we propose a multi-in-single-out (MISO) based VFI method that does not rely on motion vector estimation, allowing it to effectively model occlusions and nonlinear motion. Additionally, we introduce a novel motion perceptual loss that enables MISO-VFI to better capture the spatio-temporal correlations within the video frames. Our MISO-VFI method achieves state-of-the-art results on VFI benchmarks Vimeo90K, Middlebury, and UCF101, with a significant performance gap compared to existing approaches.

Submitted 4 December, 2023; v1 submitted 20 November, 2023; originally announced November 2023.

Comments: Discovering a problem with the manuscript

arXiv:2311.06943 [pdf]

Friction Tubes to Generate Nanobubble Ozone Water with an Increased Half-Life for Virucidal Activity

Authors: Suk-Joo Byun, A-Ram You, Tae Seok Park, Chang-Hee Park, Dae-Hyun Choi, Eun-Hee Jun, Young-Ho Yoo, Taekeun Yoo

Abstract: Nanobubbles and related technologies are expected to be highly utilized in water resource-based industries such as water purification, crops, horticulture, medicine, bio, and sterilization. Ozone, a chemical with high sterilizing power, is known as a natural substance that is reduced to oxygen and water after reacting with pollutants. Ozone water, which is generated by dissolving ozone in water, has been used in various industrial sectors such as medical care, food, and environment. Due to the unstable molecular state of ozone, however, it is difficult to produce, use, and supply ozone at industrial sites in a stable manner. This study proposed a method for constructing a system that can generate high-concentration ozone water in large quantities using low power in real time and maintaining the concentration of the generated ozone water over the long term. Friction tubes (called 'nanotube') played a key role to generate nanobubble ozone water with an increased half-life for virus killing activity. In addition, the safety of ozone water during its spray into the air was explained, and virucidal activity test cases for the influenza A (H1N1/A/PR8) and COVID-19 (SARS-CoV-2) virus using high-concentration ozone water as well as its technical efficacy were described.

Submitted 12 November, 2023; originally announced November 2023.

arXiv:2311.05187 [pdf]

Ultrafast all-optical second harmonic wavefront shaping

Authors: A. Sinelnik, S. H. Lam, F. Coviello, S. Klimmer, G. Della Valle, D. -Y. Choi, T. Pertsch, G. Soavi, I. Staude

Abstract: Optical communication can be revolutionized by encoding data into the orbital angular momentum of light beams. However, state-of-the-art approaches for dynamic control of complex optical wavefronts are mainly based on liquid crystal spatial light modulators or miniaturized mirrors, which suffer from intrinsically slow response times. Here, we experimentally realize a hybrid meta-optical system that enables complex control of the wavefront of light with pulse-duration limited dynamics. Specifically, by combining ultrafast polarization switching in a WSe2 monolayer with a dielectric metasurface, we demonstrate second harmonic beam deflection and structuring of orbital angular momentum on the femtosecond timescale. Our results pave the way to robust encoding of information for free space optical links, while reaching response times compatible with real-world telecom applications.

Submitted 9 November, 2023; originally announced November 2023.

arXiv:2311.03383 [pdf, other]

Toward Reinforcement Learning-based Rectilinear Macro Placement Under Human Constraints

Authors: Tuyen P. Le, Hieu T. Nguyen, Seungyeol Baek, Taeyoun Kim, Jungwoo Lee, Seongjung Kim, Hyunjin Kim, Misu Jung, Daehoon Kim, Seokyong Lee, Daewoo Choi

Abstract: Macro placement is a critical phase in chip design, which becomes more intricate when involving general rectilinear macros and layout areas. Furthermore, macro placement that incorporates human-like constraints, such as design hierarchy and peripheral bias, has the potential to significantly reduce the amount of additional manual labor required from designers. This study proposes a methodology that leverages an approach suggested by Google's Circuit Training (G-CT) to provide a learning-based macro placer that not only supports placing rectilinear cases, but also adheres to crucial human-like design principles. Our experimental results demonstrate the effectiveness of our framework in achieving power-performance-area (PPA) metrics and in obtaining placements of high quality, comparable to those produced with human intervention. Additionally, our methodology shows potential as a generalized model to address diverse macro shapes and layout areas.

Submitted 2 November, 2023; originally announced November 2023.

Comments: Fast ML for Science @ ICCAD 2023

arXiv:2311.02240 [pdf, other]

Towards Machine Unlearning Benchmarks: Forgetting the Personal Identities in Facial Recognition Systems

Authors: Dasol Choi, Dongbin Na

Abstract: Machine unlearning is a crucial tool for enabling a classification model to forget specific data that are used in the training time. Recently, various studies have presented machine unlearning algorithms and evaluated their methods on several datasets. However, most of the current machine unlearning algorithms have been evaluated solely on traditional computer vision datasets such as CIFAR-10, MNIST, and SVHN. Furthermore, previous studies generally evaluate the unlearning methods in the class-unlearning setup. Most previous work first trains the classification models and then evaluates the machine unlearning performance of machine unlearning algorithms by forgetting selected image classes (categories) in the experiments. Unfortunately, these class-unlearning settings might not generalize to real-world scenarios. In this work, we propose a machine unlearning setting that aims to unlearn specific instance that contains personal privacy (identity) while maintaining the original task of a given model. Specifically, we propose two machine unlearning benchmark datasets, MUFAC and MUCAC, that are greatly useful to evaluate the performance and robustness of a machine unlearning algorithm. In our benchmark datasets, the original model performs facial feature recognition tasks: face age estimation (multi-class classification) and facial attribute classification (binary class classification), where a class does not depend on any single target subject (personal identity), which can be a realistic setting. Moreover, we also report the performance of the state-of-the-art machine unlearning methods on our proposed benchmark datasets. All the datasets, source codes, and trained models are publicly available at https://github.com/ndb796/MachineUnlearning.

Submitted 24 December, 2023; v1 submitted 3 November, 2023; originally announced November 2023.

Comments: Accepted to the AAAI 2024 Workshop on Privacy-Preserving Artificial Intelligence (PPAI)

arXiv:2310.16538 [pdf, other]

FedTherapist: Mental Health Monitoring with User-Generated Linguistic Expressions on Smartphones via Federated Learning

Authors: Jaemin Shin, Hyungjun Yoon, Seungjoo Lee, Sungjoon Park, Yunxin Liu, Jinho D. Choi, Sung-Ju Lee

Abstract: Psychiatrists diagnose mental disorders via the linguistic use of patients. Still, due to data privacy, existing passive mental health monitoring systems use alternative features such as activity, app usage, and location via mobile devices. We propose FedTherapist, a mobile mental health monitoring system that utilizes continuous speech and keyboard input in a privacy-preserving way via federated learning. We explore multiple model designs by comparing their performance and overhead for FedTherapist to overcome the complex nature of on-device language model training on smartphones. We further propose a Context-Aware Language Learning (CALL) methodology to effectively utilize smartphones' large and noisy text for mental health signal sensing. Our IRB-approved evaluation of the prediction of self-reported depression, stress, anxiety, and mood from 46 participants shows higher accuracy of FedTherapist compared with the performance with non-language features, achieving 0.15 AUROC improvement and 8.21% MAE reduction.

Submitted 25 October, 2023; originally announced October 2023.

Comments: Accepted to the 2023 Conference on Empirical Methods in Natural Language Processing (EMNLP 2023)

arXiv:2310.16318 [pdf, other]

Modality-Agnostic Self-Supervised Learning with Meta-Learned Masked Auto-Encoder

Authors: Huiwon Jang, Jihoon Tack, Daewon Choi, Jongheon Jeong, Jinwoo Shin

Abstract: Despite its practical importance across a wide range of modalities, recent advances in self-supervised learning (SSL) have been primarily focused on a few well-curated domains, e.g., vision and language, often relying on their domain-specific knowledge. For example, Masked Auto-Encoder (MAE) has become one of the popular architectures in these domains, but less has explored its potential in other modalities. In this paper, we develop MAE as a unified, modality-agnostic SSL framework. In turn, we argue meta-learning as a key to interpreting MAE as a modality-agnostic learner, and propose enhancements to MAE from the motivation to jointly improve its SSL across diverse modalities, coined MetaMAE as a result. Our key idea is to view the mask reconstruction of MAE as a meta-learning task: masked tokens are predicted by adapting the Transformer meta-learner through the amortization of unmasked tokens. Based on this novel interpretation, we propose to integrate two advanced meta-learning techniques. First, we adapt the amortized latent of the Transformer encoder using gradient-based meta-learning to enhance the reconstruction. Then, we maximize the alignment between amortized and adapted latents through task contrastive learning which guides the Transformer encoder to better encode the task-specific knowledge. Our experiment demonstrates the superiority of MetaMAE in the modality-agnostic SSL benchmark (called DABS), significantly outperforming prior baselines. Code is available at https://github.com/alinlab/MetaMAE.

Submitted 24 October, 2023; originally announced October 2023.

Comments: Accepted to NeurIPS 2023. The first two authors contributed equally

arXiv:2310.16151 [pdf]

Generation of high concentration nanobubbles based on friction tubes

Authors: Taekeun Yoo, Young-Ho Yoo, Suk-Joo Byun, A-Ram You, Chang-Hee Park, Dae-Hyun Choi, Eun-Hee Jun

Abstract: Nanobubble-related technologies have been confirmed to be useful in various fields such as climate change and the environment as well as water-based industries such as water purification, crops, horticulture, medical care, bio, and sterilization. However, a method of mass production in real time enough to apply nano-bubbles to the industry has not yet been developed. We explored the mechanism of nano-bubble water generation by friction between water and walls and developed a tube device applying the shape of the flow path to maximize the friction in the fluid passing through the flow path. It also describes the case of real-time and low-power mass production of nanobubbles and its technical utility. We found that the friction of nanotubes alone can easily and quickly improve the production of nanobubbles with small particle size in real time; by increasing the shearing pressure while increasing the effective friction constant value, the particle size of nanobubbles can be smaller while increasing the particle concentration.

Submitted 24 October, 2023; originally announced October 2023.

Comments: 24 pages, 24 figures, 6 tables

arXiv:2310.04313 [pdf, other]

KoMultiText: Large-Scale Korean Text Dataset for Classifying Biased Speech in Real-World Online Services

Authors: Dasol Choi, Jooyoung Song, Eunsun Lee, Jinwoo Seo, Heejune Park, Dongbin Na

Abstract: With the growth of online services, the need for advanced text classification algorithms, such as sentiment analysis and biased text detection, has become increasingly evident. The anonymous nature of online services often leads to the presence of biased and harmful language, posing challenges to maintaining the health of online communities. This phenomenon is especially relevant in South Korea, where large-scale hate speech detection algorithms have not yet been broadly explored. In this paper, we introduce "KoMultiText", a new comprehensive, large-scale dataset collected from a well-known South Korean SNS platform. Our proposed dataset provides annotations including (1) Preferences, (2) Profanities, and (3) Nine types of Bias for the text samples, enabling multi-task learning for simultaneous classification of user-generated texts. Leveraging state-of-the-art BERT-based language models, our approach surpasses human-level accuracy across diverse classification tasks, as measured by various metrics. Beyond academic contributions, our work can provide practical solutions for real-world hate speech and bias mitigation, contributing directly to the improvement of online community health. Our work provides a robust foundation for future research aiming to improve the quality of online discourse and foster societal well-being. All source codes and datasets are publicly accessible at https://github.com/Dasol-Choi/KoMultiText.

Submitted 12 November, 2023; v1 submitted 6 October, 2023; originally announced October 2023.

Comments: Accepted to the NeurIPS 2023 Workshop on Socially Responsible Language Modelling Research (SoLaR)

Showing 1–50 of 344 results for author: Choi, D