subscribe to arXiv mailings

Layout-and-Retouch: A Dual-stage Framework for Improving Diversity in Personalized Image Generation

Authors: Kangyeol Kim, Wooseok Seo, Sehyun Nam, Bodam Kim, Suhyeon Jeong, Wonwoo Cho, Jaegul Choo, Youngjae Yu

Abstract: Personalized text-to-image (P-T2I) generation aims to create new, text-guided images featuring the personalized subject with a few reference images. However, balancing the trade-off relationship between prompt fidelity and identity preservation remains a critical challenge. To address the issue, we propose a novel P-T2I method called Layout-and-Retouch, consisting of two stages: 1) layout generati… ▽ More Personalized text-to-image (P-T2I) generation aims to create new, text-guided images featuring the personalized subject with a few reference images. However, balancing the trade-off relationship between prompt fidelity and identity preservation remains a critical challenge. To address the issue, we propose a novel P-T2I method called Layout-and-Retouch, consisting of two stages: 1) layout generation and 2) retouch. In the first stage, our step-blended inference utilizes the inherent sample diversity of vanilla T2I models to produce diversified layout images, while also enhancing prompt fidelity. In the second stage, multi-source attention swapping integrates the context image from the first stage with the reference image, leveraging the structure from the context image and extracting visual features from the reference image. This achieves high prompt fidelity while preserving identity characteristics. Through our extensive experiments, we demonstrate that our method generates a wide variety of images with diverse layouts while maintaining the unique identity features of the personalized objects, even with challenging text prompts. This versatility highlights the potential of our framework to handle complex conditions, significantly enhancing the diversity and applicability of personalized image synthesis. △ Less

Submitted 13 July, 2024; originally announced July 2024.

arXiv:2406.12223 [pdf, other]

ToxiCloakCN: Evaluating Robustness of Offensive Language Detection in Chinese with Cloaking Perturbations

Authors: Yunze Xiao, Yujia Hu, Kenny Tsu Wei Choo, Roy Ka-wei Lee

Abstract: Detecting hate speech and offensive language is essential for maintaining a safe and respectful digital environment. This study examines the limitations of state-of-the-art large language models (LLMs) in identifying offensive content within systematically perturbed data, with a focus on Chinese, a language particularly susceptible to such perturbations. We introduce \textsf{ToxiCloakCN}, an enhan… ▽ More Detecting hate speech and offensive language is essential for maintaining a safe and respectful digital environment. This study examines the limitations of state-of-the-art large language models (LLMs) in identifying offensive content within systematically perturbed data, with a focus on Chinese, a language particularly susceptible to such perturbations. We introduce \textsf{ToxiCloakCN}, an enhanced dataset derived from ToxiCN, augmented with homophonic substitutions and emoji transformations, to test the robustness of LLMs against these cloaking perturbations. Our findings reveal that existing models significantly underperform in detecting offensive content when these perturbations are applied. We provide an in-depth analysis of how different types of offensive content are affected by these perturbations and explore the alignment between human and model explanations of offensiveness. Our work highlights the urgent need for more advanced techniques in offensive language detection to combat the evolving tactics used to evade detection mechanisms. △ Less

Submitted 17 June, 2024; originally announced June 2024.

Comments: 10 pages,5 Tables, 2 Figures

arXiv:2406.11135 [pdf, other]

doi 10.1145/3656156.3663694

Towards Understanding Emotions for Engaged Mental Health Conversations

Authors: Kellie Yu Hui Sim, Kohleen Tijing Fortuno, Kenny Tsu Wei Choo

Abstract: Providing timely support and intervention is crucial in mental health settings. As the need to engage youth comfortable with texting increases, mental health providers are exploring and adopting text-based media such as chatbots, community-based forums, online therapies with licensed professionals, and helplines operated by trained responders. To support these text-based media for mental health--p… ▽ More Providing timely support and intervention is crucial in mental health settings. As the need to engage youth comfortable with texting increases, mental health providers are exploring and adopting text-based media such as chatbots, community-based forums, online therapies with licensed professionals, and helplines operated by trained responders. To support these text-based media for mental health--particularly for crisis care--we are developing a system to perform passive emotion-sensing using a combination of keystroke dynamics and sentiment analysis. Our early studies of this system posit that the analysis of short text messages and keyboard typing patterns can provide emotion information that may be used to support both clients and responders. We use our preliminary findings to discuss the way forward for applying AI to support mental health providers in providing better care. △ Less

Submitted 16 June, 2024; originally announced June 2024.

Comments: 5 pages, 1 figure, to be published in DIS Companion '24

ACM Class: H.5.2; I.2.7

arXiv:2406.05071 [pdf, other]

Massively Multiagent Minigames for Training Generalist Agents

Authors: Kyoung Whan Choe, Ryan Sullivan, Joseph Suárez

Abstract: We present Meta MMO, a collection of many-agent minigames for use as a reinforcement learning benchmark. Meta MMO is built on top of Neural MMO, a massively multiagent environment that has been the subject of two previous NeurIPS competitions. Our work expands Neural MMO with several computationally efficient minigames. We explore generalization across Meta MMO by learning to play several minigame… ▽ More We present Meta MMO, a collection of many-agent minigames for use as a reinforcement learning benchmark. Meta MMO is built on top of Neural MMO, a massively multiagent environment that has been the subject of two previous NeurIPS competitions. Our work expands Neural MMO with several computationally efficient minigames. We explore generalization across Meta MMO by learning to play several minigames with a single set of weights. We release the environment, baselines, and training code under the MIT license. We hope that Meta MMO will spur additional progress on Neural MMO and, more generally, will serve as a useful benchmark for many-agent generalization. △ Less

Submitted 7 June, 2024; originally announced June 2024.

arXiv:2405.20165 [pdf, other]

Randomized Exploration for Reinforcement Learning with Multinomial Logistic Function Approximation

Authors: Wooseong Cho, Taehyun Hwang, Joongkyu Lee, Min-hwan Oh

Abstract: We study reinforcement learning with multinomial logistic (MNL) function approximation where the underlying transition probability kernel of the Markov decision processes (MDPs) is parametrized by an unknown transition core with features of state and action. For the finite horizon episodic setting with inhomogeneous state transitions, we propose provably efficient algorithms with randomized explor… ▽ More We study reinforcement learning with multinomial logistic (MNL) function approximation where the underlying transition probability kernel of the Markov decision processes (MDPs) is parametrized by an unknown transition core with features of state and action. For the finite horizon episodic setting with inhomogeneous state transitions, we propose provably efficient algorithms with randomized exploration having frequentist regret guarantees. For our first algorithm, $\texttt{RRL-MNL}$, we adapt optimistic sampling to ensure the optimism of the estimated value function with sufficient frequency and establish that $\texttt{RRL-MNL}$ is both statistically and computationally efficient, achieving a $\tilde{O}(κ^{-1} d^{\frac{3}{2}} H^{\frac{3}{2}} \sqrt{T})$ frequentist regret bound with constant-time computational cost per episode. Here, $d$ is the dimension of the transition core, $H$ is the horizon length, $T$ is the total number of steps, and $κ$ is a problem-dependent constant. Despite the simplicity and practicality of $\texttt{RRL-MNL}$, its regret bound scales with $κ^{-1}$, which is potentially large in the worst case. To improve the dependence on $κ^{-1}$, we propose $\texttt{ORRL-MNL}$, which estimates the value function using local gradient information of the MNL transition model. We show that its frequentist regret bound is $\tilde{O}(d^{\frac{3}{2}} H^{\frac{3}{2}} \sqrt{T} + κ^{-1} d^2 H^2)$. To the best of our knowledge, these are the first randomized RL algorithms for the MNL transition model that achieve both computational and statistical efficiency. Numerical experiments demonstrate the superior performance of the proposed algorithms. △ Less

Submitted 30 May, 2024; originally announced May 2024.

arXiv:2405.06754 [pdf, other]

Wall-Street: Smart Surface-Enabled 5G mmWave for Roadside Networking

Authors: Kun Woo Cho, Prasanthi Maddala, Ivan Seskar, Kyle Jamieson

Abstract: 5G mmWave roadside networks promise high-speed wireless connectivity, but face significant challenges in maintaining reliable connections for users moving at high speed. Frequent handovers, complex beam alignment, and signal attenuation due to obstacles like car bodies lead to service interruptions and degraded performance. We present Wall-Street, a smart surface installed on vehicles to enhance 5… ▽ More 5G mmWave roadside networks promise high-speed wireless connectivity, but face significant challenges in maintaining reliable connections for users moving at high speed. Frequent handovers, complex beam alignment, and signal attenuation due to obstacles like car bodies lead to service interruptions and degraded performance. We present Wall-Street, a smart surface installed on vehicles to enhance 5G mmWave connectivity for users inside. Wall-Street improves mobility management by (1) steering outdoor mmWave signals into the vehicle, ensuring coverage for all users; (2) enabling simultaneous serving cell data transfer and candidate handover cell measurement, allowing seamless handovers without service interruption; and (3) combining beams from source and target cells during a handover to increase reliability. Through its flexible and diverse signal manipulation capabilities, Wall-Street provides uninterrupted high-speed connectivity for latency-sensitive applications in challenging mobile environments. We have implemented and integrated Wall-Street in the COSMOS testbed and evaluated its real-time performance with four gNBs and a mobile client inside a surface-enabled vehicle, driving on a nearby road. Wall-Street achieves a 2.5-3.4x TCP throughput improvement and a 0.4-0.8x reduction in delay over a baseline 5G Standalone handover protocol. △ Less

Submitted 10 May, 2024; originally announced May 2024.

Comments: 15 pages, 22 figures, under submission

arXiv:2405.01842 [pdf, ps, other]

SGHateCheck: Functional Tests for Detecting Hate Speech in Low-Resource Languages of Singapore

Authors: Ri Chi Ng, Nirmalendu Prakash, Ming Shan Hee, Kenny Tsu Wei Choo, Roy Ka-Wei Lee

Abstract: To address the limitations of current hate speech detection models, we introduce \textsf{SGHateCheck}, a novel framework designed for the linguistic and cultural context of Singapore and Southeast Asia. It extends the functional testing approach of HateCheck and MHC, employing large language models for translation and paraphrasing into Singapore's main languages, and refining these with native ann… ▽ More To address the limitations of current hate speech detection models, we introduce \textsf{SGHateCheck}, a novel framework designed for the linguistic and cultural context of Singapore and Southeast Asia. It extends the functional testing approach of HateCheck and MHC, employing large language models for translation and paraphrasing into Singapore's main languages, and refining these with native annotators. \textsf{SGHateCheck} reveals critical flaws in state-of-the-art models, highlighting their inadequacy in sensitive content moderation. This work aims to foster the development of more effective hate speech detection tools for diverse linguistic environments, particularly for Singapore and Southeast Asia contexts. △ Less

Submitted 3 May, 2024; originally announced May 2024.

arXiv:2404.14873 [pdf, ps, other]

Estimating the Distribution of Parameters in Differential Equations with Repeated Cross-Sectional Data

Authors: Hyeontae Jo, Sung Woong Cho, Hyung Ju Hwang

Abstract: Differential equations are pivotal in modeling and understanding the dynamics of various systems, offering insights into their future states through parameter estimation fitted to time series data. In fields such as economy, politics, and biology, the observation data points in the time series are often independently obtained (i.e., Repeated Cross-Sectional (RCS) data). With RCS data, we found tha… ▽ More Differential equations are pivotal in modeling and understanding the dynamics of various systems, offering insights into their future states through parameter estimation fitted to time series data. In fields such as economy, politics, and biology, the observation data points in the time series are often independently obtained (i.e., Repeated Cross-Sectional (RCS) data). With RCS data, we found that traditional methods for parameter estimation in differential equations, such as using mean values of time trajectories or Gaussian Process-based trajectory generation, have limitations in estimating the shape of parameter distributions, often leading to a significant loss of data information. To address this issue, we introduce a novel method, Estimation of Parameter Distribution (EPD), providing accurate distribution of parameters without loss of data information. EPD operates in three main steps: generating synthetic time trajectories by randomly selecting observed values at each time point, estimating parameters of a differential equation that minimize the discrepancy between these trajectories and the true solution of the equation, and selecting the parameters depending on the scale of discrepancy. We then evaluated the performance of EPD across several models, including exponential growth, logistic population models, and target cell-limited models with delayed virus production, demonstrating its superiority in capturing the shape of parameter distributions. Furthermore, we applied EPD to real-world datasets, capturing various shapes of parameter distributions rather than a normal distribution. These results effectively address the heterogeneity within systems, marking a substantial progression in accurately modeling systems using RCS data. △ Less

Submitted 23 April, 2024; originally announced April 2024.

Comments: 16 pages, 10 figures

MSC Class: 65L08; 65D17; 68U07

arXiv:2404.11539 [pdf, other]

Evaluating Span Extraction in Generative Paradigm: A Reflection on Aspect-Based Sentiment Analysis

Authors: Soyoung Yang, Won Ik Cho

Abstract: In the era of rapid evolution of generative language models within the realm of natural language processing, there is an imperative call to revisit and reformulate evaluation methodologies, especially in the domain of aspect-based sentiment analysis (ABSA). This paper addresses the emerging challenges introduced by the generative paradigm, which has moderately blurred traditional boundaries betwee… ▽ More In the era of rapid evolution of generative language models within the realm of natural language processing, there is an imperative call to revisit and reformulate evaluation methodologies, especially in the domain of aspect-based sentiment analysis (ABSA). This paper addresses the emerging challenges introduced by the generative paradigm, which has moderately blurred traditional boundaries between understanding and generation tasks. Building upon prevailing practices in the field, we analyze the advantages and shortcomings associated with the prevalent ABSA evaluation paradigms. Through an in-depth examination, supplemented by illustrative examples, we highlight the intricacies involved in aligning generative outputs with other evaluative metrics, specifically those derived from other tasks, including question answering. While we steer clear of advocating for a singular and definitive metric, our contribution lies in paving the path for a comprehensive guideline tailored for ABSA evaluations in this generative paradigm. In this position paper, we aim to provide practitioners with profound reflections, offering insights and directions that can aid in navigating this evolving landscape, ensuring evaluations that are both accurate and reflective of generative capabilities. △ Less

Submitted 17 April, 2024; originally announced April 2024.

Comments: 10 pages

arXiv:2404.09041 [pdf, other]

Three Disclaimers for Safe Disclosure: A Cardwriter for Reporting the Use of Generative AI in Writing Process

Authors: Won Ik Cho, Eunjung Cho, Hyeonji Shin

Abstract: Generative artificial intelligence (AI) and large language models (LLMs) are increasingly being used in the academic writing process. This is despite the current lack of unified framework for reporting the use of machine assistance. In this work, we propose "Cardwriter", an intuitive interface that produces a short report for authors to declare their use of generative AI in their writing process.… ▽ More Generative artificial intelligence (AI) and large language models (LLMs) are increasingly being used in the academic writing process. This is despite the current lack of unified framework for reporting the use of machine assistance. In this work, we propose "Cardwriter", an intuitive interface that produces a short report for authors to declare their use of generative AI in their writing process. The demo is available online, at https://cardwriter.vercel.app △ Less

Submitted 13 April, 2024; originally announced April 2024.

Comments: 6 pages; an implementation version of PaperCard project

arXiv:2404.00405 [pdf, other]

doi 10.1145/3613905.3650786

A Taxonomy for Human-LLM Interaction Modes: An Initial Exploration

Authors: Jie Gao, Simret Araya Gebreegziabher, Kenny Tsu Wei Choo, Toby Jia-Jun Li, Simon Tangi Perrault, Thomas W. Malone

Abstract: With ChatGPT's release, conversational prompting has become the most popular form of human-LLM interaction. However, its effectiveness is limited for more complex tasks involving reasoning, creativity, and iteration. Through a systematic analysis of HCI papers published since 2021, we identified four key phases in the human-LLM interaction flow - planning, facilitating, iterating, and testing - to… ▽ More With ChatGPT's release, conversational prompting has become the most popular form of human-LLM interaction. However, its effectiveness is limited for more complex tasks involving reasoning, creativity, and iteration. Through a systematic analysis of HCI papers published since 2021, we identified four key phases in the human-LLM interaction flow - planning, facilitating, iterating, and testing - to precisely understand the dynamics of this process. Additionally, we have developed a taxonomy of four primary interaction modes: Mode 1: Standard Prompting, Mode 2: User Interface, Mode 3: Context-based, and Mode 4: Agent Facilitator. This taxonomy was further enriched using the "5W1H" guideline method, which involved a detailed examination of definitions, participant roles (Who), the phases that happened (When), human objectives and LLM abilities (What), and the mechanics of each interaction mode (How). We anticipate this taxonomy will contribute to the future design and evaluation of human-LLM interaction. △ Less

Submitted 30 March, 2024; originally announced April 2024.

Comments: 11 pages, 4 figures, 3 tables. Accepted at CHI Late-Breaking Work 2024

arXiv:2403.05209 [pdf, other]

Overcoming Data Inequality across Domains with Semi-Supervised Domain Generalization

Authors: Jinha Park, Wonguk Cho, Taesup Kim

Abstract: While there have been considerable advancements in machine learning driven by extensive datasets, a significant disparity still persists in the availability of data across various sources and populations. This inequality across domains poses challenges in modeling for those with limited data, which can lead to profound practical and ethical concerns. In this paper, we address a representative case… ▽ More While there have been considerable advancements in machine learning driven by extensive datasets, a significant disparity still persists in the availability of data across various sources and populations. This inequality across domains poses challenges in modeling for those with limited data, which can lead to profound practical and ethical concerns. In this paper, we address a representative case of data inequality problem across domains termed Semi-Supervised Domain Generalization (SSDG), in which only one domain is labeled while the rest are unlabeled. We propose a novel algorithm, ProUD, which can effectively learn domain-invariant features via domain-aware prototypes along with progressive generalization via uncertainty-adaptive mixing of labeled and unlabeled domains. Our experiments on three different benchmark datasets demonstrate the effectiveness of ProUD, outperforming all baseline models including single domain generalization and semi-supervised learning. Source code will be released upon acceptance of the paper. △ Less

Submitted 8 March, 2024; originally announced March 2024.

Comments: 20 pages, 4 figures

arXiv:2402.08187 [pdf, other]

Learning time-dependent PDE via graph neural networks and deep operator network for robust accuracy on irregular grids

Authors: Sung Woong Cho, Jae Yong Lee, Hyung Ju Hwang

Abstract: Scientific computing using deep learning has seen significant advancements in recent years. There has been growing interest in models that learn the operator from the parameters of a partial differential equation (PDE) to the corresponding solutions. Deep Operator Network (DeepONet) and Fourier Neural operator, among other models, have been designed with structures suitable for handling functions… ▽ More Scientific computing using deep learning has seen significant advancements in recent years. There has been growing interest in models that learn the operator from the parameters of a partial differential equation (PDE) to the corresponding solutions. Deep Operator Network (DeepONet) and Fourier Neural operator, among other models, have been designed with structures suitable for handling functions as inputs and outputs, enabling real-time predictions as surrogate models for solution operators. There has also been significant progress in the research on surrogate models based on graph neural networks (GNNs), specifically targeting the dynamics in time-dependent PDEs. In this paper, we propose GraphDeepONet, an autoregressive model based on GNNs, to effectively adapt DeepONet, which is well-known for successful operator learning. GraphDeepONet exhibits robust accuracy in predicting solutions compared to existing GNN-based PDE solver models. It maintains consistent performance even on irregular grids, leveraging the advantages inherited from DeepONet and enabling predictions on arbitrary grids. Additionally, unlike traditional DeepONet and its variants, GraphDeepONet enables time extrapolation for time-dependent PDE solutions. We also provide theoretical analysis of the universal approximation capability of GraphDeepONet in approximating continuous operators across arbitrary time intervals. △ Less

Submitted 12 February, 2024; originally announced February 2024.

Comments: 25 pages, 11 figures

MSC Class: 65D17; 68U07

arXiv:2312.15949 [pdf, other]

HyperDeepONet: learning operator with complex target function space using the limited resources via hypernetwork

Authors: Jae Yong Lee, Sung Woong Cho, Hyung Ju Hwang

Abstract: Fast and accurate predictions for complex physical dynamics are a significant challenge across various applications. Real-time prediction on resource-constrained hardware is even more crucial in real-world problems. The deep operator network (DeepONet) has recently been proposed as a framework for learning nonlinear mappings between function spaces. However, the DeepONet requires many parameters a… ▽ More Fast and accurate predictions for complex physical dynamics are a significant challenge across various applications. Real-time prediction on resource-constrained hardware is even more crucial in real-world problems. The deep operator network (DeepONet) has recently been proposed as a framework for learning nonlinear mappings between function spaces. However, the DeepONet requires many parameters and has a high computational cost when learning operators, particularly those with complex (discontinuous or non-smooth) target functions. This study proposes HyperDeepONet, which uses the expressive power of the hypernetwork to enable the learning of a complex operator with a smaller set of parameters. The DeepONet and its variant models can be thought of as a method of injecting the input function information into the target function. From this perspective, these models can be viewed as a particular case of HyperDeepONet. We analyze the complexity of DeepONet and conclude that HyperDeepONet needs relatively lower complexity to obtain the desired accuracy for operator learning. HyperDeepONet successfully learned various operators with fewer computational resources compared to other benchmarks. △ Less

Submitted 26 December, 2023; originally announced December 2023.

Comments: 26 pages, 13 figures. Published as a conference paper at Eleventh International Conference on Learning Representations (ICLR 2023)

MSC Class: 65D17; 68U07

arXiv:2312.15449 [pdf, other]

iDet3D: Towards Efficient Interactive Object Detection for LiDAR Point Clouds

Authors: Dongmin Choi, Wonwoo Cho, Kangyeol Kim, Jaegul Choo

Abstract: Accurately annotating multiple 3D objects in LiDAR scenes is laborious and challenging. While a few previous studies have attempted to leverage semi-automatic methods for cost-effective bounding box annotation, such methods have limitations in efficiently handling numerous multi-class objects. To effectively accelerate 3D annotation pipelines, we propose iDet3D, an efficient interactive 3D object… ▽ More Accurately annotating multiple 3D objects in LiDAR scenes is laborious and challenging. While a few previous studies have attempted to leverage semi-automatic methods for cost-effective bounding box annotation, such methods have limitations in efficiently handling numerous multi-class objects. To effectively accelerate 3D annotation pipelines, we propose iDet3D, an efficient interactive 3D object detector. Supporting a user-friendly 2D interface, which can ease the cognitive burden of exploring 3D space to provide click interactions, iDet3D enables users to annotate the entire objects in each scene with minimal interactions. Taking the sparse nature of 3D point clouds into account, we design a negative click simulation (NCS) to improve accuracy by reducing false-positive predictions. In addition, iDet3D incorporates two click propagation techniques to take full advantage of user interactions: (1) dense click guidance (DCG) for keeping user-provided information throughout the network and (2) spatial click propagation (SCP) for detecting other instances of the same class based on the user-specified objects. Through our extensive experiments, we present that our method can construct precise annotations in a few clicks, which shows the practicality as an efficient annotation tool for 3D object detection. △ Less

Submitted 24 December, 2023; originally announced December 2023.

Comments: Accepted to AAAI 2024

arXiv:2312.12467 [pdf, other]

Learning Flexible Body Collision Dynamics with Hierarchical Contact Mesh Transformer

Authors: Youn-Yeol Yu, Jeongwhan Choi, Woojin Cho, Kookjin Lee, Nayong Kim, Kiseok Chang, Chang-Seung Woo, Ilho Kim, Seok-Woo Lee, Joon-Young Yang, Sooyoung Yoon, Noseong Park

Abstract: Recently, many mesh-based graph neural network (GNN) models have been proposed for modeling complex high-dimensional physical systems. Remarkable achievements have been made in significantly reducing the solving time compared to traditional numerical solvers. These methods are typically designed to i) reduce the computational cost in solving physical dynamics and/or ii) propose techniques to enhan… ▽ More Recently, many mesh-based graph neural network (GNN) models have been proposed for modeling complex high-dimensional physical systems. Remarkable achievements have been made in significantly reducing the solving time compared to traditional numerical solvers. These methods are typically designed to i) reduce the computational cost in solving physical dynamics and/or ii) propose techniques to enhance the solution accuracy in fluid and rigid body dynamics. However, it remains under-explored whether they are effective in addressing the challenges of flexible body dynamics, where instantaneous collisions occur within a very short timeframe. In this paper, we present Hierarchical Contact Mesh Transformer (HCMT), which uses hierarchical mesh structures and can learn long-range dependencies (occurred by collisions) among spatially distant positions of a body -- two close positions in a higher-level mesh correspond to two distant positions in a lower-level mesh. HCMT enables long-range interactions, and the hierarchical mesh structure quickly propagates collision effects to faraway positions. To this end, it consists of a contact mesh Transformer and a hierarchical mesh Transformer (CMT and HMT, respectively). Lastly, we propose a flexible body dynamics dataset, consisting of trajectories that reflect experimental settings frequently used in the display industry for product designs. We also compare the performance of several baselines using well-known benchmark datasets. Our results show that HCMT provides significant performance improvements over existing methods. Our code is available at https://github.com/yuyudeep/hcmt. △ Less

Submitted 25 March, 2024; v1 submitted 19 December, 2023; originally announced December 2023.

Comments: Accepted at ICLR 2024

arXiv:2312.10274 [pdf, other]

Operator-learning-inspired Modeling of Neural Ordinary Differential Equations

Authors: Woojin Cho, Seunghyeon Cho, Hyundong Jin, Jinsung Jeon, Kookjin Lee, Sanghyun Hong, Dongeun Lee, Jonghyun Choi, Noseong Park

Abstract: Neural ordinary differential equations (NODEs), one of the most influential works of the differential equation-based deep learning, are to continuously generalize residual networks and opened a new field. They are currently utilized for various downstream tasks, e.g., image classification, time series classification, image generation, etc. Its key part is how to model the time-derivative of the hi… ▽ More Neural ordinary differential equations (NODEs), one of the most influential works of the differential equation-based deep learning, are to continuously generalize residual networks and opened a new field. They are currently utilized for various downstream tasks, e.g., image classification, time series classification, image generation, etc. Its key part is how to model the time-derivative of the hidden state, denoted dh(t)/dt. People have habitually used conventional neural network architectures, e.g., fully-connected layers followed by non-linear activations. In this paper, however, we present a neural operator-based method to define the time-derivative term. Neural operators were initially proposed to model the differential operator of partial differential equations (PDEs). Since the time-derivative of NODEs can be understood as a special type of the differential operator, our proposed method, called branched Fourier neural operator (BFNO), makes sense. In our experiments with general downstream tasks, our method significantly outperforms existing methods. △ Less

Submitted 15 December, 2023; originally announced December 2023.

arXiv:2312.09603 [pdf, other]

Stethoscope-guided Supervised Contrastive Learning for Cross-domain Adaptation on Respiratory Sound Classification

Authors: June-Woo Kim, Sangmin Bae, Won-Yang Cho, Byungjo Lee, Ho-Young Jung

Abstract: Despite the remarkable advances in deep learning technology, achieving satisfactory performance in lung sound classification remains a challenge due to the scarcity of available data. Moreover, the respiratory sound samples are collected from a variety of electronic stethoscopes, which could potentially introduce biases into the trained models. When a significant distribution shift occurs within t… ▽ More Despite the remarkable advances in deep learning technology, achieving satisfactory performance in lung sound classification remains a challenge due to the scarcity of available data. Moreover, the respiratory sound samples are collected from a variety of electronic stethoscopes, which could potentially introduce biases into the trained models. When a significant distribution shift occurs within the test dataset or in a practical scenario, it can substantially decrease the performance. To tackle this issue, we introduce cross-domain adaptation techniques, which transfer the knowledge from a source domain to a distinct target domain. In particular, by considering different stethoscope types as individual domains, we propose a novel stethoscope-guided supervised contrastive learning approach. This method can mitigate any domain-related disparities and thus enables the model to distinguish respiratory sounds of the recording variation of the stethoscope. The experimental results on the ICBHI dataset demonstrate that the proposed methods are effective in reducing the domain dependency and achieving the ICBHI Score of 61.71%, which is a significant improvement of 2.16% over the baseline. △ Less

Submitted 15 December, 2023; originally announced December 2023.

Comments: accepted to ICASSP 2024

arXiv:2311.03736 [pdf, other]

Neural MMO 2.0: A Massively Multi-task Addition to Massively Multi-agent Learning

Authors: Joseph Suárez, Phillip Isola, Kyoung Whan Choe, David Bloomin, Hao Xiang Li, Nikhil Pinnaparaju, Nishaanth Kanna, Daniel Scott, Ryan Sullivan, Rose S. Shuman, Lucas de Alcântara, Herbie Bradley, Louis Castricato, Kirsty You, Yuhao Jiang, Qimai Li, Jiaxin Chen, Xiaolong Zhu

Abstract: Neural MMO 2.0 is a massively multi-agent environment for reinforcement learning research. The key feature of this new version is a flexible task system that allows users to define a broad range of objectives and reward signals. We challenge researchers to train agents capable of generalizing to tasks, maps, and opponents never seen during training. Neural MMO features procedurally generated maps… ▽ More Neural MMO 2.0 is a massively multi-agent environment for reinforcement learning research. The key feature of this new version is a flexible task system that allows users to define a broad range of objectives and reward signals. We challenge researchers to train agents capable of generalizing to tasks, maps, and opponents never seen during training. Neural MMO features procedurally generated maps with 128 agents in the standard setting and support for up to. Version 2.0 is a complete rewrite of its predecessor with three-fold improved performance and compatibility with CleanRL. We release the platform as free and open-source software with comprehensive documentation available at neuralmmo.github.io and an active community Discord. To spark initial research on this new platform, we are concurrently running a competition at NeurIPS 2023. △ Less

Submitted 7 November, 2023; originally announced November 2023.

arXiv:2310.11551 [pdf, other]

WaveFlex: A Smart Surface for Private CBRS Wireless Cellular Networks

Authors: Fan Yi, Kun Woo Cho, Yaxiong Xie, Kyle Jamieson

Abstract: We present the design and implementation of WaveFlex, the first smart surface that enhances Private LTE/5G networks operating under the shared-license framework in the Citizens Broadband Radio Service frequency band. WaveFlex works in the presence of frequency diversity: multiple nearby base stations operating on different frequencies, as dictated by a Spectrum Access System coordinator. It also h… ▽ More We present the design and implementation of WaveFlex, the first smart surface that enhances Private LTE/5G networks operating under the shared-license framework in the Citizens Broadband Radio Service frequency band. WaveFlex works in the presence of frequency diversity: multiple nearby base stations operating on different frequencies, as dictated by a Spectrum Access System coordinator. It also handles time dynamism: due to the dynamic sharing rules of the band, base stations occasionally switch channels, especially when priority users enter the network. Finally, WaveFlex operates independently of the network itself, not requiring access to nor modification of the base station or mobile users, yet it remain compliant with and effective on prevailing cellular protocols. We have designed and fabricated WaveFlex on a custom multi-layer PCB, software defined radio-based network monitor, and supporting control software and hardware. Our experimental evaluation benchmarks an operational Private LTE network running at full line rate. Results demonstrate an 8.50 dB average SNR gain, and an average throughput gain of 4.36 Mbps for a single small cell, and 3.19 Mbps for four small cells, in a realistic indoor office scenario. △ Less

Submitted 17 October, 2023; originally announced October 2023.

Comments: 15 pages

arXiv:2310.09528 [pdf, other]

Hypernetwork-based Meta-Learning for Low-Rank Physics-Informed Neural Networks

Authors: Woojin Cho, Kookjin Lee, Donsub Rim, Noseong Park

Abstract: In various engineering and applied science applications, repetitive numerical simulations of partial differential equations (PDEs) for varying input parameters are often required (e.g., aircraft shape optimization over many design parameters) and solvers are required to perform rapid execution. In this study, we suggest a path that potentially opens up a possibility for physics-informed neural net… ▽ More In various engineering and applied science applications, repetitive numerical simulations of partial differential equations (PDEs) for varying input parameters are often required (e.g., aircraft shape optimization over many design parameters) and solvers are required to perform rapid execution. In this study, we suggest a path that potentially opens up a possibility for physics-informed neural networks (PINNs), emerging deep-learning-based solvers, to be considered as one such solver. Although PINNs have pioneered a proper integration of deep-learning and scientific computing, they require repetitive time-consuming training of neural networks, which is not suitable for many-query scenarios. To address this issue, we propose a lightweight low-rank PINNs containing only hundreds of model parameters and an associated hypernetwork-based meta-learning algorithm, which allows efficient approximation of solutions of PDEs for varying ranges of PDE input parameters. Moreover, we show that the proposed method is effective in overcoming a challenging issue, known as "failure modes" of PINNs. △ Less

Submitted 14 October, 2023; originally announced October 2023.

arXiv:2310.04824 [pdf, other]

PaperCard for Reporting Machine Assistance in Academic Writing

Authors: Won Ik Cho, Eunjung Cho, Kyunghyun Cho

Abstract: Academic writing process has benefited from various technological developments over the years including search engines, automatic translators, and editing tools that review grammar and spelling mistakes. They have enabled human writers to become more efficient in writing academic papers, for example by helping with finding relevant literature more effectively and polishing texts. While these devel… ▽ More Academic writing process has benefited from various technological developments over the years including search engines, automatic translators, and editing tools that review grammar and spelling mistakes. They have enabled human writers to become more efficient in writing academic papers, for example by helping with finding relevant literature more effectively and polishing texts. While these developments have so far played a relatively assistive role, recent advances in large-scale language models (LLMs) have enabled LLMs to play a more major role in the writing process, such as coming up with research questions and generating key contents. This raises critical questions surrounding the concept of authorship in academia. ChatGPT, a question-answering system released by OpenAI in November 2022, has demonstrated a range of capabilities that could be utilised in producing academic papers. The academic community will have to address relevant pressing questions, including whether Artificial Intelligence (AI) should be merited authorship if it made significant contributions in the writing process, or whether its use should be restricted such that human authorship would not be undermined. In this paper, we aim to address such questions, and propose a framework we name "PaperCard", a documentation for human authors to transparently declare the use of AI in their writing process. △ Less

Submitted 7 October, 2023; originally announced October 2023.

Comments: Accepted at EAAMO'23 as a poster presentation

arXiv:2309.13858 [pdf, other]

Impact of Human-AI Interaction on User Trust and Reliance in AI-Assisted Qualitative Coding

Authors: Jie Gao, Junming Cao, ShunYi Yeo, Kenny Tsu Wei Choo, Zheng Zhang, Toby Jia-Jun Li, Shengdong Zhao, Simon Tangi Perrault

Abstract: While AI shows promise for enhancing the efficiency of qualitative analysis, the unique human-AI interaction resulting from varied coding strategies makes it challenging to develop a trustworthy AI-assisted qualitative coding system (AIQCs) that supports coding tasks effectively. We bridge this gap by exploring the impact of varying coding strategies on user trust and reliance on AI. We conducted… ▽ More While AI shows promise for enhancing the efficiency of qualitative analysis, the unique human-AI interaction resulting from varied coding strategies makes it challenging to develop a trustworthy AI-assisted qualitative coding system (AIQCs) that supports coding tasks effectively. We bridge this gap by exploring the impact of varying coding strategies on user trust and reliance on AI. We conducted a mixed-methods split-plot 3x3 study, involving 30 participants, and a follow-up study with 6 participants, exploring varying text selection and code length in the use of our AIQCs system for qualitative analysis. Our results indicate that qualitative open coding should be conceptualized as a series of distinct subtasks, each with differing levels of complexity, and therefore, should be given tailored design considerations. We further observed a discrepancy between perceived and behavioral measures, and emphasized the potential challenges of under- and over-reliance on AIQCs systems. Additional design implications were also proposed for consideration. △ Less

Submitted 24 September, 2023; originally announced September 2023.

Comments: 27 pages with references, 9 figures, 5 tables

arXiv:2309.11017 [pdf, other]

3SAT on an All-to-All-Connected CMOS Ising Solver Chip

Authors: Hüsrev Cılasun, Ziqing Zeng, Ramprasath S, Abhimanyu Kumar, Hao Lo, William Cho, Chris H. Kim, Ulya R. Karpuzcu, Sachin S. Sapatnekar

Abstract: This work solves 3SAT, a classical NP-complete problem, on a CMOS-based Ising hardware chip with all-to-all connectivity. The paper addresses practical issues in going from algorithms to hardware. It considers several degrees of freedom in mapping the 3SAT problem to the chip - using multiple Ising formulations for 3SAT; exploring multiple strategies for decomposing large problems into subproblems… ▽ More This work solves 3SAT, a classical NP-complete problem, on a CMOS-based Ising hardware chip with all-to-all connectivity. The paper addresses practical issues in going from algorithms to hardware. It considers several degrees of freedom in mapping the 3SAT problem to the chip - using multiple Ising formulations for 3SAT; exploring multiple strategies for decomposing large problems into subproblems that can be accommodated on the Ising chip; and executing a sequence of these subproblems on CMOS hardware to obtain the solution to the larger problem. These are evaluated within a software framework, and the results are used to identify the most promising formulations and decomposition techniques. These best approaches are then mapped to the all-to-all hardware, and the performance of 3SAT is evaluated on the chip. Experimental data shows that the deployed decomposition and mapping strategies impact SAT solution quality: without our methods, the CMOS hardware cannot achieve 3SAT solutions on SATLIB benchmarks. △ Less

Submitted 19 September, 2023; originally announced September 2023.

ACM Class: B.7

arXiv:2309.01961 [pdf, other]

NICE: CVPR 2023 Challenge on Zero-shot Image Captioning

Authors: Taehoon Kim, Pyunghwan Ahn, Sangyun Kim, Sihaeng Lee, Mark Marsden, Alessandra Sala, Seung Hwan Kim, Bohyung Han, Kyoung Mu Lee, Honglak Lee, Kyounghoon Bae, Xiangyu Wu, Yi Gao, Hailiang Zhang, Yang Yang, Weili Guo, Jianfeng Lu, Youngtaek Oh, Jae Won Cho, Dong-jin Kim, In So Kweon, Junmo Kim, Wooyoung Kang, Won Young Jhoo, Byungseok Roh , et al. (17 additional authors not shown)

Abstract: In this report, we introduce NICE (New frontiers for zero-shot Image Captioning Evaluation) project and share the results and outcomes of 2023 challenge. This project is designed to challenge the computer vision community to develop robust image captioning models that advance the state-of-the-art both in terms of accuracy and fairness. Through the challenge, the image captioning models were tested… ▽ More In this report, we introduce NICE (New frontiers for zero-shot Image Captioning Evaluation) project and share the results and outcomes of 2023 challenge. This project is designed to challenge the computer vision community to develop robust image captioning models that advance the state-of-the-art both in terms of accuracy and fairness. Through the challenge, the image captioning models were tested using a new evaluation dataset that includes a large variety of visual concepts from many domains. There was no specific training data provided for the challenge, and therefore the challenge entries were required to adapt to new types of image descriptions that had not been seen during training. This report includes information on the newly proposed NICE dataset, evaluation methods, challenge results, and technical details of top-ranking entries. We expect that the outcomes of the challenge will contribute to the improvement of AI models on various vision-language tasks. △ Less

Submitted 10 September, 2023; v1 submitted 5 September, 2023; originally announced September 2023.

Comments: Tech report, project page https://nice.lgresearch.ai/

arXiv:2307.16887 [pdf]

Data-Based MHE for Agile Quadrotor Flight

Authors: Wonoo Choo, Erkan Kayacan

Abstract: This paper develops a data-based moving horizon estimation (MHE) method for agile quadrotors. Accurate state estimation of the system is paramount for precise trajectory control for agile quadrotors; however, the high level of aerodynamic forces experienced by the quadrotors during high-speed flights make this task extremely challenging. These complex turbulent effects are difficult to model and t… ▽ More This paper develops a data-based moving horizon estimation (MHE) method for agile quadrotors. Accurate state estimation of the system is paramount for precise trajectory control for agile quadrotors; however, the high level of aerodynamic forces experienced by the quadrotors during high-speed flights make this task extremely challenging. These complex turbulent effects are difficult to model and the unmodelled dynamics introduce inaccuracies in the state estimation. In this work, we propose a method to model these aerodynamic effects using Gaussian Processes which we integrate into the MHE to achieve efficient and accurate state estimation with minimal computational burden. Through extensive simulation and experimental studies, this method has demonstrated significant improvement in state estimation performance displaying superior robustness to poor state measurements. △ Less

Submitted 31 July, 2023; originally announced July 2023.

Comments: 8 pages, accepted in IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS) 2023

arXiv:2305.17680 [pdf, other]

Evaluating GPT-3 Generated Explanations for Hateful Content Moderation

Authors: Han Wang, Ming Shan Hee, Md Rabiul Awal, Kenny Tsu Wei Choo, Roy Ka-Wei Lee

Abstract: Recent research has focused on using large language models (LLMs) to generate explanations for hate speech through fine-tuning or prompting. Despite the growing interest in this area, these generated explanations' effectiveness and potential limitations remain poorly understood. A key concern is that these explanations, generated by LLMs, may lead to erroneous judgments about the nature of flagged… ▽ More Recent research has focused on using large language models (LLMs) to generate explanations for hate speech through fine-tuning or prompting. Despite the growing interest in this area, these generated explanations' effectiveness and potential limitations remain poorly understood. A key concern is that these explanations, generated by LLMs, may lead to erroneous judgments about the nature of flagged content by both users and content moderators. For instance, an LLM-generated explanation might inaccurately convince a content moderator that a benign piece of content is hateful. In light of this, we propose an analytical framework for examining hate speech explanations and conducted an extensive survey on evaluating such explanations. Specifically, we prompted GPT-3 to generate explanations for both hateful and non-hateful content, and a survey was conducted with 2,400 unique respondents to evaluate the generated explanations. Our findings reveal that (1) human evaluators rated the GPT-generated explanations as high quality in terms of linguistic fluency, informativeness, persuasiveness, and logical soundness, (2) the persuasive nature of these explanations, however, varied depending on the prompting strategy employed, and (3) this persuasiveness may result in incorrect judgments about the hatefulness of the content. Our study underscores the need for caution in applying LLM-generated explanations for content moderation. Code and results are available at https://github.com/Social-AI-Studio/GPT3-HateEval. △ Less

Submitted 30 August, 2023; v1 submitted 28 May, 2023; originally announced May 2023.

Comments: 9 pages, 2 figures, Accepted by International Joint Conference on Artificial Intelligence(IJCAI)

ACM Class: I.2.7

arXiv:2305.17254 [pdf]

Computationally Efficient Data-Driven MPC for Agile Quadrotor Flight

Authors: Wonoo Choo, Erkan Kayacan

Abstract: This paper develops computationally efficient data-driven model predictive control (MPC) for Agile quadrotor flight. Agile quadrotors in high-speed flights can experience high levels of aerodynamic effects. Modeling these turbulent aerodynamic effects is a cumbersome task and the resulting model may be overly complex and computationally infeasible. Combining Gaussian Process (GP) regression models… ▽ More This paper develops computationally efficient data-driven model predictive control (MPC) for Agile quadrotor flight. Agile quadrotors in high-speed flights can experience high levels of aerodynamic effects. Modeling these turbulent aerodynamic effects is a cumbersome task and the resulting model may be overly complex and computationally infeasible. Combining Gaussian Process (GP) regression models with a simple dynamic model of the system has demonstrated significant improvements in control performance. However, direct integration of the GP models to the MPC pipeline poses a significant computational burden to the optimization process. Therefore, we present an approach to separate the GP models to the MPC pipeline by computing the model corrections using reference trajectory and the current state measurements prior to the online MPC optimization. This method has been validated in the Gazebo simulation environment and has demonstrated of up to $50\%$ reduction in trajectory tracking error, matching the performance of the direct GP integration method with improved computational efficiency. △ Less

Submitted 26 May, 2023; originally announced May 2023.

Comments: 6 pages, accepted in ACC 2023 (American Control Conference, 2023)

arXiv:2305.14032 [pdf, other]

doi 10.21437/Interspeech.2023-1426

Patch-Mix Contrastive Learning with Audio Spectrogram Transformer on Respiratory Sound Classification

Authors: Sangmin Bae, June-Woo Kim, Won-Yang Cho, Hyerim Baek, Soyoun Son, Byungjo Lee, Changwan Ha, Kyongpil Tae, Sungnyun Kim, Se-Young Yun

Abstract: Respiratory sound contains crucial information for the early diagnosis of fatal lung diseases. Since the COVID-19 pandemic, there has been a growing interest in contact-free medical care based on electronic stethoscopes. To this end, cutting-edge deep learning models have been developed to diagnose lung diseases; however, it is still challenging due to the scarcity of medical data. In this study,… ▽ More Respiratory sound contains crucial information for the early diagnosis of fatal lung diseases. Since the COVID-19 pandemic, there has been a growing interest in contact-free medical care based on electronic stethoscopes. To this end, cutting-edge deep learning models have been developed to diagnose lung diseases; however, it is still challenging due to the scarcity of medical data. In this study, we demonstrate that the pretrained model on large-scale visual and audio datasets can be generalized to the respiratory sound classification task. In addition, we introduce a straightforward Patch-Mix augmentation, which randomly mixes patches between different samples, with Audio Spectrogram Transformer (AST). We further propose a novel and effective Patch-Mix Contrastive Learning to distinguish the mixed representations in the latent space. Our method achieves state-of-the-art performance on the ICBHI dataset, outperforming the prior leading score by an improvement of 4.08%. △ Less

Submitted 22 November, 2023; v1 submitted 23 May, 2023; originally announced May 2023.

Comments: INTERSPEECH 2023, Code URL: https://github.com/raymin0223/patch-mix_contrastive_learning

arXiv:2304.05560 [pdf, other]

CoAIcoder: Examining the Effectiveness of AI-assisted Human-to-Human Collaboration in Qualitative Analysis

Authors: Jie Gao, Kenny Tsu Wei Choo, Junming Cao, Roy Ka Wei Lee, Simon Perrault

Abstract: While AI-assisted individual qualitative analysis has been substantially studied, AI-assisted collaborative qualitative analysis (CQA)-a process that involves multiple researchers working together to interpret data-remains relatively unexplored. After identifying CQA practices and design opportunities through formative interviews, we designed and implemented CoAIcoder, a tool leveraging AI to enha… ▽ More While AI-assisted individual qualitative analysis has been substantially studied, AI-assisted collaborative qualitative analysis (CQA)-a process that involves multiple researchers working together to interpret data-remains relatively unexplored. After identifying CQA practices and design opportunities through formative interviews, we designed and implemented CoAIcoder, a tool leveraging AI to enhance human-to-human collaboration within CQA through four distinct collaboration methods. With a between-subject design, we evaluated CoAIcoder with 32 pairs of CQA-trained participants across common CQA phases under each collaboration method. Our findings suggest that while using a shared AI model as a mediator among coders could improve CQA efficiency and foster agreement more quickly in the early coding stage, it might affect the final code diversity. We also emphasize the need to consider the independence level when using AI to assist human-to-human collaboration in various CQA scenarios. Lastly, we suggest design implications for future AI-assisted CQA systems. △ Less

Submitted 24 July, 2023; v1 submitted 11 April, 2023; originally announced April 2023.

Comments: Will appear on ACM Transactions on Computer-Human Interaction (TOCHI)

arXiv:2304.00350 [pdf, other]

When Crowd Meets Persona: Creating a Large-Scale Open-Domain Persona Dialogue Corpus

Authors: Won Ik Cho, Yoon Kyung Lee, Seoyeon Bae, Jihwan Kim, Sangah Park, Moosung Kim, Sowon Hahn, Nam Soo Kim

Abstract: Building a natural language dataset requires caution since word semantics is vulnerable to subtle text change or the definition of the annotated concept. Such a tendency can be seen in generative tasks like question-answering and dialogue generation and also in tasks that create a categorization-based corpus, like topic classification or sentiment analysis. Open-domain conversations involve two or… ▽ More Building a natural language dataset requires caution since word semantics is vulnerable to subtle text change or the definition of the annotated concept. Such a tendency can be seen in generative tasks like question-answering and dialogue generation and also in tasks that create a categorization-based corpus, like topic classification or sentiment analysis. Open-domain conversations involve two or more crowdworkers freely conversing about any topic, and collecting such data is particularly difficult for two reasons: 1) the dataset should be ``crafted" rather than ``obtained" due to privacy concerns, and 2) paid creation of such dialogues may differ from how crowdworkers behave in real-world settings. In this study, we tackle these issues when creating a large-scale open-domain persona dialogue corpus, where persona implies that the conversation is performed by several actors with a fixed persona and user-side workers from an unspecified crowd. △ Less

Submitted 1 April, 2023; originally announced April 2023.

Comments: Presented at HCOMP 2022 as Works-in-Progress

arXiv:2303.15833 [pdf, other]

Complementary Domain Adaptation and Generalization for Unsupervised Continual Domain Shift Learning

Authors: Wonguk Cho, Jinha Park, Taesup Kim

Abstract: Continual domain shift poses a significant challenge in real-world applications, particularly in situations where labeled data is not available for new domains. The challenge of acquiring knowledge in this problem setting is referred to as unsupervised continual domain shift learning. Existing methods for domain adaptation and generalization have limitations in addressing this issue, as they focus… ▽ More Continual domain shift poses a significant challenge in real-world applications, particularly in situations where labeled data is not available for new domains. The challenge of acquiring knowledge in this problem setting is referred to as unsupervised continual domain shift learning. Existing methods for domain adaptation and generalization have limitations in addressing this issue, as they focus either on adapting to a specific domain or generalizing to unseen domains, but not both. In this paper, we propose Complementary Domain Adaptation and Generalization (CoDAG), a simple yet effective learning framework that combines domain adaptation and generalization in a complementary manner to achieve three major goals of unsupervised continual domain shift learning: adapting to a current domain, generalizing to unseen domains, and preventing forgetting of previously seen domains. Our approach is model-agnostic, meaning that it is compatible with any existing domain adaptation and generalization algorithms. We evaluate CoDAG on several benchmark datasets and demonstrate that our model outperforms state-of-the-art models in all datasets and evaluation metrics, highlighting its effectiveness and robustness in handling unsupervised continual domain shift learning. △ Less

Submitted 13 October, 2023; v1 submitted 28 March, 2023; originally announced March 2023.

Comments: ICCV 2023

arXiv:2303.11771 [pdf, other]

Self-Sufficient Framework for Continuous Sign Language Recognition

Authors: Youngjoon Jang, Youngtaek Oh, Jae Won Cho, Myungchul Kim, Dong-Jin Kim, In So Kweon, Joon Son Chung

Abstract: The goal of this work is to develop self-sufficient framework for Continuous Sign Language Recognition (CSLR) that addresses key issues of sign language recognition. These include the need for complex multi-scale features such as hands, face, and mouth for understanding, and absence of frame-level annotations. To this end, we propose (1) Divide and Focus Convolution (DFConv) which extracts both ma… ▽ More The goal of this work is to develop self-sufficient framework for Continuous Sign Language Recognition (CSLR) that addresses key issues of sign language recognition. These include the need for complex multi-scale features such as hands, face, and mouth for understanding, and absence of frame-level annotations. To this end, we propose (1) Divide and Focus Convolution (DFConv) which extracts both manual and non-manual features without the need for additional networks or annotations, and (2) Dense Pseudo-Label Refinement (DPLR) which propagates non-spiky frame-level pseudo-labels by combining the ground truth gloss sequence labels with the predicted sequence. We demonstrate that our model achieves state-of-the-art performance among RGB-based methods on large-scale CSLR benchmarks, PHOENIX-2014 and PHOENIX-2014-T, while showing comparable results with better efficiency when compared to other approaches that use multi-modality or extra annotations. △ Less

Submitted 21 March, 2023; originally announced March 2023.

arXiv:2302.14368 [pdf, other]

Towards Enhanced Controllability of Diffusion Models

Authors: Wonwoong Cho, Hareesh Ravi, Midhun Harikumar, Vinh Khuc, Krishna Kumar Singh, Jingwan Lu, David I. Inouye, Ajinkya Kale

Abstract: Denoising Diffusion models have shown remarkable capabilities in generating realistic, high-quality and diverse images. However, the extent of controllability during generation is underexplored. Inspired by techniques based on GAN latent space for image manipulation, we train a diffusion model conditioned on two latent codes, a spatial content mask and a flattened style embedding. We rely on the i… ▽ More Denoising Diffusion models have shown remarkable capabilities in generating realistic, high-quality and diverse images. However, the extent of controllability during generation is underexplored. Inspired by techniques based on GAN latent space for image manipulation, we train a diffusion model conditioned on two latent codes, a spatial content mask and a flattened style embedding. We rely on the inductive bias of the progressive denoising process of diffusion models to encode pose/layout information in the spatial structure mask and semantic/style information in the style code. We propose two generic sampling techniques for improving controllability. We extend composable diffusion models to allow for some dependence between conditional inputs, to improve the quality of generations while also providing control over the amount of guidance from each latent code and their joint distribution. We also propose timestep dependent weight scheduling for content and style latents to further improve the translations. We observe better controllability compared to existing methods and show that without explicit training objectives, diffusion models can be used for effective image manipulation and image translation. △ Less

Submitted 15 March, 2023; v1 submitted 28 February, 2023; originally announced February 2023.

Comments: 28 pages, 28 figures

arXiv:2212.10433 [pdf, other]

Scheduling with Predictions

Authors: Woo-Hyung Cho, Shane Henderson, David Shmoys

Abstract: There is significant interest in deploying machine learning algorithms for diagnostic radiology, as modern learning techniques have made it possible to detect abnormalities in medical images within minutes. While machine-assisted diagnoses cannot yet reliably replace human reviews of images by a radiologist, they could inform prioritization rules for determining the order by which to review patien… ▽ More There is significant interest in deploying machine learning algorithms for diagnostic radiology, as modern learning techniques have made it possible to detect abnormalities in medical images within minutes. While machine-assisted diagnoses cannot yet reliably replace human reviews of images by a radiologist, they could inform prioritization rules for determining the order by which to review patient cases so that patients with time-sensitive conditions could benefit from early intervention. We study this scenario by formulating it as a learning-augmented online scheduling problem. We are given information about each arriving patient's urgency level in advance, but these predictions are inevitably error-prone. In this formulation, we face the challenges of decision making under imperfect information, and of responding dynamically to prediction error as we observe better data in real-time. We propose a simple online policy and show that this policy is in fact the best possible in certain stylized settings. We also demonstrate that our policy achieves the two desiderata of online algorithms with predictions: consistency (performance improvement with prediction accuracy) and robustness (protection against the worst case). We complement our theoretical findings with empirical evaluations of the policy under settings that more accurately reflect clinical scenarios in the real world. △ Less

Submitted 20 December, 2022; originally announced December 2022.

arXiv:2212.07663 [pdf, other]

Cross-Link Channel Prediction for Massive IoT Networks

Authors: Kun Woo Cho, Marco Cominelli, Francesco Gringoli, Joerg Widmer, Kyle Jamieson

Abstract: Tomorrow's massive-scale IoT sensor networks are poised to drive uplink traffic demand, especially in areas of dense deployment. To meet this demand, however, network designers leverage tools that often require accurate estimates of Channel State Information (CSI), which incurs a high overhead and thus reduces network throughput. Furthermore, the overhead generally scales with the number of client… ▽ More Tomorrow's massive-scale IoT sensor networks are poised to drive uplink traffic demand, especially in areas of dense deployment. To meet this demand, however, network designers leverage tools that often require accurate estimates of Channel State Information (CSI), which incurs a high overhead and thus reduces network throughput. Furthermore, the overhead generally scales with the number of clients, and so is of special concern in such massive IoT sensor networks. While prior work has used transmissions over one frequency band to predict the channel of another frequency band on the same link, this paper takes the next step in the effort to reduce CSI overhead: predict the CSI of a nearby but distinct link. We propose Cross-Link Channel Prediction (CLCP), a technique that leverages multi-view representation learning to predict the channel response of a large number of users, thereby reducing channel estimation overhead further than previously possible. CLCP's design is highly practical, exploiting channel estimates obtained from existing transmissions instead of dedicated channel sounding or extra pilot signals. We have implemented CLCP for two different Wi-Fi versions, namely 802.11n and 802.11ax, the latter being the leading candidate for future IoT networks. We evaluate CLCP in two large-scale indoor scenarios involving both line-of-sight and non-line-of-sight transmissions with up to 144 different 802.11ax users. Moreover, we measure its performance with four different channel bandwidths, from 20 MHz up to 160 MHz. Our results show that CLCP provides a 2x throughput gain over baseline 802.11ax and a 30 percent throughput gain over existing cross-band prediction algorithms. △ Less

Submitted 15 December, 2022; originally announced December 2022.

Comments: 13 pages, 12 figures

arXiv:2211.00448 [pdf, other]

Signing Outside the Studio: Benchmarking Background Robustness for Continuous Sign Language Recognition

Authors: Youngjoon Jang, Youngtaek Oh, Jae Won Cho, Dong-Jin Kim, Joon Son Chung, In So Kweon

Abstract: The goal of this work is background-robust continuous sign language recognition. Most existing Continuous Sign Language Recognition (CSLR) benchmarks have fixed backgrounds and are filmed in studios with a static monochromatic background. However, signing is not limited only to studios in the real world. In order to analyze the robustness of CSLR models under background shifts, we first evaluate e… ▽ More The goal of this work is background-robust continuous sign language recognition. Most existing Continuous Sign Language Recognition (CSLR) benchmarks have fixed backgrounds and are filmed in studios with a static monochromatic background. However, signing is not limited only to studios in the real world. In order to analyze the robustness of CSLR models under background shifts, we first evaluate existing state-of-the-art CSLR models on diverse backgrounds. To synthesize the sign videos with a variety of backgrounds, we propose a pipeline to automatically generate a benchmark dataset utilizing existing CSLR benchmarks. Our newly constructed benchmark dataset consists of diverse scenes to simulate a real-world environment. We observe even the most recent CSLR method cannot recognize glosses well on our new dataset with changed backgrounds. In this regard, we also propose a simple yet effective training scheme including (1) background randomization and (2) feature disentanglement for CSLR models. The experimental results on our dataset demonstrate that our method generalizes well to other unseen background data with minimal additional training images. △ Less

Submitted 1 November, 2022; originally announced November 2022.

Comments: Our dataset is available at https://github.com/art-jang/Signing-Outside-the-Studio

arXiv:2209.11554 [pdf, other]

mmWall: A Transflective Metamaterial Surface for mmWave Networks

Authors: Kun Woo Cho, Mohammad H. Mazaheri, Jeremy Gummeson, Omid Abari, Kyle Jamieson

Abstract: Mobile operators are poised to leverage millimeter wave technology as 5G evolves, but despite efforts to bolster their reliability indoors and outdoors, mmWave links remain vulnerable to blockage by walls, people, and obstacles. Further, there is significant interest in bringing outdoor mmWave coverage indoors, which for similar reasons remains challenging today. This paper presents the design, ha… ▽ More Mobile operators are poised to leverage millimeter wave technology as 5G evolves, but despite efforts to bolster their reliability indoors and outdoors, mmWave links remain vulnerable to blockage by walls, people, and obstacles. Further, there is significant interest in bringing outdoor mmWave coverage indoors, which for similar reasons remains challenging today. This paper presents the design, hardware implementation, and experimental evaluation of mmWall, the first electronically almost-360 degree steerable metamaterial surface that operates above 24 GHz and both refracts or reflects incoming mmWave transmissions. Our metamaterial design consists of arrays of varactor-split ring resonator unit cells, miniaturized for mmWave. Custom control circuitry drives each resonator, overcoming coupling challenges that arise at scale. Leveraging beam steering algorithms, we integrate mmWall into the link layer discovery protocols of common mmWave networks. We have fabricated a 10 cm by 20 cm mmWall prototype consisting of a 28 by 76 unit cell array, and evaluate in indoor, outdoor-to-indoor, and multi-beam scenarios. Indoors, mmWall guarantees 91% of locations outage-free under 128-QAM mmWave data rates and boosts SNR by up to 15 dB. Outdoors, mmWall reduces the probability of complete link failure by a ratio of up to 40% under 0-80% path blockage and boosts SNR by up to 30 dB. △ Less

Submitted 25 September, 2022; v1 submitted 23 September, 2022; originally announced September 2022.

Comments: 18 pages, 18 figures

arXiv:2209.07578 [pdf, other]

Pixel-wise classification in graphene-detection with tree-based machine learning algorithms

Authors: Woon Hyung Cho, Jiseon Shin, Young Duck Kim, George J. Jung

Abstract: Mechanical exfoliation of graphene and its identification by optical inspection is one of the milestones in condensed matter physics that sparked the field of 2D materials. Finding regions of interest from the entire sample space and identification of layer number is a routine task potentially amenable to automatization. We propose supervised pixel-wise classification methods showing a high perfor… ▽ More Mechanical exfoliation of graphene and its identification by optical inspection is one of the milestones in condensed matter physics that sparked the field of 2D materials. Finding regions of interest from the entire sample space and identification of layer number is a routine task potentially amenable to automatization. We propose supervised pixel-wise classification methods showing a high performance even with a small number of training image datasets that require short computational time without GPU. We introduce four different tree-based machine learning algorithms -- decision tree, random forest, extreme gradient boost, and light gradient boosting machine. We train them with five optical microscopy images of graphene, and evaluate their performances with multiple metrics and indices. We also discuss combinatorial machine learning models between the three single classifiers and assess their performances in identification and reliability. The code developed in this paper is open to the public and will be released at github.com/gjung-group/Graphene_segmentation. △ Less

Submitted 24 August, 2022; originally announced September 2022.

Comments: 12 pages, 6 figures

arXiv:2208.00690 [pdf, other]

Generative Bias for Robust Visual Question Answering

Authors: Jae Won Cho, Dong-jin Kim, Hyeonggon Ryu, In So Kweon

Abstract: The task of Visual Question Answering (VQA) is known to be plagued by the issue of VQA models exploiting biases within the dataset to make its final prediction. Various previous ensemble based debiasing methods have been proposed where an additional model is purposefully trained to be biased in order to train a robust target model. However, these methods compute the bias for a model simply from th… ▽ More The task of Visual Question Answering (VQA) is known to be plagued by the issue of VQA models exploiting biases within the dataset to make its final prediction. Various previous ensemble based debiasing methods have been proposed where an additional model is purposefully trained to be biased in order to train a robust target model. However, these methods compute the bias for a model simply from the label statistics of the training data or from single modal branches. In this work, in order to better learn the bias a target VQA model suffers from, we propose a generative method to train the bias model directly from the target model, called GenB. In particular, GenB employs a generative network to learn the bias in the target model through a combination of the adversarial objective and knowledge distillation. We then debias our target model with GenB as a bias model, and show through extensive experiments the effects of our method on various VQA bias datasets including VQA-CP2, VQA-CP1, GQA-OOD, and VQA-CE, and show state-of-the-art results with the LXMERT architecture on VQA-CP2. △ Less

Submitted 22 March, 2023; v1 submitted 1 August, 2022; originally announced August 2022.

Comments: CVPR 2023

arXiv:2207.11534 [pdf, other]

Comparative Validation of AI and non-AI Methods in MRI Volumetry to Diagnose Parkinsonian Syndromes

Authors: Joomee Song, Juyoung Hahm, Jisoo Lee, Chae Yeon Lim, Myung Jin Chung, Jinyoung Youn, Jin Whan Cho, Jong Hyeon Ahn, Kyung-Su Kim

Abstract: Automated segmentation and volumetry of brain magnetic resonance imaging (MRI) scans are essential for the diagnosis of Parkinson's disease (PD) and Parkinson's plus syndromes (P-plus). To enhance the diagnostic performance, we adopt deep learning (DL) models in brain segmentation and compared their performance with the gold-standard non-DL method. We collected brain MRI scans of healthy controls… ▽ More Automated segmentation and volumetry of brain magnetic resonance imaging (MRI) scans are essential for the diagnosis of Parkinson's disease (PD) and Parkinson's plus syndromes (P-plus). To enhance the diagnostic performance, we adopt deep learning (DL) models in brain segmentation and compared their performance with the gold-standard non-DL method. We collected brain MRI scans of healthy controls (n=105) and patients with PD (n=105), multiple systemic atrophy (n=132), and progressive supranuclear palsy (n=69) at Samsung Medical Center from January 2017 to December 2020. Using the gold-standard non-DL model, FreeSurfer (FS), we segmented six brain structures: midbrain, pons, caudate, putamen, pallidum, and third ventricle, and considered them as annotating data for DL models, the representative V-Net and UNETR. The Dice scores and area under the curve (AUC) for differentiating normal, PD, and P-plus cases were calculated. The segmentation times of V-Net and UNETR for the six brain structures per patient were 3.48 +- 0.17 and 48.14 +- 0.97 s, respectively, being at least 300 times faster than FS (15,735 +- 1.07 s). Dice scores of both DL models were sufficiently high (>0.85), and their AUCs for disease classification were superior to that of FS. For classification of normal vs. P-plus and PD vs. multiple systemic atrophy (cerebellar type), the DL models and FS showed AUCs above 0.8. DL significantly reduces the analysis time without compromising the performance of brain segmentation and differential diagnosis. Our findings may contribute to the adoption of DL brain MRI segmentation in clinical settings and advance brain research. △ Less

Submitted 23 July, 2022; originally announced July 2022.

Comments: Joomee Song and Juyoung Hahm contributed equally to this work as the co-first author. Jong Hyeon Ahn and Kyung-Su Kim (kskim.doc@gmail.com) contributed equally to this work as the co-corresponding author

arXiv:2207.10287 [pdf, other]

Towards Accurate Open-Set Recognition via Background-Class Regularization

Authors: Wonwoo Cho, Jaegul Choo

Abstract: In open-set recognition (OSR), classifiers should be able to reject unknown-class samples while maintaining high closed-set classification accuracy. To effectively solve the OSR problem, previous studies attempted to limit latent feature space and reject data located outside the limited space via offline analyses, e.g., distance-based feature analyses, or complicated network architectures. To cond… ▽ More In open-set recognition (OSR), classifiers should be able to reject unknown-class samples while maintaining high closed-set classification accuracy. To effectively solve the OSR problem, previous studies attempted to limit latent feature space and reject data located outside the limited space via offline analyses, e.g., distance-based feature analyses, or complicated network architectures. To conduct OSR via a simple inference process (without offline analyses) in standard classifier architectures, we use distance-based classifiers instead of conventional Softmax classifiers. Afterwards, we design a background-class regularization strategy, which uses background-class data as surrogates of unknown-class ones during training phase. Specifically, we formulate a novel regularization loss suitable for distance-based classifiers, which reserves sufficiently large class-wise latent feature spaces for known classes and forces background-class samples to be located far away from the limited spaces. Through our extensive experiments, we show that the proposed method provides robust OSR results, while maintaining high closed-set classification accuracy. △ Less

Submitted 20 July, 2022; originally announced July 2022.

Comments: Accepted to ECCV 2022

arXiv:2207.05022 [pdf, other]

doi 10.1145/3575693.3575698

STI: Turbocharge NLP Inference at the Edge via Elastic Pipelining

Authors: Liwei Guo, Wonkyo Choe, Felix Xiaozhu Lin

Abstract: Natural Language Processing (NLP) inference is seeing increasing adoption by mobile applications, where on-device inference is desirable for crucially preserving user data privacy and avoiding network roundtrips. Yet, the unprecedented size of an NLP model stresses both latency and memory, creating a tension between the two key resources of a mobile device. To meet a target latency, holding the wh… ▽ More Natural Language Processing (NLP) inference is seeing increasing adoption by mobile applications, where on-device inference is desirable for crucially preserving user data privacy and avoiding network roundtrips. Yet, the unprecedented size of an NLP model stresses both latency and memory, creating a tension between the two key resources of a mobile device. To meet a target latency, holding the whole model in memory launches execution as soon as possible but increases one app's memory footprints by several times, limiting its benefits to only a few inferences before being recycled by mobile memory management. On the other hand, loading the model from storage on demand incurs IO as long as a few seconds, far exceeding the delay range satisfying to a user; pipelining layerwise model loading and execution does not hide IO either, due to the high skewness between IO and computation delays. To this end, we propose Speedy Transformer Inference (STI). Built on the key idea of maximizing IO/compute resource utilization on the most important parts of a model, STI reconciles the latency v.s. memory tension via two novel techniques. First, model sharding. STI manages model parameters as independently tunable shards, and profiles their importance to accuracy. Second, elastic pipeline planning with a preload buffer. STI instantiates an IO/compute pipeline and uses a small buffer for preload shards to bootstrap execution without stalling at early stages; it judiciously selects, tunes, and assembles shards per their importance for resource-elastic execution, maximizing inference accuracy. Atop two commodity SoCs, we build STI and evaluate it against a wide range of NLP tasks, under a practical range of target latencies, and on both CPU and GPU. We demonstrate that STI delivers high accuracies with 1-2 orders of magnitude lower memory, outperforming competitive baselines. △ Less

Submitted 31 January, 2023; v1 submitted 11 July, 2022; originally announced July 2022.

Comments: ASPLOS'23

arXiv:2207.02286 [pdf, other]

Cooperative Distribution Alignment via JSD Upper Bound

Authors: Wonwoong Cho, Ziyu Gong, David I. Inouye

Abstract: Unsupervised distribution alignment estimates a transformation that maps two or more source distributions to a shared aligned distribution given only samples from each distribution. This task has many applications including generative modeling, unsupervised domain adaptation, and socially aware learning. Most prior works use adversarial learning (i.e., min-max optimization), which can be challengi… ▽ More Unsupervised distribution alignment estimates a transformation that maps two or more source distributions to a shared aligned distribution given only samples from each distribution. This task has many applications including generative modeling, unsupervised domain adaptation, and socially aware learning. Most prior works use adversarial learning (i.e., min-max optimization), which can be challenging to optimize and evaluate. A few recent works explore non-adversarial flow-based (i.e., invertible) approaches, but they lack a unified perspective and are limited in efficiently aligning multiple distributions. Therefore, we propose to unify and generalize previous flow-based approaches under a single non-adversarial framework, which we prove is equivalent to minimizing an upper bound on the Jensen-Shannon Divergence (JSD). Importantly, our problem reduces to a min-min, i.e., cooperative, problem and can provide a natural evaluation metric for unsupervised distribution alignment. We show empirical results on both simulated and real-world datasets to demonstrate the benefits of our approach. Code is available at https://github.com/inouye-lab/alignment-upper-bound. △ Less

Submitted 31 October, 2022; v1 submitted 5 July, 2022; originally announced July 2022.

Comments: Accepted for publication in Advances in Neural Information Processing Systems 36 (NeurIPS 2022)

arXiv:2206.14939 [pdf, other]

Towards Dual-band Reconfigurable Metamaterial Surfaces for Satellite Networking

Authors: Kun Woo Cho, Yasaman Ghasempour, Kyle Jamieson

Abstract: The first low earth orbit satellite networks for internet service have recently been deployed and are growing in size, yet will face deployment challenges in many practical circumstances of interest. This paper explores how a dual-band, electronically tunable smart surface can enable dynamic beam alignment between the satellite and mobile users, make service possible in urban canyons, and improve… ▽ More The first low earth orbit satellite networks for internet service have recently been deployed and are growing in size, yet will face deployment challenges in many practical circumstances of interest. This paper explores how a dual-band, electronically tunable smart surface can enable dynamic beam alignment between the satellite and mobile users, make service possible in urban canyons, and improve service in rural areas. Our design is the first of its kind to target dual channels in the Ku radio frequency band with a novel dual Huygens resonator design that leverages radio reciprocity to allow our surface to simultaneously steer and modulate energy in the satellite uplink and downlink directions, and in both reflective and transmissive modes of operation. Our surface, Wall-E, is designed and evaluated in an electromagnetic simulator and demonstrates 94% transmission efficiency and an 85% reflection efficiency, with at most 6 dB power loss at steering angles over a 150-degree field of view for both transmission and reflection. With a 75 sq cm surface, our link budget calculations predict a 4 dB and 24 dB improvement in the SNR of a link entering the window of a rural home in comparison to the free-space path and brick wall penetration, respectively. △ Less

Submitted 29 June, 2022; originally announced June 2022.

Comments: 9 pages including references, 9 figures

ACM Class: C.2.5; C.3

arXiv:2206.09885 [pdf, other]

KOLOMVERSE: KRISO open large-scale image dataset for object detection in the maritime universe

Authors: Abhilasha Nanda, Sung Won Cho, Hyeopwoo Lee, Jin Hyoung Park

Abstract: Over the years, datasets have been developed for various object detection tasks. Object detection in the maritime domain is essential for the safety and navigation of ships. However, there is still a lack of publicly available large-scale datasets in the maritime domain. To overcome this challenge, we present KOLOMVERSE, an open large-scale image dataset for object detection in the maritime domain… ▽ More Over the years, datasets have been developed for various object detection tasks. Object detection in the maritime domain is essential for the safety and navigation of ships. However, there is still a lack of publicly available large-scale datasets in the maritime domain. To overcome this challenge, we present KOLOMVERSE, an open large-scale image dataset for object detection in the maritime domain by KRISO (Korea Research Institute of Ships and Ocean Engineering). We collected 5,845 hours of video data captured from 21 territorial waters of South Korea. Through an elaborate data quality assessment process, we gathered around 2,151,470 4K resolution images from the video data. This dataset considers various environments: weather, time, illumination, occlusion, viewpoint, background, wind speed, and visibility. The KOLOMVERSE consists of five classes (ship, buoy, fishnet buoy, lighthouse and wind farm) for maritime object detection. The dataset has images of 3840$\times$2160 pixels and to our knowledge, it is by far the largest publicly available dataset for object detection in the maritime domain. We performed object detection experiments and evaluated our dataset on several pre-trained state-of-the-art architectures to show the effectiveness and usefulness of our dataset. The dataset is available at: \url{https://github.com/MaritimeDataset/KOLOMVERSE}. △ Less

Submitted 20 June, 2022; originally announced June 2022.

Comments: 13 Pages, 12 figures, submitted to NeurIPS 2022 Datasets and Benchmarks Track (Under Review)

arXiv:2205.01059 [pdf, other]

Enhanced Physics-Informed Neural Networks with Augmented Lagrangian Relaxation Method (AL-PINNs)

Authors: Hwijae Son, Sung Woong Cho, Hyung Ju Hwang

Abstract: Physics-Informed Neural Networks (PINNs) have become a prominent application of deep learning in scientific computation, as they are powerful approximators of solutions to nonlinear partial differential equations (PDEs). There have been numerous attempts to facilitate the training process of PINNs by adjusting the weight of each component of the loss function, called adaptive loss-balancing algori… ▽ More Physics-Informed Neural Networks (PINNs) have become a prominent application of deep learning in scientific computation, as they are powerful approximators of solutions to nonlinear partial differential equations (PDEs). There have been numerous attempts to facilitate the training process of PINNs by adjusting the weight of each component of the loss function, called adaptive loss-balancing algorithms. In this paper, we propose an Augmented Lagrangian relaxation method for PINNs (AL-PINNs). We treat the initial and boundary conditions as constraints for the optimization problem of the PDE residual. By employing Augmented Lagrangian relaxation, the constrained optimization problem becomes a sequential max-min problem so that the learnable parameters $λ$ adaptively balance each loss component. Our theoretical analysis reveals that the sequence of minimizers of the proposed loss functions converges to an actual solution for the Helmholtz, viscous Burgers, and Klein--Gordon equations. We demonstrate through various numerical experiments that AL-PINNs yield a much smaller relative error compared with that of state-of-the-art adaptive loss-balancing algorithms. △ Less

Submitted 30 May, 2023; v1 submitted 29 April, 2022; originally announced May 2022.

arXiv:2204.12687 [pdf, other]

Multiresolution community analysis of international trade networks

Authors: Wonguk Cho, Daekyung Lee, Beom Jun Kim

Abstract: The international trade network is a complex system where multiple trade blocs with varying sizes coexist and overlap with each other. However, the resulting structures of community detection in trade networks are often inconsistent and fails to capture the complex landscape of international trade. To address these problems, we propose a multiresolution framework that aggregates all the configurat… ▽ More The international trade network is a complex system where multiple trade blocs with varying sizes coexist and overlap with each other. However, the resulting structures of community detection in trade networks are often inconsistent and fails to capture the complex landscape of international trade. To address these problems, we propose a multiresolution framework that aggregates all the configuration information from a range of resolutions. This allows us to consider trade communities of different sizes and illuminate the underlying hierarchical structure of trade networks and its constituting blocks. Furthermore, by measuring membership inconsistency (MeI) of each country and conducting multiple regression analysis with various economic and political indicators, we demonstrate that there exists a positive correlation between the external instability of countries and their structural inconsistency in terms of network topology. △ Less

Submitted 26 April, 2022; originally announced April 2022.

Comments: 19 pages, 5 figures, 1 table

arXiv:2204.02633 [pdf]

DAGAM: Data Augmentation with Generation And Modification

Authors: Byeong-Cheol Jo, Tak-Sung Heo, Yeongjoon Park, Yongmin Yoo, Won Ik Cho, Kyungsun Kim

Abstract: Text classification is a representative downstream task of natural language processing, and has exhibited excellent performance since the advent of pre-trained language models based on Transformer architecture. However, in pre-trained language models, under-fitting often occurs due to the size of the model being very large compared to the amount of available training data. Along with significant i… ▽ More Text classification is a representative downstream task of natural language processing, and has exhibited excellent performance since the advent of pre-trained language models based on Transformer architecture. However, in pre-trained language models, under-fitting often occurs due to the size of the model being very large compared to the amount of available training data. Along with significant importance of data collection in modern machine learning paradigm, studies have been actively conducted for natural language data augmentation. In light of this, we introduce three data augmentation schemes that help reduce underfitting problems of large-scale language models. Primarily we use a generation model for data augmentation, which is defined as Data Augmentation with Generation (DAG). Next, we augment data using text modification techniques such as corruption and word order change (Data Augmentation with Modification, DAM). Finally, we propose Data Augmentation with Generation And Modification (DAGAM), which combines DAG and DAM techniques for a boosted performance. We conduct data augmentation for six benchmark datasets of text classification task, and verify the usefulness of DAG, DAM, and DAGAM through BERT-based fine-tuning and evaluation, deriving better results compared to the performance with original datasets. △ Less

Submitted 6 April, 2022; originally announced April 2022.

arXiv:2204.00089 [pdf, other]

Investigating Top-$k$ White-Box and Transferable Black-box Attack

Authors: Chaoning Zhang, Philipp Benz, Adil Karjauv, Jae Won Cho, Kang Zhang, In So Kweon

Abstract: Existing works have identified the limitation of top-$1$ attack success rate (ASR) as a metric to evaluate the attack strength but exclusively investigated it in the white-box setting, while our work extends it to a more practical black-box setting: transferable attack. It is widely reported that stronger I-FGSM transfers worse than simple FGSM, leading to a popular belief that transferability is… ▽ More Existing works have identified the limitation of top-$1$ attack success rate (ASR) as a metric to evaluate the attack strength but exclusively investigated it in the white-box setting, while our work extends it to a more practical black-box setting: transferable attack. It is widely reported that stronger I-FGSM transfers worse than simple FGSM, leading to a popular belief that transferability is at odds with the white-box attack strength. Our work challenges this belief with empirical finding that stronger attack actually transfers better for the general top-$k$ ASR indicated by the interest class rank (ICR) after attack. For increasing the attack strength, with an intuitive interpretation of the logit gradient from the geometric perspective, we identify that the weakness of the commonly used losses lie in prioritizing the speed to fool the network instead of maximizing its strength. To this end, we propose a new normalized CE loss that guides the logit to be updated in the direction of implicitly maximizing its rank distance from the ground-truth class. Extensive results in various settings have verified that our proposed new loss is simple yet effective for top-$k$ attack. Code is available at: \url{https://bit.ly/3uCiomP} △ Less

Submitted 30 March, 2022; originally announced April 2022.

Comments: Accepted by CVPR2022

Showing 1–50 of 98 results for author: Cho, W