Representation Type of the Descent Algebras of Type $\mathbb{A}$

Abstract: We classify the representation type of the descent algebras of type $\mathbb{A}$ in the positive characteristic case.

Submitted 8 July, 2024; originally announced July 2024.

arXiv:2407.05531 [pdf, ps, other]

Projective Modules and Cohomology for Integral Basic Algebras

Authors: David J. Benson, Kay Jin Lim

Abstract: Algebras defined over fields of characteristic zero and positive characteristic usually do not behave the same way. However, for certain algebras, for example the group algebras, they behave the same way as the characteristic zero case at good enough prime. In this paper, we initiate the study of this topic by imposing increasingly strong hypotheses on basic algebras. When the algebras satisfy the right hypotheses, we have equalities of the dimensions of their cohomology groups between simple modules and equalities of graded Cartan numbers. The examples include the Solomon descent algebras of finite Coxeter groups at large enough primes, nil-Coxeter algebra, and certain finite semigroup algebras at an arbitrary prime.

Submitted 7 July, 2024; originally announced July 2024.

arXiv:2407.04903 [pdf, other]

MMSci: A Multimodal Multi-Discipline Dataset for PhD-Level Scientific Comprehension

Authors: Zekun Li, Xianjun Yang, Kyuri Choi, Wanrong Zhu, Ryan Hsieh, HyeonJung Kim, Jin Hyuk Lim, Sungyoung Ji, Byungju Lee, Xifeng Yan, Linda Ruth Petzold, Stephen D. Wilson, Woosang Lim, William Yang Wang

Abstract: The rapid advancement of Large Language Models (LLMs) and Large Multimodal Models (LMMs) has heightened the demand for AI-based scientific assistants capable of understanding scientific articles and figures. Despite progress, there remains a significant gap in evaluating models' comprehension of professional, graduate-level, and even PhD-level scientific content. Current datasets and benchmarks primarily focus on relatively simple scientific tasks and figures, lacking comprehensive assessments across diverse advanced scientific disciplines. To bridge this gap, we collected a multimodal, multidisciplinary dataset from open-access scientific articles published in Nature Communications journals. This dataset spans 72 scientific disciplines, ensuring both diversity and quality. We created benchmarks with various tasks and settings to comprehensively evaluate LMMs' capabilities in understanding scientific figures and content. Our evaluation revealed that these tasks are highly challenging: many open-source models struggled significantly, and even GPT-4V and GPT-4o faced difficulties. We also explored using our dataset as training resources by constructing visual instruction-following data, enabling the 7B LLaVA model to achieve performance comparable to GPT-4V/o on our benchmark. Additionally, we investigated the use of our interleaved article texts and figure images for pre-training LMMs, resulting in improvements on the material generation task. The source dataset, including articles, figures, constructed benchmarks, and visual instruction-following data, is open-sourced.

Submitted 5 July, 2024; originally announced July 2024.

Comments: Code and data are available at https://github.com/Leezekun/MMSci

arXiv:2407.00649 [pdf, other]

Particle Semi-Implicit Variational Inference

Authors: Jen Ning Lim, Adam M. Johansen

Abstract: Semi-implicit variational inference (SIVI) enriches the expressiveness of variational families by utilizing a kernel and a mixing distribution to hierarchically define the variational distribution. Existing SIVI methods parameterize the mixing distribution using implicit distributions, leading to intractable variational densities. As a result, directly maximizing the evidence lower bound (ELBO) is not possible and so, they resort to either: optimizing bounds on the ELBO, employing costly inner-loop Markov chain Monte Carlo runs, or solving minimax objectives. In this paper, we propose a novel method for SIVI called Particle Variational Inference (PVI) which employs empirical measures to approximate the optimal mixing distributions characterized as the minimizer of a natural free energy functional via a particle approximation of an Euclidean--Wasserstein gradient flow. This approach means that, unlike prior works, PVI can directly optimize the ELBO; furthermore, it makes no parametric assumption about the mixing distribution. Our empirical results demonstrate that PVI performs favourably against other SIVI methods across various tasks. Moreover, we provide a theoretical analysis of the behaviour of the gradient flow of a related free energy functional: establishing the existence and uniqueness of solutions as well as propagation of chaos results.

Submitted 30 June, 2024; originally announced July 2024.

arXiv:2406.17768 [pdf, other]

EXTRACT: Efficient Policy Learning by Extracting Transferrable Robot Skills from Offline Data

Authors: Jesse Zhang, Minho Heo, Zuxin Liu, Erdem Biyik, Joseph J Lim, Yao Liu, Rasool Fakoor

Abstract: Most reinforcement learning (RL) methods focus on learning optimal policies over low-level action spaces. While these methods can perform well in their training environments, they lack the flexibility to transfer to new tasks. Instead, RL agents that can act over useful, temporally extended skills rather than low-level actions can learn new tasks more easily. Prior work in skill-based RL either requires expert supervision to define useful skills, which is hard to scale, or learns a skill-space from offline data with heuristics that limit the adaptability of the skills, making them difficult to transfer during downstream RL. Our approach, EXTRACT, instead utilizes pre-trained vision language models to extract a discrete set of semantically meaningful skills from offline data, each of which is parameterized by continuous arguments, without human supervision. This skill parameterization allows robots to learn new tasks by only needing to learn when to select a specific skill and how to modify its arguments for the specific task. We demonstrate through experiments in sparse-reward, image-based, robot manipulation environments that EXTRACT can more quickly learn new tasks than prior works, with major gains in sample efficiency and performance over prior skill-based RL. Website at https://www.jessezhang.net/projects/extract/.

Submitted 25 June, 2024; originally announced June 2024.

Comments: 22 pages, 13 figures

arXiv:2406.16003 [pdf]

Unidirectional Chiral Emission via Twisted Bi-layer Metasurfaces

Authors: Dmitrii Gromyko, Shu An, Sergey Gorelik, Jiahui Xu, Li Jun Lim, Henry Yit Loong Lee, Febiana Tjiptoharsono, Zhi-Kuang Tan, Cheng-Wei Qiu, Zhaogang Dong, Lin Wu

Abstract: Controlling and channelling light emissions from unpolarized quantum dots into specific directions with chiral polarization remains a key challenge in modern photonics. Stacked metasurface designs offer a potential compact solution for chirality and directionality engineering. However, experimental observations of directional chiral radiation from resonant metasurfaces with quantum emitters remain obscure. In this paper, we present experimental observations of unidirectional chiral emission from a twisted bi-layer metasurface via multi-dimensional control, including twist angle, interlayer distance, and lateral displacement between the top and bottom layers, as enabled by doublet alignment lithography (DAL). First, maintaining alignment, the metasurface demonstrates a resonant intrinsic optical chirality with near-unity circular dichroism of 0.94 and reflectance difference of 74%, where a high circular dichroism greater than 0.9 persists across a wide range of angles from -11 to 11 degrees. Second, engineered lateral displacement induces a unidirectional chiral resonance, resulting in unidirectional chiral emission from the quantum dots deposited onto the metasurface. Our bi-layer metasurfaces offer a universal compact platform for efficient radiation manipulation over a wide angular range, promising potential applications in miniaturized lasers, grating couplers, and chiral nanoantennas.

Submitted 22 June, 2024; originally announced June 2024.

Comments: 16 pages, 4 figures

arXiv:2406.15527 [pdf, other]

Data Efficient Evaluation of Large Language Models and Text-to-Image Models via Adaptive Sampling

Authors: Cong Xu, Gayathri Saranathan, Mahammad Parwez Alam, Arpit Shah, James Lim, Soon Yee Wong, Foltin Martin, Suparna Bhattacharya

Abstract: Evaluating LLMs and text-to-image models is a computationally intensive task often overlooked. Efficient evaluation is crucial for understanding the diverse capabilities of these models and enabling comparisons across a growing number of new models and benchmarks. To address this, we introduce SubLIME, a data-efficient evaluation framework that employs adaptive sampling techniques, such as clustering and quality-based methods, to create representative subsets of benchmarks. Our approach ensures statistically aligned model rankings compared to full datasets, evidenced by high Pearson correlation coefficients. Empirical analysis across six NLP benchmarks reveals that: (1) quality-based sampling consistently achieves strong correlations (0.85 to 0.95) with full datasets at a 10\% sampling rate such as Quality SE and Quality CPD (2) clustering methods excel in specific benchmarks such as MMLU (3) no single method universally outperforms others across all metrics. Extending this framework, we leverage the HEIM leaderboard to cover 25 text-to-image models on 17 different benchmarks. SubLIME dynamically selects the optimal technique for each benchmark, significantly reducing evaluation costs while preserving ranking integrity and score distribution. Notably, a minimal sampling rate of 1% proves effective for benchmarks like MMLU. Additionally, we demonstrate that employing difficulty-based sampling to target more challenging benchmark segments enhances model differentiation with broader score distributions. We also combine semantic search, tool use, and GPT-4 review to identify redundancy across benchmarks within specific LLM categories, such as coding benchmarks. This allows us to further reduce the number of samples needed to maintain targeted rank preservation. Overall, SubLIME offers a versatile and cost-effective solution for the robust evaluation of LLMs and text-to-image models.

Submitted 21 June, 2024; originally announced June 2024.

arXiv:2406.14924 [pdf, other]

DiPEx: Dispersing Prompt Expansion for Class-Agnostic Object Detection

Authors: Jia Syuen Lim, Zhuoxiao Chen, Mahsa Baktashmotlagh, Zhi Chen, Xin Yu, Zi Huang, Yadan Luo

Abstract: Class-agnostic object detection (OD) can be a cornerstone or a bottleneck for many downstream vision tasks. Despite considerable advancements in bottom-up and multi-object discovery methods that leverage basic visual cues to identify salient objects, consistently achieving a high recall rate remains difficult due to the diversity of object types and their contextual complexity. In this work, we investigate using vision-language models (VLMs) to enhance object detection via a self-supervised prompt learning strategy. Our initial findings indicate that manually crafted text queries often result in undetected objects, primarily because detection confidence diminishes when the query words exhibit semantic overlap. To address this, we propose a Dispersing Prompt Expansion (DiPEx) approach. DiPEx progressively learns to expand a set of distinct, non-overlapping hyperspherical prompts to enhance recall rates, thereby improving performance in downstream tasks such as out-of-distribution OD. Specifically, DiPEx initiates the process by self-training generic parent prompts and selecting the one with the highest semantic uncertainty for further expansion. The resulting child prompts are expected to inherit semantics from their parent prompts while capturing more fine-grained semantics. We apply dispersion losses to ensure high inter-class discrepancy among child prompts while preserving semantic consistency between parent-child prompt pairs. To prevent excessive growth of the prompt sets, we utilize the maximum angular coverage (MAC) of the semantic space as a criterion for early termination. We demonstrate the effectiveness of DiPEx through extensive class-agnostic OD and OOD-OD experiments on MS-COCO and LVIS, surpassing other prompting methods by up to 20.1% in AR and achieving a 21.3% AP improvement over SAM. The code is available at https://github.com/jason-lim26/DiPEx.

Submitted 21 June, 2024; originally announced June 2024.

Comments: 19 pages

arXiv:2406.12721 [pdf]

Sound event detection based on auxiliary decoder and maximum probability aggregation for DCASE Challenge 2024 Task 4

Authors: Sang Won Son, Jongyeon Park, Hong Kook Kim, Sulaiman Vesal, Jeong Eun Lim

Abstract: In this report, we propose three novel methods for developing a sound event detection (SED) model for the DCASE 2024 Challenge Task 4. First, we propose an auxiliary decoder attached to the final convolutional block to improve feature extraction capabilities while reducing dependency on embeddings from pre-trained large models. The proposed auxiliary decoder operates independently from the main decoder, enhancing performance of the convolutional block during the initial training stages by assigning a different weight strategy between main and auxiliary decoder losses. Next, to address the time interval issue between the DESED and MAESTRO datasets, we propose maximum probability aggregation (MPA) during the training step. The proposed MPA method enables the model's output to be aligned with soft labels of 1 s in the MAESTRO dataset. Finally, we propose a multi-channel input feature that employs various versions of logmel and MFCC features to generate time-frequency pattern. The experimental results demonstrate the efficacy of these proposed methods in a view of improving SED performance by achieving a balanced enhancement across different datasets and label types. Ultimately, this approach presents a significant step forward in developing more robust and flexible SED models.

Submitted 24 June, 2024; v1 submitted 17 June, 2024; originally announced June 2024.

Comments: DCASE 2024 challenge Task4, 4 pages

arXiv:2406.10809 [pdf, other]

Post-hoc Utterance Refining Method by Entity Mining for Faithful Knowledge Grounded Conversations

Authors: Yoonna Jang, Suhyune Son, Jeongwoo Lee, Junyoung Son, Yuna Hur, Jungwoo Lim, Hyeonseok Moon, Kisu Yang, Heuiseok Lim

Abstract: Despite the striking advances in recent language generation performance, model-generated responses have suffered from the chronic problem of hallucinations that are either untrue or unfaithful to a given source. Especially in the task of knowledge grounded conversation, the models are required to generate informative responses, but hallucinated utterances lead to miscommunication. In particular, entity-level hallucination that causes critical misinformation and undesirable conversation is one of the major concerns. To address this issue, we propose a post-hoc refinement method called REM. It aims to enhance the quality and faithfulness of hallucinated utterances by refining them based on the source knowledge. If the generated utterance has a low source-faithfulness score with the given knowledge, REM mines the key entities in the knowledge and implicitly uses them for refining the utterances. We verify that our method reduces entity hallucination in the utterance. Also, we show the adaptability and efficacy of REM with extensive experiments and generative results. Our code is available at https://github.com/YOONNAJANG/REM.

Submitted 16 June, 2024; originally announced June 2024.

Comments: Accepted at EMNLP 2023

arXiv:2406.08537 [pdf, other]

A high-resolution view of the source-plane magnification near cluster caustics in wave dark matter models

Authors: Jose M. Diego, Alfred Amruth, Jose M. Palencia, Tom Broadhurst, Sung Kei Li, Jeremy Lim, Rogier A. Windhorst, Adi Zitrin, Alexei V. Filippenko, Liliya L. R. Williams, Ashish K. Meena, Wenlei Chen, Patrick L. Kelly

Abstract: We present the highest resolution images to date of caustics formed by wave dark matter ($ψ$DM) fluctuations near the critical curves of cluster gravitational lenses. We describe the basic magnification features of $ψ$DM in the source plane at high macromodel magnification and discuss specific differences between the $ψ$DM and standard cold dark matter (CDM) models. The unique generation of demagnified counterimages formed outside the Einstein radius for $ψ$DM is highlighted. Substructure in CDM cannot generate such demagnified images of positive parity, thus providing a definitive way to distinguish $ψ$DM from CDM. Highly magnified background sources with sizes $r\approx 1pc$, or approximately a factor of ten smaller than the expected de Broglie wavelength of $ψ$DM, offer the best possibility of discriminating between $ψ$DM and CDM. These include objects such as very compact stellar clusters at high redshift that JWST is finding in abundance.

Submitted 12 June, 2024; originally announced June 2024.

Comments: 8 pages

arXiv:2406.08528 [pdf, other]

Adaptive Teaching with Shared Classifier for Knowledge Distillation

Authors: Jaeyeon Jang, Young-Ik Kim, Jisu Lim, Hyeonseong Lee

Abstract: Knowledge distillation (KD) is a technique used to transfer knowledge from an overparameterized teacher network to a less-parameterized student network, thereby minimizing the incurred performance loss. KD methods can be categorized into offline and online approaches. Offline KD leverages a powerful pretrained teacher network, while online KD allows the teacher network to be adjusted dynamically to enhance the learning effectiveness of the student network. Recently, it has been discovered that sharing the classifier of the teacher network can significantly boost the performance of the student network with only a minimal increase in the number of network parameters. Building on these insights, we propose adaptive teaching with a shared classifier (ATSC). In ATSC, the pretrained teacher network self-adjusts to better align with the learning needs of the student network based on its capabilities, and the student network benefits from the shared classifier, enhancing its performance. Additionally, we extend ATSC to environments with multiple teachers. We conduct extensive experiments, demonstrating the effectiveness of the proposed KD method. Our approach achieves state-of-the-art results on the CIFAR-100 and ImageNet datasets in both single-teacher and multiteacher scenarios, with only a modest increase in the number of required model parameters. The source code is publicly available at https://github.com/random2314235/ATSC.

Submitted 14 June, 2024; v1 submitted 12 June, 2024; originally announced June 2024.

arXiv:2406.06367 [pdf, other]

MVGamba: Unify 3D Content Generation as State Space Sequence Modeling

Authors: Xuanyu Yi, Zike Wu, Qiuhong Shen, Qingshan Xu, Pan Zhou, Joo-Hwee Lim, Shuicheng Yan, Xinchao Wang, Hanwang Zhang

Abstract: Recent 3D large reconstruction models (LRMs) can generate high-quality 3D content in sub-seconds by integrating multi-view diffusion models with scalable multi-view reconstructors. Current works further leverage 3D Gaussian Splatting as 3D representation for improved visual quality and rendering efficiency. However, we observe that existing Gaussian reconstruction models often suffer from multi-view inconsistency and blurred textures. We attribute this to the compromise of multi-view information propagation in favor of adopting powerful yet computationally intensive architectures (e.g., Transformers). To address this issue, we introduce MVGamba, a general and lightweight Gaussian reconstruction model featuring a multi-view Gaussian reconstructor based on the RNN-like State Space Model (SSM). Our Gaussian reconstructor propagates causal context containing multi-view information for cross-view self-refinement while generating a long sequence of Gaussians for fine-detail modeling with linear complexity. With off-the-shelf multi-view diffusion models integrated, MVGamba unifies 3D generation tasks from a single image, sparse images, or text prompts. Extensive experiments demonstrate that MVGamba outperforms state-of-the-art baselines in all 3D content generation scenarios with approximately only $0.1\times$ of the model size.

Submitted 20 June, 2024; v1 submitted 10 June, 2024; originally announced June 2024.

arXiv:2406.05313 [pdf, other]

Traversing Mars: Cooperative Informative Path Planning to Efficiently Navigate Unknown Scenes

Authors: Friedrich M. Rockenbauer, Jaeyoung Lim, Marcus G. Müller, Roland Siegwart, Lukas Schmid

Abstract: The ability to traverse an unknown environment is crucial for autonomous robot operations. However, due to the limited sensing capabilities and system constraints, approaching this problem with a single robot agent can be slow, costly, and unsafe. For example, in planetary exploration missions, the wear on the wheels of a rover from abrasive terrain should be minimized at all costs as reparations are infeasible. On the other hand, utilizing a scouting robot such as a micro aerial vehicle (MAV) has the potential to reduce wear and time costs and increasing safety of a follower robot. This work proposes a novel cooperative IPP framework that allows a scout (e.g., an MAV) to efficiently explore the minimum-cost-path for a follower (e.g., a rover) to reach the goal. We derive theoretic guarantees for our algorithm, and prove that the algorithm always terminates, always finds the optimal path if it exists, and terminates early when the found path is shown to be optimal or infeasible. We show in thorough experimental evaluation that the guarantees hold in practice, and that our algorithm is 22.5% quicker to find the optimal path and 15% quicker to terminate compared to existing methods.

Submitted 12 June, 2024; v1 submitted 7 June, 2024; originally announced June 2024.

Comments: 8 pages, 9 figures, code will be available at https://github.com/ethz-asl/scouting-ipp

arXiv:2406.01915 [pdf, other]

Enhancing Human-Robot Collaborative Assembly in Manufacturing Systems Using Large Language Models

Authors: Jonghan Lim, Sujani Patel, Alex Evans, John Pimley, Yifei Li, Ilya Kovalenko

Abstract: The development of human-robot collaboration has the ability to improve manufacturing system performance by leveraging the unique strengths of both humans and robots. On the shop floor, human operators contribute with their adaptability and flexibility in dynamic situations, while robots provide precision and the ability to perform repetitive tasks. However, the communication gap between human operators and robots limits the collaboration and coordination of human-robot teams in manufacturing systems. Our research presents a human-robot collaborative assembly framework that utilizes a large language model for enhancing communication in manufacturing environments. The framework facilitates human-robot communication by integrating voice commands through natural language for task management. A case study for an assembly task demonstrates the framework's ability to process natural language inputs and address real-time assembly challenges, emphasizing adaptability to language variation and efficiency in error resolution. The results suggest that large language models have the potential to improve human-robot interaction for collaborative manufacturing assembly applications.

Submitted 21 June, 2024; v1 submitted 3 June, 2024; originally announced June 2024.

arXiv:2406.01893 [pdf, other]

Large Language Model-Enabled Multi-Agent Manufacturing Systems

Authors: Jonghan Lim, Birgit Vogel-Heuser, Ilya Kovalenko

Abstract: Traditional manufacturing faces challenges adapting to dynamic environments and quickly responding to manufacturing changes. The use of multi-agent systems has improved adaptability and coordination but requires further advancements in rapid human instruction comprehension, operational adaptability, and coordination through natural language integration. Large language models like GPT-3.5 and GPT-4 enhance multi-agent manufacturing systems by enabling agents to communicate in natural language and interpret human instructions for decision-making. This research introduces a novel framework where large language models enhance the capabilities of agents in manufacturing, making them more adaptable, and capable of processing context-specific instructions. A case study demonstrates the practical application of this framework, showing how agents can effectively communicate, understand tasks, and execute manufacturing processes, including precise G-code allocation among agents. The findings highlight the importance of continuous large language model integration into multi-agent manufacturing systems and the development of sophisticated agent communication protocols for a more flexible manufacturing system.

Submitted 21 June, 2024; v1 submitted 3 June, 2024; originally announced June 2024.

arXiv:2405.20602 [pdf, other]

Masked Language Modeling Becomes Conditional Density Estimation for Tabular Data Synthesis

Authors: Seunghwan An, Gyeongdong Woo, Jaesung Lim, ChangHyun Kim, Sungchul Hong, Jong-June Jeon

Abstract: In this paper, our goal is to generate synthetic data for heterogeneous (mixed-type) tabular datasets with high machine learning utility (MLu). Given that the MLu performance relies on accurately approximating the conditional distributions, we focus on devising a synthetic data generation method based on conditional distribution estimation. We propose a novel synthetic data generation method, MaCoDE, by redefining the multi-class classification task of Masked Language Modeling (MLM) as histogram-based non-parametric conditional density estimation. Our proposed method enables estimating conditional densities across arbitrary combinations of target and conditional variables. Furthermore, we demonstrate that our proposed method bridges the theoretical gap between distributional learning and MLM. To validate the effectiveness of our proposed model, we conduct synthetic data generation experiments on 10 real-world datasets. Given the analogy between predicting masked input tokens in MLM and missing data imputation, we also evaluate the performance of multiple imputations on incomplete datasets with various missing data mechanisms. Moreover, our proposed model offers the advantage of enabling adjustments to data privacy levels without requiring re-training.

Submitted 30 May, 2024; originally announced May 2024.

arXiv:2405.19422 [pdf, other]

Dark Matter distinguished by skewed microlensing in the "Dragon Arc"

Authors: Tom Broadhurst, Sung Kei Li, Amruth Alfred, Jose M. Diego, Paloma Morilla, Patrick L. Kelly, Fengwu Sun, Masamune Oguri, Hayley Williams, Rogier Windhorst, Adi Zitrin, Katsuya T. Abe, Wenlei Chen, Yoshinobu Fudamoto, Hiroki Kawai, Jeremy Lim, Tao Liu, Ashish K. Meena, Jose M. Palencia, George F. Smoot, Liliya L. R. Williams

Abstract: Microlensed stars recently discovered by JWST & HST follow closely the winding critical curve of A370 along all sections of the "Dragon Arc" traversed by the critical curve. These transients are fainter than $m_{AB}>26.5$, corresponding to the Asymptotic Giant Branch (AGB) and microlensed by diffuse cluster stars observed with $\simeq 18M_\odot/pc^2$, or about $\simeq 1\%$ of the projected dark matter density. Most microlensed stars appear along the inner edge of the critical curve, following an asymmetric band of width $\simeq 4$kpc that is skewed by $-0.7\pm0.2$kpc. Some skewness is expected as the most magnified images should form along the inner edge of the critical curve with negative parity, but the predicted shift is small, $\simeq -0.04$kpc, and the band of predicted detections is narrow, $\simeq 1.4$kpc. Adding CDM-like dark halos of $10^{6-8}M_\odot$ broadens the band as desired but favours detections along the outer edge of the critical curve, in the wrong direction, where sub-halos generate local Einstein rings. Instead, the interference inherent to "Wave Dark Matter" as a Bose-Einstein condensate ($ψ$DM) forms a symmetric band of critical curves that favours negative parity detections. A de Broglie wavelength of $\simeq 10$pc matches well the observed $4$kpc band of microlenses and predicts negative skewness $\simeq -0.6$kpc, similar to the data. The implied corresponding boson mass is $\simeq 10^{-22}$eV, in good agreement with estimates from dwarf galaxy cores when scaled by momentum. Further JWST imaging may reveal the pattern of critical curves by simply "joining the dots" between microlensed stars, allowing wave corrugations of $ψ$DM to be distinguished from CDM sub-halos.

Submitted 29 May, 2024; originally announced May 2024.

Comments: 12 pages, 5 figures

arXiv:2405.18027 [pdf, other]

TimeChara: Evaluating Point-in-Time Character Hallucination of Role-Playing Large Language Models

Authors: Jaewoo Ahn, Taehyun Lee, Junyoung Lim, Jin-Hwa Kim, Sangdoo Yun, Hwaran Lee, Gunhee Kim

Abstract: While Large Language Models (LLMs) can serve as agents to simulate human behaviors (i.e., role-playing agents), we emphasize the importance of point-in-time role-playing. This situates characters at specific moments in the narrative progression for three main reasons: (i) enhancing users' narrative immersion, (ii) avoiding spoilers, and (iii) fostering engagement in fandom role-playing. To accurately represent characters at specific time points, agents must avoid character hallucination, where they display knowledge that contradicts their characters' identities and historical timelines. We introduce TimeChara, a new benchmark designed to evaluate point-in-time character hallucination in role-playing LLMs. Comprising 10,895 instances generated through an automated pipeline, this benchmark reveals significant hallucination issues in current state-of-the-art LLMs (e.g., GPT-4o). To counter this challenge, we propose Narrative-Experts, a method that decomposes the reasoning steps and utilizes narrative experts to reduce point-in-time character hallucinations effectively. Still, our findings with TimeChara highlight the ongoing challenges of point-in-time character hallucination, calling for further study.

Submitted 28 May, 2024; originally announced May 2024.

Comments: ACL 2024 Findings. Code and dataset are released at https://ahnjaewoo.github.io/timechara

arXiv:2405.16496 [pdf, other]

Exploring a Multimodal Fusion-based Deep Learning Network for Detecting Facial Palsy

Authors: Nicole Heng Yim Oo, Min Hun Lee, Jeong Hoon Lim

Abstract: Algorithmic detection of facial palsy offers the potential to improve current practices, which usually involve labor-intensive and subjective assessment by clinicians. In this paper, we present a multimodal fusion-based deep learning model that utilizes unstructured data (i.e. an image frame with facial line segments) and structured data (i.e. features of facial expressions) to detect facial palsy. We then contribute to a study to analyze the effect of different data modalities and the benefits of a multimodal fusion-based approach using videos of 21 facial palsy patients. Our experimental results show that among various data modalities (i.e. unstructured data - RGB images and images of facial line segments and structured data - coordinates of facial landmarks and features of facial expressions), the feed-forward neural network using features of facial expression achieved the highest precision of 76.22 while the ResNet-based model using images of facial line segments achieved the highest recall of 83.47. When we leveraged both images of facial line segments and features of facial expressions, our multimodal fusion-based deep learning model slightly improved the precision score to 77.05 at the expense of a decrease in the recall score.

Submitted 26 May, 2024; originally announced May 2024.

arXiv:2405.14155 [pdf]

Room-temperature waveguide-integrated photodetector using bolometric effect for mid-infrared spectroscopy applications

Authors: Joonsup Shim, Jinha Lim, Inki Kim, Jaeyong Jeong, Bong Ho Kim, Seong Kwang Kim, Dae-Myeong Geum, SangHyeon Kim

Abstract: Waveguide-integrated mid-infrared (MIR) photodetectors are pivotal components for developing molecular spectroscopy applications, leveraging mature photonic integrated circuit (PIC) technologies. Despite various strategies, critical challenges still remain in achieving broadband photoresponse, cooling-free operation, and large-scale complementary-metal-oxide-semiconductor (CMOS)-compatible manufacturability. To leap beyond these limitations, the bolometric effect - a thermal detection mechanism - is introduced into the waveguide platform. More importantly, we pursue a free-carrier absorption (FCA) process in germanium (Ge) to create an efficient light-absorbing medium, providing a pragmatic solution for full coverage of the MIR spectrum without incorporating exotic materials into CMOS. Here, we present an uncooled waveguide-integrated photodetector based on a Ge-on-insulator (Ge-OI) PIC architecture, exploiting the bolometric effect combined with FCA. Notably, our device exhibits a broadband responsivity of ~12 mA/W across 4030-4360 nm (and potentially beyond), challenging the state of the art, while achieving a noise-equivalent power of 3.4x10^-9 W/Hz^0.5 at 4180 nm. We further demonstrate label-free sensing of carbon dioxide using our integrated photodetector and sensing waveguide on a single chip. This approach to room-temperature waveguide-integrated MIR photodetection, harnessing bolometry with FCA in Ge, not only facilitates the realization of fully integrated lab-on-a-chip systems with wavelength flexibility but also provides a blueprint for MIR PICs with CMOS-foundry-compatibility.

Submitted 23 May, 2024; originally announced May 2024.

Comments: 6 figures for the main manuscript and 14 figures for the supplementary information

arXiv:2405.12538 [pdf, other]

Bridging the Intent Gap: Knowledge-Enhanced Visual Generation

Authors: Yi Cheng, Ziwei Xu, Dongyun Lin, Harry Cheng, Yongkang Wong, Ying Sun, Joo Hwee Lim, Mohan Kankanhalli

Abstract: For visual content generation, discrepancies between user intentions and the generated content have been a longstanding problem. This discrepancy arises from two main factors. First, user intentions are inherently complex, with subtle details not fully captured by input prompts. The absence of such details makes it challenging for generative models to accurately reflect the intended meaning, leading to a mismatch between the desired and generated output. Second, generative models trained on visual-label pairs lack the comprehensive knowledge to accurately represent all aspects of the input data in their generated outputs. To address these challenges, we propose a knowledge-enhanced iterative refinement framework for visual content generation. We begin by analyzing and identifying the key challenges faced by existing generative models. Then, we introduce various knowledge sources, including human insights, pre-trained models, logic rules, and world knowledge, which can be leveraged to address these challenges. Furthermore, we propose a novel visual generation framework that incorporates a knowledge-based feedback module to iteratively refine the generation process. This module gradually improves the alignment between the generated content and user intentions. We demonstrate the efficacy of the proposed framework through preliminary results, highlighting the potential of knowledge-enhanced generative models for intention-aligned content generation.

Submitted 21 May, 2024; originally announced May 2024.

arXiv:2405.11354 [pdf, ps, other]

On differences of two harmonic numbers

Authors: Jeck Lim, Stefan Steinerberger

Abstract: We prove the existence of infinitely many $(m_k, n_k) \in \mathbb{N}^2$ such that the difference of harmonic numbers $H_{m_k} - H_{n_k}$ approximates 1 well: $$ \lim_{k \rightarrow \infty} \left| \sum_{\ell = n_k}^{m_k} \frac{1}{\ell} - 1 \right|\cdot n_k^2 = 0.$$ This answers a question of Erdős and Graham. The construction uses asymptotics for harmonic numbers, the precise nature of the continued fraction expansion of $e$ and a suitable rescaling of a subsequence of convergents. We also prove a quantitative rate by appealing to techniques of Heilbronn, Danicic, Harman, Hooley and others regarding $\min_{1 \leq n \leq N} \min_{m \in \mathbb{N}}\| n^2 θ- m\|$.

Submitted 11 June, 2024; v1 submitted 18 May, 2024; originally announced May 2024.

arXiv:2405.09093 [pdf, ps, other]

Line graphs and Nordhaus-Gaddum-type bounds for self-loop graphs

Authors: Saieed Akbari, Irena M. Jovanović, Johnny Lim

Abstract: Let $G_S$ be the graph obtained by attaching a self-loop at every vertex in $S \subseteq V(G)$ of a simple graph $G$ of order $n.$ In this paper, we explore several new results related to the line graph $L(G_S)$ of $G_S.$ Particularly, we show that every eigenvalue of $L(G_S)$ must be at least $-2,$ and relate the characteristic polynomial of the line graph $L(G)$ of $G$ with the characteristic polynomial of the line graph $L(\widehat{G})$ of a self-loop graph $\widehat{G}$, which is obtained by attaching a self-loop at each vertex of $G$. Then, we provide some new bounds for the eigenvalues and energy of $G_S.$ As one of the consequences, we obtain that the energy of a connected regular complete multipartite graph is not greater than the energy of the corresponding self-loop graph. Lastly, we establish a lower bound of the spectral radius in terms of the first Zagreb index $M_1(G)$ and the minimum degree $δ(G),$ as well as proving two Nordhaus-Gaddum-type bounds for the spectral radius and the energy of $G_S,$ respectively.

Submitted 15 May, 2024; originally announced May 2024.

Comments: 19 pages. To appear in Bulletin of the Malaysian Mathematical Sciences Society

MSC Class: 05C50; 05C90; 05C92

arXiv:2405.04739 [pdf]

Pressure induced metallization and loss of surface magnetism in FeSi

Authors: Yuhang Deng, Farhad Taraporevala, Haozhe Wang, Eric Lee-Wong, Camilla M. Moir, Jinhyuk Lim, Shubham Sinha, Weiwei Xie, James Hamlin, Yogesh Vohra, M. Brian Maple

Abstract: Single crystalline FeSi samples with a conducting surface state (CSS) were studied under high pressure ($\textit{P}$) and magnetic field ($\textit{B}$) by means of electrical resistance ($\textit{R}$) measurements to explore how the bulk semiconducting state and the surface state are tuned by the application of pressure. We found that the energy gap ($Δ$) associated with the semiconducting bulk phase begins to close abruptly at a critical pressure ($P_{cr}$) of ~10 GPa and the bulk material becomes metallic with no obvious sign of any emergent phases or non-Fermi liquid behavior in $\textit{R}$($\textit{T}$) in the neighborhood of $P_{cr}$ above 3 K. Moreover, the metallic phase appears to remain at near-ambient pressure upon release of the pressure. Interestingly, the hysteresis in the $\textit{R}$($\textit{T}$) curve associated with the magnetically ordered CSS decreases with pressure and vanishes at $P_{cr}$, while the slope of the $\textit{R}$($\textit{B}$) curve, d$\textit{R}$/d$\textit{B}$, which has a negative value for $\textit{P}$ < $P_{cr}$, decreases in magnitude with $\textit{P}$ and changes sign at $P_{cr}$. Thus, the CSS and the corresponding two-dimensional magnetic order collapse at $P_{cr}$ where the energy gap $Δ$ of the bulk material starts to close abruptly, revealing the connection between the CSS and the semiconducting bulk state in FeSi.

Submitted 7 May, 2024; originally announced May 2024.

arXiv:2405.02011 [pdf, other]

Autonomous Active Mapping in Steep Alpine Environments with Fixed-wing Aerial Vehicles

Authors: Jaeyoung Lim, Florian Achermann, Nicholas Lawrance, Roland Siegwart

Abstract: Monitoring large-scale environments is a crucial task for managing remote alpine environments, especially for hazardous events such as avalanches. One key piece of information for avalanche risk forecasting is imagery of released avalanches. As these happen in remote and potentially dangerous locations, this data is difficult to obtain. Fixed-wing vehicles, due to their long range and travel speeds, are a promising platform to gather aerial imagery to map avalanche activities. However, operating such vehicles in mountainous terrain remains a challenge due to the complex topography, regulations, and uncertain environment. In this work, we present a system that is capable of safely navigating and mapping an avalanche using a fixed-wing aerial system and discuss the challenges arising when executing such a mission. We show in our field experiments that we can effectively navigate in steep terrain environments while maximizing the map quality. We expect our work to enable more autonomous operations of fixed-wing vehicles in alpine environments to maximize the quality of the data gathered.

Submitted 3 May, 2024; originally announced May 2024.

Comments: 8 pages, 8 figures, Accepted to the IEEE ICRA Workshop on Field Robotics 2024

arXiv:2404.16504 [pdf]

Hardware Implementation of Double Pendulum Pseudo Random Number Generator

Authors: Jarrod Lim, Tom Manuel Opalla Piccio, Chua Min Jie Michelle, Maoyang Xiang, T. Hui Teo

Abstract: The objective of this project is to utilize an FPGA board, the CMOD A7 35t, to obtain a pseudo random number that can be used for encryption. We aim to achieve this by leveraging the inherent randomness present in environmental data captured by sensors. This data will be used as a seed to initialize an algorithm implemented on the CMOD A7 35t FPGA board. The project will focus on interfacing the sensors with the FPGA and developing suitable algorithms to ensure the generated numbers exhibit strong randomness properties.

Submitted 25 April, 2024; originally announced April 2024.

Comments: 15 pages, 12 figures

arXiv:2404.16133 [pdf]

Quantitative Characterization of Retinal Features in Translated OCTA

Authors: Rashadul Hasan Badhon, Atalie Carina Thompson, Jennifer I. Lim, Theodore Leng, Minhaj Nur Alam

Abstract: Purpose: This study explores the feasibility of using generative machine learning (ML) to translate Optical Coherence Tomography (OCT) images into Optical Coherence Tomography Angiography (OCTA) images, potentially bypassing the need for specialized OCTA hardware. Methods: The method involved implementing a generative adversarial network framework that includes a 2D vascular segmentation model and a 2D OCTA image translation model. The study utilizes a public dataset of 500 patients, divided into subsets based on resolution and disease status, to validate the quality of TR-OCTA images. The validation employs several quality and quantitative metrics to compare the translated images with ground truth OCTAs (GT-OCTA). We then quantitatively characterize vascular features generated in TR-OCTAs with GT-OCTAs to assess the feasibility of using TR-OCTA for objective disease diagnosis. Result: TR-OCTAs showed high image quality in both 3 and 6 mm datasets (high-resolution, moderate structural similarity and contrast quality compared to GT-OCTAs). There were slight discrepancies in vascular metrics, especially in diseased patients. Blood vessel features like tortuosity and vessel perimeter index showed a better trend compared to density features which are affected by local vascular distortions. Conclusion: This study presents a promising solution to the limitations of OCTA adoption in clinical practice by using vascular features from TR-OCTA for disease detection. Translation relevance: This study has the potential to significantly enhance the diagnostic process for retinal diseases by making detailed vascular imaging more widely available and reducing dependency on costly OCTA equipment.

Submitted 24 April, 2024; originally announced April 2024.

Comments: The article has been revised and edited

arXiv:2404.14957 [pdf, other]

doi 10.1126/sciadv.adm9563

Strongly correlated multi-electron bunches from interaction with quantum light

Authors: Suraj Kumar, Jeremy Lim, Nicholas Rivera, Wesley Wong, Yee Sin Ang, Lay Kee Ang, Liang Jie Wong

Abstract: Strongly correlated electron systems are a cornerstone of modern physics, being responsible for groundbreaking phenomena from superconducting magnets to quantum computing. In most cases, correlations in electrons arise exclusively due to Coulomb interactions. In this work, we reveal that free electrons interacting simultaneously with a light field can become highly correlated via mechanisms beyond Coulomb interactions. In the case of two electrons, the resulting Pearson correlation coefficient (PCC) for the joint probability distribution of the output electron energies is enhanced over 13 orders of magnitude compared to that of electrons interacting with the light field in succession (one after another). These highly correlated electrons are the result of momentum and energy exchange between the participating electrons via the external quantum light field. Our findings pave the way to the creation and control of highly correlated free electrons for applications including quantum information and ultra-fast imaging.

Submitted 13 May, 2024; v1 submitted 23 April, 2024; originally announced April 2024.

Comments: 3 figures for Main Text, 4 figures for Supplementary Materials, Supplementary is available at end of Main Text figures

arXiv:2404.08847 [pdf, other]

doi 10.1145/3620665.3640384

LazyDP: Co-Designing Algorithm-Software for Scalable Training of Differentially Private Recommendation Models

Authors: Juntaek Lim, Youngeun Kwon, Ranggi Hwang, Kiwan Maeng, G. Edward Suh, Minsoo Rhu

Abstract: Differential privacy (DP) is widely being employed in the industry as a practical standard for privacy protection. While private training of computer vision or natural language processing applications has been studied extensively, the computational challenges of training of recommender systems (RecSys) with DP have not been explored. In this work, we first present our detailed characterization of private RecSys training using DP-SGD, root-causing its several performance bottlenecks. Specifically, we identify DP-SGD's noise sampling and noisy gradient update stage to suffer from a severe compute and memory bandwidth limitation, respectively, causing significant performance overhead in training private RecSys. Based on these findings, we propose LazyDP, an algorithm-software co-design that addresses the compute and memory challenges of training RecSys with DP-SGD. Compared to a state-of-the-art DP-SGD training system, we demonstrate that LazyDP provides an average 119x training throughput improvement while also ensuring mathematically equivalent, differentially private RecSys models to be trained.

Submitted 12 April, 2024; originally announced April 2024.

Journal ref: Published at 29th ACM International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS-29), 2024

arXiv:2404.08571 [pdf, other]

Flashlights: Microlensing vs Stellar Variability of Transients in the Star Clusters of the Dragon Arc

Authors: Sung Kei Li, Patrick L. Kelly, Jose M. Diego, Jeremy Lim, WenLei Chen, Amruth Alfred, Liliya L. R. Williams, Thomas J. Broadhurst, Ashish. K. Meena, Adi Zitrin, Alex Chow

Abstract: We study the nature of transient events detected in the "Dragon Arc", a star-forming galaxy at a redshift of $0.7251$ that is gravitationally lensed by the galaxy cluster Abell 370. In particular, we focus on a subset of ten transients that are identified as unresolved young star clusters in the deep broadband, F200LP, taken as part of the "Flashlights" Hubble Space Telescope program, showing flux variations of $\sim 10-20\%$ over a period of about a year. Here we develop several methods to address whether stellar microlensing alone is capable of explaining the transients, or whether intrinsic stellar outbursts or variability are required to explain them. We first present a lens model that has new constraints in the Dragon Arc itself to understand the properties of the lensed young star clusters. Using our improved galaxy-cluster lens model, we simulate the effect of microlensing on the flux variation for unresolved stars within lensed young star clusters. We find good agreement between the observed and the expected detection rates of microlensing events by intracluster stars of young star clusters within $1σ$. However, we cannot fully exclude the possibility that a minority of these transients are caused by intrinsic stellar variability such as outbursts of Luminous Blue Variables (LBVs). With JWST observations taken recently or coming in the near future, the color information will be able to break the degeneracy and definitively test whether or not these lensed young star cluster transients are caused by stellar microlensing.

Submitted 12 April, 2024; originally announced April 2024.

Comments: 28 pages, 14 figures. To be submitted, comments welcomed

arXiv:2404.08033 [pdf, other]

Imaging dark matter at the smallest scales with $z\approx1$ lensed stars

Authors: J. M. Diego, Sung Kei Li, Alfred Amruth, Ashish K. Meena, Tom J. Broadhurst, Patrick L. Kelly, Alexei V. Filippenko, Liliya L. R. Williams, Adi Zitrin, William E. Harris, Marta Reina-Campos, Carlo Giocoli, Liang Dai, Mitchell F. Struble, Tommaso Treu, Yoshinobu Fudamoto, Daniel Gilman, Anton M. Koekemoer, Jeremy Lim, J. M. Palencia, Fengwu Sun, Rogier A. Windhorst

Abstract: Observations of caustic-crossing galaxies at redshift $0.7<z<1$ show a wealth of transient events. Most of them are believed to be microlensing events of highly magnified stars. Earlier work predicted such events should be common near the critical curves (CCs) of galaxy clusters, but some are found relatively far away from these CCs. We consider the possibility that substructure on milliarcsecond scales (few parsecs in the lens plane) is boosting the microlensing signal. We study the combined magnification from the macrolens, millilenses, and microlenses (3M-lensing). After considering realistic populations of millilenses and microlenses, we conclude that the enhanced microlensing rate around millilenses is not sufficient to explain the high fraction of observed events in the far region. Instead we find that the shape of the luminosity function (LF) of the lensed stars combined with the amount of substructure in the lens plane determines the number of microlensing events found near and far from the CC. By measuring $β$ (the exponent of the LF), and the number density of microlensing events at each location, one can create a pseudoimage of the underlying distribution of mass on small scales. We identify two regimes: (i) positive imaging regime where $β>2$ and the number density of events is greater around substructures, and (ii) negative imaging regime where $β<2$. We study the particular case of seven microlensing events found by HST in the Dragon arc (at z=0.725). We find that a population of supergiant stars with a steep LF with $β=2.55$ fits the distribution of these events. We identify a small region of high density of microlensing events, and interpret it as evidence of a possible invisible substructure, for which we derive a mass of $\sim 1.3 \times 10^8\,M_\odot$ (within its Einstein radius).

Submitted 22 April, 2024; v1 submitted 11 April, 2024; originally announced April 2024.

Comments: 23 pages, 16 figures

arXiv:2404.04562 [pdf, other]

Diffusion Time-step Curriculum for One Image to 3D Generation

Authors: Xuanyu Yi, Zike Wu, Qingshan Xu, Pan Zhou, Joo-Hwee Lim, Hanwang Zhang

Abstract: Score distillation sampling (SDS) has been widely adopted to overcome the absence of unseen views in reconstructing 3D objects from a single image. It leverages pre-trained 2D diffusion models as teacher to guide the reconstruction of student 3D models. Despite their remarkable success, SDS-based methods often encounter geometric artifacts and texture saturation. We find out the crux is the overlooked indiscriminate treatment of diffusion time-steps during optimization: it unreasonably treats the student-teacher knowledge distillation to be equal at all time-steps and thus entangles coarse-grained and fine-grained modeling. Therefore, we propose the Diffusion Time-step Curriculum one-image-to-3D pipeline (DTC123), which involves both the teacher and student models collaborating with the time-step curriculum in a coarse-to-fine manner. Extensive experiments on NeRF4, RealFusion15, GSO and Level50 benchmark demonstrate that DTC123 can produce multi-view consistent, high-quality, and diverse 3D assets. Codes and more generation demos will be released in https://github.com/yxymessi/DTC123.

Submitted 2 May, 2024; v1 submitted 6 April, 2024; originally announced April 2024.

Comments: Accepted to CVPR 2024

arXiv:2404.04171 [pdf]

Directed Aggregation of Cellulose Nanocrystals to Enhance Chiral Twist

Authors: Kévin Ballu, Jia-Hui Lim, Thomas G. Parton, Richard M. Parker, Bruno Frka-Petesic, Alexei A. Lapkin, Yu Ogawa, Silvia Vignolini

Abstract: Cellulose nanocrystals (CNCs) are bioderived nanoparticles that can be isolated from any source of natural cellulose via sulfuric acid hydrolysis. Arising from a combination of the negatively-charged sulfate half-ester groups grafted during this process and their elongated morphology, CNCs typically form colloidal cholesteric liquid crystalline phases in aqueous suspension. Recently, the chiral strength of such a CNC mesophase was correlated to the presence of CNCs with a 'bundle' morphology, analogous to the case of chiral dopants in molecular liquid crystal systems. This indicates the central role these composite particles play in the chiral behavior of CNCs; however, the origin and formation pathway of the CNC bundles remain elusive. In this study, we systematically explore how different post-hydrolysis treatments alter the morphology of the CNCs (using electron microscopy, viscosimetry, and electron diffraction) and correlate this to changes in the observed liquid crystalline behavior. We found that the centrifugation step applied during CNC purification favors the formation of bundles of aligned crystallites, attached preferentially on their hydrophobic faces. This is in stark contrast to ionic treatments, where uncontrolled aggregation dominates. This reveals the importance of these often-disregarded purification steps on the final chiral and liquid crystalline properties of CNCs and promotes routes to tailor them towards a variety of applications.

Submitted 5 April, 2024; originally announced April 2024.

arXiv:2404.02077 [pdf, other]

Energy-Optimized Planning in Non-Uniform Wind Fields with Fixed-Wing Aerial Vehicles

Authors: Yufei Duan, Florian Achermann, Jaeyoung Lim, Roland Siegwart

Abstract: Fixed-wing small uncrewed aerial vehicles (sUAVs) possess the capability to remain airborne for extended durations and traverse vast distances. However, their operation is susceptible to wind conditions, particularly in regions of complex terrain where high wind speeds may push the aircraft beyond its operational limitations, potentially raising safety concerns. Moreover, wind impacts the energy required to follow a path, especially in locations where the wind direction and speed are not favorable. Incorporating wind information into mission planning is essential to ensure both safety and energy efficiency. In this paper, we propose a sampling-based planner using the kinematic Dubins aircraft paths with respect to the ground, to plan energy-efficient paths in non-uniform wind fields. We study the planner characteristics with synthetic and real-world wind data and compare its performance against baseline cost and path formulations. We demonstrate that the energy-optimized planner effectively utilizes updrafts to minimize energy consumption, albeit at the expense of increased travel time. The ground-relative path formulation facilitates the generation of safe trajectories onboard sUAVs within reasonable computational timeframes.

Submitted 2 April, 2024; originally announced April 2024.

Comments: This work has been submitted to the IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible

arXiv:2404.01954 [pdf, other]

HyperCLOVA X Technical Report

Authors: Kang Min Yoo, Jaegeun Han, Sookyo In, Heewon Jeon, Jisu Jeong, Jaewook Kang, Hyunwook Kim, Kyung-Min Kim, Munhyong Kim, Sungju Kim, Donghyun Kwak, Hanock Kwak, Se Jung Kwon, Bado Lee, Dongsoo Lee, Gichang Lee, Jooho Lee, Baeseong Park, Seongjin Shin, Joonsang Yu, Seolki Baek, Sumin Byeon, Eungsup Cho, Dooseok Choe, Jeesung Han, et al. (371 additional authors not shown)

Abstract: We introduce HyperCLOVA X, a family of large language models (LLMs) tailored to the Korean language and culture, along with competitive capabilities in English, math, and coding. HyperCLOVA X was trained on a balanced mix of Korean, English, and code data, followed by instruction-tuning with high-quality human-annotated datasets while abiding by strict safety guidelines reflecting our commitment to responsible AI. The model is evaluated across various benchmarks, including comprehensive reasoning, knowledge, commonsense, factuality, coding, math, chatting, instruction-following, and harmlessness, in both Korean and English. HyperCLOVA X exhibits strong reasoning capabilities in Korean backed by a deep understanding of the language and cultural nuances. Further analysis of the inherent bilingual nature and its extension to multilingualism highlights the model's cross-lingual proficiency and strong generalization ability to untargeted languages, including machine translation between several language pairs and cross-lingual inference tasks. We believe that HyperCLOVA X can provide helpful guidance for regions or countries in developing their sovereign LLMs.

Submitted 13 April, 2024; v1 submitted 2 April, 2024; originally announced April 2024.

Comments: 44 pages; updated authors list and fixed author names

arXiv:2404.00123 [pdf, other]

SURESTEP: An Uncertainty-Aware Trajectory Optimization Framework to Enhance Visual Tool Tracking for Robust Surgical Automation

Authors: Nikhil U. Shinde, Zih-Yun Chiu, Florian Richter, Jason Lim, Yuheng Zhi, Sylvia Herbert, Michael C. Yip

Abstract: Inaccurate tool localization is one of the main reasons for failures in automating surgical tasks. Imprecise robot kinematics and noisy observations caused by the poor visual acuity of an endoscopic camera make tool tracking challenging. Previous works in surgical automation adopt environment-specific setups or hard-coded strategies instead of explicitly considering motion and observation uncertainty of tool tracking in their policies. In this work, we present SURESTEP, an uncertainty-aware trajectory optimization framework for robust surgical automation. We model the uncertainty of tool tracking with the components motivated by the sources of noise in typical surgical scenes. Using a Gaussian assumption to propagate our uncertainty models through a given tool trajectory, SURESTEP provides a general framework that minimizes the upper bound on the entropy of the final estimated tool distribution. We compare SURESTEP with a baseline method on a real-world suture needle regrasping task under challenging environmental conditions, such as poor lighting and a moving endoscopic camera. The results over 60 regrasps on the da Vinci Research Kit (dVRK) demonstrate that our optimized trajectories significantly outperform the un-optimized baseline.

Submitted 29 March, 2024; originally announced April 2024.

arXiv:2403.19102 [pdf, other]

Automatic Fingerpad Customization for Precise and Stable Grasping of 3D-Print Parts

Authors: Joyce Xin-Yan Lim, Quang-Cuong Pham

Abstract: The rise in additive manufacturing comes with unique opportunities and challenges. Massive part customization and rapid design changes are made possible with additive manufacturing; however, manufacturing industries that desire the implementation of robotics automation to improve production efficiency could face challenges in gripper design and grasp planning due to the highly complex geometrical shapes resulting from massive part customization. Yet, current gripper designs for such objects are often manual and rely on ad-hoc design intuition. This is limiting, as such grippers lack the ability to grasp different objects or grasp points, which is important for practical implementations. Hence, we introduce a fast, end-to-end approach to customize rigid gripper fingerpads that could achieve precise and stable grasping for different objects at multiple grasp points. Our approach relies on two key components: (i) a method based on set Boolean operations (e.g., intersections, subtractions, and unions) to extract object features and synthesize gripper surfaces that conform to different local shapes to form caging grasps; (ii) a method to evaluate the grasp quality of synthesized grippers. We experimentally demonstrate the validity of our approach by synthesizing fingerpads that, once mounted on a physical robot gripper, are able to grasp different objects at multiple grasp points, all with tightly constrained grasps.

Submitted 27 March, 2024; originally announced March 2024.

arXiv:2403.16509 [pdf, other]

Human Understanding AI Paper Challenge 2024 -- Dataset Design

Authors: Se Won Oh, Hyuntae Jeong, Jeong Mook Lim, Seungeun Chung, Kyoung Ju Noh

Abstract: In 2024, we will hold a research paper competition (the third Human Understanding AI Paper Challenge) for the research and development of artificial intelligence technologies to understand human daily life. This document introduces the datasets that will be provided to participants in the competition, and summarizes the issues to consider in data processing and learning model development.

Submitted 25 March, 2024; originally announced March 2024.

Comments: 7 pages, 3 figures

ACM Class: J.7; E.m

arXiv:2403.12945 [pdf, other]

DROID: A Large-Scale In-The-Wild Robot Manipulation Dataset

Authors: Alexander Khazatsky, Karl Pertsch, Suraj Nair, Ashwin Balakrishna, Sudeep Dasari, Siddharth Karamcheti, Soroush Nasiriany, Mohan Kumar Srirama, Lawrence Yunliang Chen, Kirsty Ellis, Peter David Fagan, Joey Hejna, Masha Itkina, Marion Lepert, Yecheng Jason Ma, Patrick Tree Miller, Jimmy Wu, Suneel Belkhale, Shivin Dass, Huy Ha, Arhan Jain, Abraham Lee, Youngwoon Lee, Marius Memmel, Sungjae Park, et al. (74 additional authors not shown)

Abstract: The creation of large, diverse, high-quality robot manipulation datasets is an important stepping stone on the path toward more capable and robust robotic manipulation policies. However, creating such datasets is challenging: collecting robot manipulation data in diverse environments poses logistical and safety challenges and requires substantial investments in hardware and human labour. As a result, even the most general robot manipulation policies today are mostly trained on data collected in a small number of environments with limited scene and task diversity. In this work, we introduce DROID (Distributed Robot Interaction Dataset), a diverse robot manipulation dataset with 76k demonstration trajectories or 350 hours of interaction data, collected across 564 scenes and 84 tasks by 50 data collectors in North America, Asia, and Europe over the course of 12 months. We demonstrate that training with DROID leads to policies with higher performance and improved generalization ability. We open source the full dataset, policy learning code, and a detailed guide for reproducing our robot hardware setup.

Submitted 19 March, 2024; originally announced March 2024.

Comments: Project website: https://droid-dataset.github.io/

arXiv:2403.09336 [pdf, other]

Radiative lifetime of the $A$ $^2 Π_{1/2}$ state in RaF with relevance to laser cooling

Authors: M. Athanasakis-Kaklamanakis, S. G. Wilkins, P. Lassègues, L. Lalanne, J. R. Reilly, O. Ahmad, M. Au, S. W. Bai, J. Berbalk, C. Bernerd, A. Borschevsky, A. A. Breier, K. Chrysalidis, T. E. Cocolios, R. P. de Groote, C. M. Fajardo-Zambrano, K. T. Flanagan, S. Franchoo, R. F. Garcia Ruiz, D. Hanstorp, R. Heinke, P. Imgram, A. Koszorús, A. A. Kyuberis, J. Lim, et al. (16 additional authors not shown)

Abstract: The radiative lifetime of the $A$ $^2 Π_{1/2}$ (v=0) state in radium monofluoride (RaF) is measured to be 35(1) ns. The lifetime of this state and the related decay rate $Γ= 2.86(8) \times 10^7$ $s^{-1}$ are of relevance to the laser cooling of RaF via the optically closed $A$ $^2 Π_{1/2} \leftarrow X$ $^2Σ_{1/2}$ transition, which makes the molecule a promising probe to search for new physics. RaF is found to have a comparable photon-scattering rate to homoelectronic laser-coolable molecules. Thanks to its highly diagonal Franck-Condon matrix, it is expected to scatter an order of magnitude more photons than other molecules when using just 3 cooling lasers, before it decays to a dark state. The lifetime measurement in RaF is benchmarked by measuring the lifetime of the $8P_{3/2}$ state in Fr to be 83(3) ns, in agreement with literature.

Submitted 6 June, 2024; v1 submitted 14 March, 2024; originally announced March 2024.

Comments: Accepted as a Letter in Physical Review A; 8 pages of main text, 5 pages of supplemental material

arXiv:2403.09104 [pdf, other]

doi 10.3847/1538-4357/ad188a

On the Evidence for Molecular Outflows in High-redshift Dusty Star-forming Galaxies

Authors: James Nianias, Jeremy Lim, Michael Yeung

Abstract: Galactic-scale outflows of molecular gas from star-forming galaxies constitute the most direct evidence for regulation of star formation. In the early universe ($ z > 4 $), such outflows have recently been inferred from gravitationally-lensed dusty star-forming galaxies (DSFGs) based on ubiquitous detections of OH absorption extending to more blueshifted velocities than [CII] or CO emission in spatially-integrated spectra. Because these lines are redshifted to sub-mm wavelengths, such measurements require careful corrections for atmospheric absorption lines, and a proper accounting of sometimes large variations in measurement uncertainties over these lines. Taking these factors into consideration, we re-analyze OH and [CII] data taken with ALMA for the five sources where such data is available, of which four were categorised as exhibiting outflows. Based on their spatially-integrated spectra alone, we find statistically significant ($ \geq 3 σ$) OH absorption more blueshifted than [CII] emission in only one source. By contrast, searching channel maps for signals diluted below the detection threshold in spatially-integrated spectra, we find evidence for a separate kinematic component in OH absorption in all five sources in the form of: (i) more blueshifted OH absorption than [CII] emission and/or (ii) a component in OH absorption exhibiting a different spatio-kinematic pattern than [CII] emission, the latter presumably tracing gas in a rotating disc. Providing a more complete and accurate assessment of molecular outflows in gravitationally-lensed DSFGs, we suggest methods to better assess the precision of corrections for atmospheric absorption and to more accurately measure the source continuum in future observations.

Submitted 14 March, 2024; originally announced March 2024.

Comments: 18 pages, 8 figures, 2 tables

Journal ref: ApJ 963 19 (2024)

arXiv:2403.04583 [pdf, other]

Unbiased Estimator for Distorted Conics in Camera Calibration

Authors: Chaehyeon Song, Jaeho Shin, Myung-Hwan Jeon, Jongwoo Lim, Ayoung Kim

Abstract: In the literature, points and conics have been major features for camera geometric calibration. Although conics are more informative features than points, the loss of the conic property under distortion has critically limited the utility of conic features in camera calibration. Many existing approaches addressed conic-based calibration by ignoring distortion or introducing 3D spherical targets to circumvent this limitation. In this paper, we present a novel formulation for conic-based calibration using moments. Our derivation is based on the mathematical finding that the first moment can be estimated without bias even under distortion. This allows us to track moment changes during projection and distortion, ensuring the preservation of the first moment of the distorted conic. With an unbiased estimator, the circular patterns can be accurately detected at the sub-pixel level and can now be fully exploited for an entire calibration pipeline, resulting in significantly improved calibration. The entire code is readily available from https://github.com/ChaehyeonSong/discocal.

Submitted 9 March, 2024; v1 submitted 7 March, 2024; originally announced March 2024.

arXiv:2403.02706 [pdf, other]

DeepBioisostere: Discovering Bioisosteres with Deep Learning for a Fine Control of Multiple Molecular Properties

Authors: Hyeongwoo Kim, Seokhyun Moon, Wonho Zhung, Jaechang Lim, Woo Youn Kim

Abstract: Optimizing molecules to improve their properties is a fundamental challenge in drug design. For a fine-tuning of molecular properties without losing bio-activity validated in advance, the concept of bioisosterism has emerged. Many in silico methods have been proposed for discovering bioisosteres, but they require expert knowledge for their applications or are restricted to known databases. Here, we introduce DeepBioisostere, a deep generative model to design suitable bioisosteric replacements. Our model allows an end-to-end chemical replacement by intelligently selecting fragments for removal and insertion along with their attachment orientation. Through various scenarios of multiple property control, we showcase the model's capability to modulate specific properties, addressing the challenge in molecular optimization. Our model's innovation lies in its capacity to design a bioisosteric replacement reflecting the compatibility with the surroundings of the modification site, facilitating the control of sophisticated properties like drug-likeness. DeepBioisostere can also provide previously unseen bioisosteric replacements, highlighting its capability for exploring diverse chemical modifications rather than just mining them from known databases. Lastly, we employed DeepBioisostere to improve the sensitivity of a known SARS-CoV-2 main protease inhibitor to the E166V mutant that exhibits drug resistance to the inhibitor, demonstrating its potential application in lead optimization.

Submitted 5 March, 2024; originally announced March 2024.

Comments: 32 pages, 7 figures, and 2 tables for main text

arXiv:2402.11776 [pdf, other]

Early feasibility of an embedded bi-directional brain-computer interface for ambulation

Authors: Jeffrey Lim, Po T. Wang, Wonjoon Sohn, Claudia Serrano-Amenos, Mina Ibrahim, Derrick Lin, Shravan Thaploo, Susan J. Shaw, Michelle Armacost, Hui Gong, Brian Lee, Darrin Lee, Richard A. Andersen, Payam Heydari, Charles Y. Liu, Zoran Nenadic, An H. Do

Abstract: Current treatments for paraplegia induced by spinal cord injury (SCI) are often limited by the severity of the injury. The accompanying loss of sensory and motor functions often results in reliance on wheelchairs, which in turn causes reduced quality of life and increased risk of co-morbidities. While brain-computer interfaces (BCIs) for ambulation have shown promise in restoring or replacing lower extremity motor functions, none so far have simultaneously implemented sensory feedback functions. Additionally, many existing BCIs for ambulation rely on bulky external hardware that makes them ill-suited for non-research settings. Here, we present an embedded bi-directional BCI (BDBCI) that restores motor function by enabling neural control over a robotic gait exoskeleton (RGE) and delivers sensory feedback via direct cortical electrical stimulation (DCES) in response to RGE leg swing. A first demonstration with this system was performed with a single subject implanted with electrocorticography electrodes, achieving an average lag-optimized cross-correlation of 0.80$\pm$0.08 between cues and decoded states over 5 runs.

Submitted 18 February, 2024; originally announced February 2024.

Comments: 5 pages, 6 figures, two tables, also submitted to IEEE EMBC 2024 conference

MSC Class: 92C55

arXiv:2402.11349 [pdf, other]

Language Models Don't Learn the Physical Manifestation of Language

Authors: Bruce W. Lee, JaeHyuk Lim

Abstract: We argue that language-only models don't learn the physical manifestation of language. We present an empirical investigation of visual-auditory properties of language through a series of tasks, termed H-Test. These tasks highlight a fundamental gap between human linguistic understanding and the sensory-deprived linguistic understanding of LLMs. In support of our hypothesis, 1. deliberate reasoning (Chain-of-Thought), 2. few-shot examples, or 3. a stronger LLM from the same model family (LLaMA 2 13B -> LLaMA 2 70B) has no significant effect on H-Test performance. We bring in the philosophical case of Mary, who learns about the world in a sensory-deprived environment, as a useful conceptual framework to understand how language-only models learn about the world (Jackson, 1986). Our experiments show that some of the strongest proprietary LLMs stay near the random-chance baseline accuracy of 50%, highlighting the limitations of linguistic knowledge acquired in the absence of sensory experience. Our code and data are available at <github.com/brucewlee/h-test>.

Submitted 6 June, 2024; v1 submitted 17 February, 2024; originally announced February 2024.

Comments: ACL 2024 Main

arXiv:2402.10692 [pdf, other]

Understanding inner-shell excitations in molecules through spectroscopy of the 4f hole states of YbF

Authors: S. Popa, S. Schaller, A. Fielicke, J. Lim, B. G. Sartakov, M. R. Tarbutt, G. Meijer

Abstract: Molecules containing a lanthanide atom have sets of electronic states arising from excitation of an inner-shell electron. These states have received little attention, but are thought to play an important role in laser cooling of such molecules and may be a useful resource for testing fundamental physics. We study a series of inner-shell excited states in YbF using resonance-enhanced multi-photon ionisation spectroscopy. We investigate the excited states of lowest energy, 8474, 9013 and 9090 cm$^{-1}$ above the ground state, all corresponding to the configuration 4f$^{13}$6s$^{2}$ ${}^{2}F_{7/2}$ of the Yb$^+$ ion. They are metastable, since they have no electric dipole allowed transitions to the ground state. We also characterize a state at 31050 cm$^{-1}$ that is easily excited from both the ground and metastable states, which makes it especially useful for this spectroscopic study. Finally, we study a state at 48729 cm$^{-1}$, which is above the ionization limit and features strong auto-ionizing resonances that prove useful for efficient detection of the molecules and for identifying the rotational quantum number of each line in the spectrum. We resolve the rotational structures of all these states and find that they can all be described by a very simple model based on Hund's case (c). Our study provides information necessary for laser slowing and magneto-optical trapping of YbF, which is an important species for testing fundamental physics. We also consider whether the low-lying inner-shell states may themselves be useful as probes of the electron's electric dipole moment or of varying fundamental constants, since they are long-lived states in a laser-coolable molecule featuring closely-spaced levels of opposite parity.

Submitted 24 May, 2024; v1 submitted 16 February, 2024; originally announced February 2024.

Comments: 14 pages, 10 figures. Minor amendments

arXiv:2402.10083 [pdf]

Fine-tuning Large Language Model (LLM) Artificial Intelligence Chatbots in Ophthalmology and LLM-based evaluation using GPT-4

Authors: Ting Fang Tan, Kabilan Elangovan, Liyuan Jin, Yao Jie, Li Yong, Joshua Lim, Stanley Poh, Wei Yan Ng, Daniel Lim, Yuhe Ke, Nan Liu, Daniel Shu Wei Ting

Abstract: Purpose: To assess the alignment of GPT-4-based evaluation to human clinician experts, for the evaluation of responses to ophthalmology-related patient queries generated by fine-tuned LLM chatbots. Methods: 400 ophthalmology questions and paired answers were created by ophthalmologists to represent commonly asked patient questions, divided into fine-tuning (368; 92%), and testing (40; 8%). We fine-tuned 5 different LLMs, including LLAMA2-7b, LLAMA2-7b-Chat, LLAMA2-13b, and LLAMA2-13b-Chat. For the testing dataset, an additional 8 glaucoma QnA pairs were included. 200 responses to the testing dataset were generated by 5 fine-tuned LLMs for evaluation. A customized clinical evaluation rubric was used to guide GPT-4 evaluation, grounded on clinical accuracy, relevance, patient safety, and ease of understanding. GPT-4 evaluation was then compared against ranking by 5 clinicians for clinical alignment. Results: Among all fine-tuned LLMs, GPT-3.5 scored the highest (87.1%), followed by LLAMA2-13b (80.9%), LLAMA2-13b-chat (75.5%), LLAMA2-7b-Chat (70%) and LLAMA2-7b (68.8%) based on the GPT-4 evaluation. GPT-4 evaluation demonstrated significant agreement with human clinician rankings, with Spearman and Kendall Tau correlation coefficients of 0.90 and 0.80 respectively, while correlation based on Cohen Kappa was more modest at 0.50. Notably, qualitative analysis and the glaucoma sub-analysis revealed clinical inaccuracies in the LLM-generated responses, which were appropriately identified by the GPT-4 evaluation. Conclusion: The notable clinical alignment of GPT-4 evaluation highlighted its potential to streamline the clinical evaluation of LLM chatbot responses to healthcare-related queries. By complementing the existing clinician-dependent manual grading, this efficient and automated evaluation could assist the validation of future developments in LLM applications for healthcare.

Submitted 15 February, 2024; originally announced February 2024.

Comments: 13 Pages, 1 Figure, 8 Tables

arXiv:2402.07617 [pdf, other]

doi 10.1103/PhysRevA.109.052224

Optimized noise-assisted simulation of the Lindblad equation with time-dependent coefficients on a noisy quantum processor

Authors: José D. Guimarães, Antonio Ruiz-Molero, James Lim, Mikhail I. Vasilevskiy, Susana F. Huelga, Martin B. Plenio

Abstract: Noise in quantum devices is generally considered detrimental to computational accuracy. However, the recent proposal of noise-assisted simulation has demonstrated that noise can be an asset in digital quantum simulations of open systems on Noisy Intermediate-Scale Quantum (NISQ) devices. In this context, we introduce an optimized decoherence rate control scheme that can significantly reduce computational requirements by multiple orders of magnitude, in comparison to the original noise-assisted simulation. We further extend this approach to encompass Lindblad equations with time-dependent coefficients, using only quantum error characterization and mitigation techniques. This extension allows for the perturbative simulation of non-Markovian dynamics on NISQ devices, eliminating the need for ancilla qubits or mid-circuit measurements. Our contributions are validated through numerical experiments on an emulated IBMQ device. Overall, our work offers valuable optimizations that bring current quantum processors closer to effectively simulating realistic open systems.

Submitted 12 February, 2024; originally announced February 2024.

arXiv:2402.03734 [pdf, other]

Magnon mediated spin pumping by coupled ferrimagnetic garnets heterostructure

Authors: Anupama Swain, Kshitij Singh Rathore, Pushpendra Gupta, Abhisek Mishra, Gary Lee, Jinho Lim, Axel Hoffmann, Ramanathan Mahendiran, Subhankar Bedanta

Abstract: Spin pumping has significant implications for spintronics, providing a mechanism to manipulate and transport spins for information processing. Understanding and harnessing spin currents through spin pumping is critical for the development of efficient spintronic devices. The use of a magnetic insulator with low damping enhances the signal-to-noise ratio in crucial experiments such as spin-torque ferromagnetic resonance (FMR) and spin pumping. A magnetic insulator coupled with a heavy metal or quantum material offers a more straightforward model system, especially when investigating spin-charge interconversion processes with greater accuracy. This simplicity arises from the absence of unwanted effects caused by conduction electrons, unlike in ferromagnetic metals. Here, we investigate the spin pumping in coupled ferrimagnetic (FiM) Y3Fe5O12 (YIG)/Tm3Fe5O12 (TmIG) bilayers combined with heavy-metal (Pt) using the inverse spin Hall effect (ISHE). It is observed that magnon transmission occurs at the FMR positions of both FiMs. The enhancement of spin pumping voltage (Vsp) in the FiM garnet heterostructures is attributed to the strong interfacial exchange coupling between FiMs. The modulation of Vsp is achieved by tuning the bilayer structure. Further, the spin mixing conductance for these coupled systems is found to be 10^18 m^-2. Our findings describe a novel coupled FiM system for the investigation of magnon coupling, providing new prospects for magnonic devices.

Submitted 6 February, 2024; originally announced February 2024.

Showing 1–50 of 642 results for author: Lim, J