ConceptExpress: Harnessing Diffusion Models for Single-image Unsupervised Concept Extraction

Authors: Shaozhe Hao, Kai Han, Zhengyao Lv, Shihao Zhao, Kwan-Yee K. Wong

Abstract: While personalized text-to-image generation has enabled the learning of a single concept from multiple images, a more practical yet challenging scenario involves learning multiple concepts within a single image. However, existing works tackling this scenario heavily rely on extensive human annotations. In this paper, we introduce a novel task named Unsupervised Concept Extraction (UCE) that considers an unsupervised setting without any human knowledge of the concepts. Given an image that contains multiple concepts, the task aims to extract and recreate individual concepts solely relying on the existing knowledge from pretrained diffusion models. To achieve this, we present ConceptExpress that tackles UCE by unleashing the inherent capabilities of pretrained diffusion models in two aspects. Specifically, a concept localization approach automatically locates and disentangles salient concepts by leveraging spatial correspondence from diffusion self-attention; and based on the lookup association between a concept and a conceptual token, a concept-wise optimization process learns discriminative tokens that represent each individual concept. Finally, we establish an evaluation protocol tailored for the UCE task. Extensive experiments demonstrate that ConceptExpress is a promising solution to the UCE task. Our code and data are available at: https://github.com/haoosz/ConceptExpress

Submitted 9 July, 2024; originally announced July 2024.

Comments: ECCV 2024, Project page: https://haoosz.github.io/ConceptExpress/

arXiv:2407.06780 [pdf, other]

CoLA: Conditional Dropout and Language-driven Robust Dual-modal Salient Object Detection

Authors: Shuang Hao, Chunlin Zhong, He Tang

Abstract: Depth/thermal information is beneficial for detecting salient objects alongside conventional RGB images. However, in dual-modal salient object detection (SOD) models, robustness against noisy inputs and missing modalities is crucial but rarely studied. To tackle this problem, we introduce the Conditional Dropout and LAnguage-driven (CoLA) framework, comprising two core components. 1) Language-driven Quality Assessment (LQA): Leveraging a pretrained vision-language model with a prompt learner, the LQA recalibrates image contributions without requiring additional quality annotations. This approach effectively mitigates the impact of noisy inputs. 2) Conditional Dropout (CD): A learning method to strengthen the model's adaptability in scenarios with missing modalities, while preserving its performance under complete modalities. The CD serves as a plug-in training scheme that treats modality-missing as conditions, strengthening the overall robustness of various dual-modal SOD models. Extensive experiments demonstrate that the proposed method outperforms state-of-the-art dual-modal SOD models, under both modality-complete and modality-missing conditions. We will release source code upon acceptance.

Submitted 9 July, 2024; originally announced July 2024.

arXiv:2407.04213 [pdf]

Pathfinder: Exploring Path Diversity for Assessing Internet Censorship Inconsistency

Authors: Xiaoqin Liang, Guannan Liu, Lin Jin, Shuai Hao, Haining Wang

Abstract: Internet censorship is typically enforced by authorities to achieve information control for a certain group of Internet users. So far, existing censorship studies have primarily focused on country-level characterization because (1) in many cases, censorship is enabled by governments with nationwide policies and (2) it is usually hard to control how the probing packets are routed to trigger censorship in different networks inside a country. However, the deployment and implementation of censorship could be highly diverse at the ISP level. In this paper, we investigate Internet censorship from a different perspective by scrutinizing the diverse censorship deployment inside a country. Specifically, by leveraging an end-to-end measurement framework, we deploy multiple geo-distributed back-end control servers to explore various paths from one single vantage point. The generated traffic with the same domain but different control servers' IPs could be forced to traverse different transit networks, thereby being examined by different censorship devices if present. Through our large-scale experiments and in-depth investigation, we reveal that the diversity of Internet censorship caused by different routing paths inside a country is prevalent, implying that (1) the implementations of centralized censorship are commonly incomplete or flawed and (2) decentralized censorship is also common. Moreover, we identify that different hosting platforms also result in inconsistent censorship activities due to different peering relationships with the ISPs in a country. Finally, we present extensive case studies in detail to illustrate the configurations that lead to censorship inconsistency and explore the causes.

Submitted 4 July, 2024; originally announced July 2024.

arXiv:2406.15999 [pdf, other]

doi 10.1145/3643738

SmartAxe: Detecting Cross-Chain Vulnerabilities in Bridge Smart Contracts via Fine-Grained Static Analysis

Authors: Zeqin Liao, Yuhong Nan, Henglong Liang, Sicheng Hao, Juan Zhai, Jiajing Wu, Zibin Zheng

Abstract: With the increasing popularity of blockchain, different blockchain platforms coexist in the ecosystem (e.g., Ethereum, BNB, EOSIO, etc.), which creates high demand for cross-chain communication. A cross-chain bridge is a specific type of decentralized application for asset exchange across different blockchain platforms. Securing the smart contracts of cross-chain bridges is urgently needed, as a number of recent security incidents with heavy financial losses were caused by vulnerabilities in bridge smart contracts, which we call Cross-Chain Vulnerabilities (CCVs). However, automatically identifying CCVs in smart contracts poses several unique challenges. Particularly, it is non-trivial to (1) identify application-specific access control constraints needed for cross-bridge asset exchange, and (2) identify inconsistent cross-chain semantics between the two sides of the bridge. In this paper, we propose SmartAxe, a new framework to identify vulnerabilities in cross-chain bridge smart contracts. Particularly, to locate vulnerable functions that have access control incompleteness, SmartAxe models the heterogeneous implementations of access control and finds necessary security checks in smart contracts through probabilistic pattern inference. Besides, SmartAxe constructs a cross-chain control-flow graph (xCFG) and data-flow graph (xDFG), which help to find semantic inconsistency during cross-chain data communication. To evaluate SmartAxe, we collect and label a dataset of 88 CCVs from cross-chain bridge contracts involved in real attacks. Evaluation results show that SmartAxe achieves a precision of 84.95% and a recall of 89.77%. In addition, SmartAxe successfully identifies 232 new/unknown CCVs from 129 real-world cross-chain bridge applications (i.e., from 1,703 smart contracts). These identified CCVs affect a total amount of digital assets worth 1,885,250 USD.

Submitted 22 June, 2024; originally announced June 2024.

Journal ref: The ACM International Conference on the Foundations of Software Engineering 2024

arXiv:2406.15988 [pdf, other]

doi 10.1145/3597926.3598111

SmartState: Detecting State-Reverting Vulnerabilities in Smart Contracts via Fine-Grained State-Dependency Analysis

Authors: Zeqin Liao, Sicheng Hao, Yuhong Nan, Zibin Zheng

Abstract: Smart contracts written in Solidity are widely used in different blockchain platforms such as Ethereum, TRON and BNB Chain. One of the unique designs in Solidity smart contracts is its state-reverting mechanism for error handling and access control. Unfortunately, a number of recent security incidents showed that adversaries also utilize this mechanism to manipulate critical states of smart contracts, and hence bring security consequences such as illegal profit-gain and Denial-of-Service (DoS). In this paper, we call such vulnerabilities State-reverting Vulnerabilities (SRVs). Automatically identifying SRVs poses unique challenges, as it requires an in-depth analysis and understanding of the state-dependency relations in smart contracts. This paper presents SmartState, a new framework for detecting state-reverting vulnerabilities in Solidity smart contracts via fine-grained state-dependency analysis. SmartState integrates a set of novel mechanisms to ensure its effectiveness. Particularly, SmartState extracts state dependencies from both contract bytecode and historical transactions. Both of them are critical for inferring dependencies related to SRVs. Further, SmartState models the generic patterns of SRVs (i.e., profit-gain and DoS) as SRV indicators, and hence effectively identifies SRVs based on the constructed state-dependency graph. To evaluate SmartState, we manually annotated a ground-truth dataset which contains 91 SRVs in the real world. Evaluation results showed that SmartState achieves a precision of 87.23% and a recall of 89.13%. In addition, SmartState successfully identifies 406 new SRVs from 47,351 real-world smart contracts. 11 of these SRVs are from popular smart contracts with high transaction amounts (i.e., top 2000). In total, our reported SRVs affect a total amount of digital assets worth 428,600 USD.

Submitted 22 June, 2024; originally announced June 2024.

Comments: 12 pages, 10 figures

Journal ref: ISSTA 2023

arXiv:2406.09455 [pdf, other]

Pandora: Towards General World Model with Natural Language Actions and Video States

Authors: Jiannan Xiang, Guangyi Liu, Yi Gu, Qiyue Gao, Yuting Ning, Yuheng Zha, Zeyu Feng, Tianhua Tao, Shibo Hao, Yemin Shi, Zhengzhong Liu, Eric P. Xing, Zhiting Hu

Abstract: World models simulate future states of the world in response to different actions. They facilitate interactive content creation and provide a foundation for grounded, long-horizon reasoning. Current foundation models do not fully meet the capabilities of general world models: large language models (LLMs) are constrained by their reliance on language modality and their limited understanding of the physical world, while video models lack interactive action control over the world simulations. This paper takes a step towards building a general world model by introducing Pandora, a hybrid autoregressive-diffusion model that simulates world states by generating videos and allows real-time control with free-text actions. Pandora achieves domain generality, video consistency, and controllability through large-scale pretraining and instruction tuning. Crucially, Pandora bypasses the cost of training-from-scratch by integrating a pretrained LLM (7B) and a pretrained video model, requiring only additional lightweight finetuning. We illustrate extensive outputs by Pandora across diverse domains (indoor/outdoor, natural/urban, human/robot, 2D/3D, etc.). The results indicate great potential of building stronger general world models with larger-scale training.

Submitted 12 June, 2024; originally announced June 2024.

Comments: Website: https://world-model.maitrix.org/

arXiv:2406.06615 [pdf, other]

Language Guided Skill Discovery

Authors: Seungeun Rho, Laura Smith, Tianyu Li, Sergey Levine, Xue Bin Peng, Sehoon Ha

Abstract: Skill discovery methods enable agents to learn diverse emergent behaviors without explicit rewards. To make learned skills useful for unknown downstream tasks, obtaining a semantically diverse repertoire of skills is essential. While some approaches introduce a discriminator to distinguish skills and others aim to increase state coverage, no existing work directly addresses the "semantic diversity" of skills. We hypothesize that leveraging the semantic knowledge of large language models (LLMs) can lead us to improve semantic diversity of resulting behaviors. In this sense, we introduce Language Guided Skill Discovery (LGSD), a skill discovery framework that aims to directly maximize the semantic diversity between skills. LGSD takes user prompts as input and outputs a set of semantically distinctive skills. The prompts serve as a means to constrain the search space into a semantically desired subspace, and the generated LLM outputs guide the agent to visit semantically diverse states within the subspace. We demonstrate that LGSD enables legged robots to visit different user-intended areas on a plane by simply changing the prompt. Furthermore, we show that language guidance aids in discovering more diverse skills compared to five existing skill discovery methods in robot-arm manipulation environments. Lastly, LGSD provides a simple way of utilizing learned skills via natural language.

Submitted 7 June, 2024; originally announced June 2024.

arXiv:2406.05673 [pdf, other]

Flow of Reasoning: Efficient Training of LLM Policy with Divergent Thinking

Authors: Fangxu Yu, Lai Jiang, Haoqiang Kang, Shibo Hao, Lianhui Qin

Abstract: Divergent thinking, the cognitive process of generating diverse solutions, is a hallmark of human creativity and problem-solving. For machines, sampling diverse solution trajectories in complex reasoning problems is crucial for robust outcomes, data augmentation, and enhanced model generalization. Large language models (LLMs) often struggle with generating high-quality, diverse reasoning. While supervised fine-tuning helps with quality, it requires extensive supervision data to capture the full diversity of solutions. Alternatively, reinforcement learning methods like PPO aim to find limited highest-reward solutions while neglecting the solution diversity, akin to convergent thinking. To address these limitations, we propose Flow of Reasoning (FoR), an efficient LLM training approach enabling diverse reasoning with minimal data. FoR formulates multi-step LLM reasoning as a Markovian flow from an initial state to terminal states. The formulation allows us to adapt principled GFlowNet approaches to train the LLM as a policy, which is able to sample multiple reasoning paths with probabilities proportional to the unnormalized reward. Empirical results show that, with limited training data (e.g., 15 examples), FoR can discover diverse high-quality solutions that excel greatly beyond current state-of-the-art methods across three tasks, including embodied reasoning (BlocksWorld), math puzzle solving (Game24), and logical reasoning (PrOntoQA). Code is available at https://github.com/Yu-Fangxu/FoR.

Submitted 24 June, 2024; v1 submitted 9 June, 2024; originally announced June 2024.

arXiv:2406.04983 [pdf, other]

CityCraft: A Real Crafter for 3D City Generation

Authors: Jie Deng, Wenhao Chai, Junsheng Huang, Zhonghan Zhao, Qixuan Huang, Mingyan Gao, Jianshu Guo, Shengyu Hao, Wenhao Hu, Jenq-Neng Hwang, Xi Li, Gaoang Wang

Abstract: City scene generation has gained significant attention in autonomous driving, smart city development, and traffic simulation. It helps enhance infrastructure planning and monitoring solutions. Existing methods have employed a two-stage process involving city layout generation, typically using Variational Autoencoders (VAEs), Generative Adversarial Networks (GANs), or Transformers, followed by neural rendering. These techniques often exhibit limited diversity and noticeable artifacts in the rendered city scenes. The rendered scenes lack variety, resembling the training images, resulting in monotonous styles. Additionally, these methods lack planning capabilities, leading to less realistic generated scenes. In this paper, we introduce CityCraft, an innovative framework designed to enhance both the diversity and quality of urban scene generation. Our approach integrates three key stages: initially, a diffusion transformer (DiT) model is deployed to generate diverse and controllable 2D city layouts. Subsequently, a Large Language Model (LLM) is utilized to strategically make land-use plans within these layouts based on user prompts and language guidelines. Based on the generated layout and city plan, we utilize the asset retrieval module and Blender for precise asset placement and scene construction. Furthermore, we contribute two new datasets to the field: 1) the CityCraft-OSM dataset, including 2D semantic layouts of urban areas, corresponding satellite images, and detailed annotations; 2) the CityCraft-Buildings dataset, featuring thousands of diverse, high-quality 3D building assets. CityCraft achieves state-of-the-art performance in generating realistic 3D cities.

Submitted 7 June, 2024; originally announced June 2024.

Comments: 20 pages, 9 figures

arXiv:2406.01152 [pdf, other]

Learning-based legged locomotion; state of the art and future perspectives

Authors: Sehoon Ha, Joonho Lee, Michiel van de Panne, Zhaoming Xie, Wenhao Yu, Majid Khadiv

Abstract: Legged locomotion holds the promise of universal mobility, a critical capability for many real-world robotic applications. Both model-based and learning-based approaches have advanced the field of legged locomotion in the past three decades. In recent years, however, a number of factors have dramatically accelerated progress in learning-based methods, including the rise of deep learning, rapid progress in simulating robotic systems, and the availability of high-performance and affordable hardware. This article aims to give a brief history of the field, to summarize recent efforts in learning locomotion skills for quadrupeds, and to provide researchers new to the area with an understanding of the key issues involved. With the recent proliferation of humanoid robots, we further outline the rapid rise of analogous methods for bipedal locomotion. We conclude with a discussion of open problems as well as related societal impact.

Submitted 3 June, 2024; originally announced June 2024.

arXiv:2405.00223 [pdf, other]

ConFides: A Visual Analytics Solution for Automated Speech Recognition Analysis and Exploration

Authors: Sunwoo Ha, Chaehun Lim, R. Jordan Crouser, Alvitta Ottley

Abstract: Confidence scores of automatic speech recognition (ASR) outputs are often inadequately communicated, preventing their seamless integration into analytical workflows. In this paper, we introduce ConFides, a visual analytic system developed in collaboration with intelligence analysts to address this issue. ConFides aims to aid exploration and post-AI-transcription editing by visually representing the confidence associated with the transcription. We demonstrate how our tool can assist intelligence analysts who use ASR outputs in their analytical and exploratory tasks and how it can help mitigate misinterpretation of crucial information. We also discuss opportunities for improving textual data cleaning and model transparency for human-machine collaboration.

Submitted 30 April, 2024; originally announced May 2024.

arXiv:2404.17609 [pdf, other]

CoSD: Collaborative Stance Detection with Contrastive Heterogeneous Topic Graph Learning

Authors: Yinghan Cheng, Qi Zhang, Chongyang Shi, Liang Xiao, Shufeng Hao, Liang Hu

Abstract: Stance detection seeks to identify the viewpoints of individuals either in favor of or against a given target or a controversial topic. Current advanced neural models for stance detection typically employ fully parametric softmax classifiers. However, these methods suffer from several limitations, including lack of explainability, insensitivity to the latent data structure, and unimodality, which greatly restrict their performance and applications. To address these challenges, we present a novel collaborative stance detection framework called CoSD, which leverages contrastive heterogeneous topic graph learning to learn topic-aware semantics and collaborative signals among texts, topics, and stance labels for enhancing stance detection. During training, we construct a heterogeneous graph to structurally organize texts and stances through implicit topics by employing latent Dirichlet allocation. We then perform contrastive graph learning to learn heterogeneous node representations, aggregating informative multi-hop collaborative signals via an elaborate Collaboration Propagation Aggregation (CPA) module. During inference, we introduce a hybrid similarity scoring module to enable the comprehensive incorporation of topic-aware semantics and collaborative signals for stance detection. Extensive experiments on two benchmark datasets demonstrate the state-of-the-art detection performance of CoSD, verifying the effectiveness and explainability of our collaborative framework.

Submitted 19 June, 2024; v1 submitted 25 April, 2024; originally announced April 2024.

Comments: 13 pages

arXiv:2404.15778 [pdf, other]

BASS: Batched Attention-optimized Speculative Sampling

Authors: Haifeng Qian, Sujan Kumar Gonugondla, Sungsoo Ha, Mingyue Shang, Sanjay Krishna Gouda, Ramesh Nallapati, Sudipta Sengupta, Xiaofei Ma, Anoop Deoras

Abstract: Speculative decoding has emerged as a powerful method to improve latency and throughput in hosting large language models. However, most existing implementations focus on generating a single sequence. Real-world generative AI applications often require multiple responses, and how to perform speculative decoding in a batched setting while preserving its latency benefits poses non-trivial challenges. This paper describes a system of batched speculative decoding that sets a new state of the art in multi-sequence generation latency and that demonstrates superior GPU utilization as well as quality of generations within a time budget. For example, for a 7.8B-size model on a single A100 GPU and with a batch size of 8, each sequence is generated at an average speed of 5.8 ms per token, the overall throughput being 1.1K tokens per second. These results represent state-of-the-art latency and a 2.15X speed-up over optimized regular decoding. Within a time budget that regular decoding does not finish, our system is able to generate sequences with HumanEval Pass@First of 43% and Pass@All of 61%, far exceeding what's feasible with single-sequence speculative decoding. Our peak GPU utilization during decoding reaches as high as 15.8%, more than 3X the highest of that of regular decoding and around 10X of single-sequence speculative decoding.

Submitted 26 June, 2024; v1 submitted 24 April, 2024; originally announced April 2024.

arXiv:2404.14521 [pdf, other]

Guided By AI: Navigating Trust, Bias, and Data Exploration in AI-Guided Visual Analytics

Authors: Sunwoo Ha, Shayan Monadjemi, Alvitta Ottley

Abstract: The increasing integration of artificial intelligence (AI) in visual analytics (VA) tools raises vital questions about the behavior of users, their trust, and the potential of induced biases when provided with guidance during data exploration. We present an experiment where participants engaged in a visual data exploration task while receiving intelligent suggestions supplemented with four different transparency levels. We also modulated the difficulty of the task (easy or hard) to simulate a more tedious scenario for the analyst. Our results indicate that participants were more inclined to accept suggestions when completing a more difficult task despite the AI's lower suggestion accuracy. Moreover, the levels of transparency tested in this study did not significantly affect suggestion usage or subjective trust ratings of the participants. Additionally, we observed that participants who utilized suggestions throughout the task explored a greater quantity and diversity of data points. We discuss these findings and the implications of this research for improving the design and effectiveness of AI-guided VA tools.

Submitted 22 April, 2024; originally announced April 2024.

arXiv:2404.14072 [pdf, other]

Measure-valued death state and local sensitivity analysis for Winfree models with uncertain high-order couplings

Authors: Seung-Yeal Ha, Myeongju Kang, Jaeyoung Yoon, Mattia Zanella

Abstract: We study the measure-valued death state and local sensitivity analysis of the Winfree model and its mean-field counterpart with uncertain high-order couplings. The Winfree model is the first mathematical model for synchronization. It can be cast as an effective approximation of the pulse-coupled model for synchronization, and it exhibits diverse asymptotic patterns depending on system parameters and initial data. For the proposed models, we present several frameworks leading to oscillator death in terms of system parameters and initial data, and the propagation of regularity in random space. We also present several numerical tests and compare them with analytical results.

Submitted 22 April, 2024; originally announced April 2024.

arXiv:2404.13256 [pdf]

Electrically generated exciton polaritons with spin on-demand

Authors: Yutao Wang, Giorgio Adamo, Son Tung Ha, Jingyi Tian, Cesare Soci

Abstract: Generation and manipulation of exciton polaritons with controllable spin could deeply impact spintronic applications, quantum simulations, and quantum information processing, but is inherently challenging due to the charge neutrality of the polariton and the device complexity it requires. In this work, we demonstrate electrical generation of spin-polarized exciton polaritons in a monolithic dielectric perovskite metasurface embedded in a light-emitting transistor. A finely tailored interplay of in- and out-of-plane symmetry breaking of the metasurface lifts the spin degeneracy through the polaritonic Rashba effect, yielding high spin purity with a normalized Stokes parameter of S3 ≈ 0.8. Leveraging spin-momentum locking, the unique metatransistor device architecture enables electrical control of spin and directionality of the polaritonic emission. This work advances the development of compact and tunable spintronic devices, and represents an important step toward the realization of electrically pumped inversionless spin-lasers.

Submitted 19 April, 2024; originally announced April 2024.

arXiv:2404.10933 [pdf, other]

LLMem: Estimating GPU Memory Usage for Fine-Tuning Pre-Trained LLMs

Authors: Taeho Kim, Yanming Wang, Vatshank Chaturvedi, Lokesh Gupta, Seyeon Kim, Yongin Kwon, Sangtae Ha

Abstract: Fine-tuning pre-trained large language models (LLMs) with limited hardware presents challenges due to GPU memory constraints. Various distributed fine-tuning methods have been proposed to alleviate memory constraints on GPU. However, determining the most effective method for achieving rapid fine-tuning while preventing GPU out-of-memory issues in a given environment remains unclear. To address this challenge, we introduce LLMem, a solution that estimates the GPU memory consumption when applying distributed fine-tuning methods across multiple GPUs and identifies the optimal method. We conduct GPU memory usage estimation prior to fine-tuning, leveraging the fundamental structure of transformer-based decoder models and the memory usage distribution of each method. Experimental results show that LLMem accurately estimates peak GPU memory usage on a single GPU, with error rates of up to 1.6%. Additionally, it shows an average error rate of 3.0% when applying distributed fine-tuning methods to LLMs with more than a billion parameters on multi-GPU setups.

Submitted 16 April, 2024; originally announced April 2024.

Comments: 9 pages, 9 figures, accepted to IJCAI 2024

arXiv:2404.10022 [pdf, other]

COBRAPRO: A MATLAB toolbox for Physics-based Battery Modeling and Co-simulation Parameter Optimization

Authors: Sara Ha, Simona Onori

Abstract: COBRAPRO is a new open-source physics-based battery modeling software with the capability to conduct closed-loop parameter optimization using experimental data. Physics-based battery models require systematic parameter calibration to accurately predict battery behavior across different usage scenarios. While parameter calibration is essential to predict the dynamic behavior of batteries, many existing open-source DFN modeling tools lack integrated parameter identification routines. COBRAPRO addresses this gap by featuring an embedded parameter optimization framework that optimizes the model parameters by minimizing the error between the simulated and experimentally observed current-voltage data. With COBRAPRO, users can non-invasively identify unknown battery model parameters for any given battery chemistry.

Submitted 16 April, 2024; v1 submitted 15 April, 2024; originally announced April 2024.

arXiv:2404.06611 [pdf, other]

Modeling social interaction dynamics using temporal graph networks

Authors: J. Taery Kim, Archit Naik, Isuru Jayarathne, Sehoon Ha, Jouh Yeong Chew

Abstract: Integrating intelligent systems, such as robots, into dynamic group settings poses challenges due to the mutual influence of human behaviors and internal states. A robust representation of social interaction dynamics is essential for effective human-robot collaboration. Existing approaches often narrow their focus to facial expressions or speech, overlooking the broader context. We propose employing an adapted Temporal Graph Network to comprehensively represent social interaction dynamics while enabling its practical implementation. Our method incorporates temporal multi-modal behavioral data including gaze interaction, voice activity and environmental context. This representation of social interaction dynamics is trained as a link prediction problem using annotated gaze interaction data. The F1-score outperformed the baseline model by 37.0%. This improvement is consistent for a secondary task of next speaker prediction, which achieves an improvement of 29.0%. Our contributions are two-fold, including a model for representing social interaction dynamics which can be used for many downstream human-robot interaction tasks like human state inference and next speaker prediction. More importantly, this is achieved using a more concise yet efficient message passing method, significantly reducing it from 768 to 14 elements, while outperforming the baseline model.

Submitted 5 April, 2024; originally announced April 2024.

Comments: 7 pages, 4 figures

Journal ref: 33rd IEEE International Conference on Robot & Human Interactive Communication (RO-MAN 2024)

arXiv:2404.05221 [pdf, other]

LLM Reasoners: New Evaluation, Library, and Analysis of Step-by-Step Reasoning with Large Language Models

Authors: Shibo Hao, Yi Gu, Haotian Luo, Tianyang Liu, Xiyan Shao, Xinyuan Wang, Shuhua Xie, Haodi Ma, Adithya Samavedhi, Qiyue Gao, Zhen Wang, Zhiting Hu

Abstract: Generating accurate step-by-step reasoning is essential for Large Language Models (LLMs) to address complex problems and enhance robustness and interpretability. Despite the flux of research on developing advanced reasoning approaches, systematically analyzing the diverse LLMs and reasoning strategies in generating reasoning chains remains a significant challenge. The difficulties stem from the lack of two key elements: (1) an automatic method for evaluating the generated reasoning chains on different tasks, and (2) a unified formalism and implementation of the diverse reasoning approaches for systematic comparison. This paper aims to close the gap: (1) We introduce AutoRace for fully automated reasoning chain evaluation. Existing metrics rely on expensive human annotations or pre-defined LLM prompts not adaptable to different tasks. In contrast, AutoRace automatically creates detailed evaluation criteria tailored for each task, and uses GPT-4 for accurate evaluation following the criteria. (2) We develop LLM Reasoners, a library for standardized modular implementation of existing and new reasoning algorithms, under a unified formulation of the search, reward, and world model components. With the new evaluation and library, (3) we conduct an extensive study of different reasoning approaches (e.g., CoT, ToT, RAP). The analysis reveals interesting findings about different factors contributing to reasoning, including the reward-guidance, breadth-vs-depth in search, world model, and prompt formats.

Submitted 8 April, 2024; originally announced April 2024.

Comments: Project website: https://www.llm-reasoners.net/

arXiv:2404.02550 [pdf, other]

On the comparison between phenomenological and kinetic theories of gas mixtures with applications to flocking

Authors: Gi-Chan Bae, Seung-Yeal Ha, Gyuyoung Hwang, Tommaso Ruggeri

Abstract: We study the comparison between the phenomenological and kinetic models for a mixture of gases from the viewpoint of collective dynamics. In the case in which constituents are Eulerian gases, balance equations for mass, momentum, and energy are the same in the main differential part, but production terms due to the interchanges between constituents are different. They coincide only when the thermal and mechanical diffusion are sufficiently small. In this paper, we first verify that both models satisfy the universal requirements of conservation laws of total mass, momentum, and energy, Galilean invariance and entropy principle. Following the work of Ha and Ruggeri (ARMA 2017), we consider spatially homogeneous models which correspond to the generalizations of the Cucker-Smale model with the thermal effect. In these circumstances, we provide analytical results for the comparison between the two resulting models and also present several numerical simulations to complement the analytical results.

Submitted 3 April, 2024; originally announced April 2024.

Comments: 45 pages, 12 figures

MSC Class: 34C60; 34E10; 35L65

arXiv:2403.17158 [pdf, other]

Reflecting the Male Gaze: Quantifying Female Objectification in 19th and 20th Century Novels

Authors: Kexin Luo, Yue Mao, Bei Zhang, Sophie Hao

Abstract: Inspired by the concept of the male gaze (Mulvey, 1975) in literature and media studies, this paper proposes a framework for analyzing gender bias in terms of female objectification: the extent to which a text portrays female individuals as objects of visual pleasure. Our framework measures female objectification along two axes. First, we compute an agency bias score that indicates whether male entities are more likely to appear in the text as grammatical agents than female entities. Next, by analyzing the word embedding space induced by a text (Caliskan et al., 2017), we compute an appearance bias score that indicates whether female entities are more closely associated with appearance-related words than male entities. Applying our framework to 19th and 20th century novels reveals evidence of female objectification in literature: we find that novels written from a male perspective systematically objectify female characters, while novels written from a female perspective do not exhibit statistically significant objectification of any gender.

Submitted 25 March, 2024; originally announced March 2024.

Comments: To appear in LREC-COLING 2024

arXiv:2403.12550 [pdf, other]

RGBD GS-ICP SLAM

Authors: Seongbo Ha, Jiung Yeon, Hyeonwoo Yu

Abstract: Simultaneous Localization and Mapping (SLAM) with dense representation plays a key role in robotics, Virtual Reality (VR), and Augmented Reality (AR) applications. Recent advancements in dense representation SLAM have highlighted the potential of leveraging neural scene representation and 3D Gaussian representation for high-fidelity spatial representation. In this paper, we propose a novel dense representation SLAM approach with a fusion of Generalized Iterative Closest Point (G-ICP) and 3D Gaussian Splatting (3DGS). In contrast to existing methods, we utilize a single Gaussian map for both tracking and mapping, resulting in mutual benefits. Through the exchange of covariances between tracking and mapping processes with scale alignment techniques, we minimize redundant computations and achieve an efficient system. Additionally, we enhance tracking accuracy and mapping quality through our keyframe selection methods. Experimental results demonstrate the effectiveness of our approach, showing a speed of up to 107 FPS (for the entire system) and superior quality of the reconstructed map.

Submitted 22 March, 2024; v1 submitted 19 March, 2024; originally announced March 2024.

arXiv:2403.11070 [pdf, other]

Controllable Relation Disentanglement for Few-Shot Class-Incremental Learning

Authors: Yuan Zhou, Richang Hong, Yanrong Guo, Lin Liu, Shijie Hao, Hanwang Zhang

Abstract: In this paper, we propose to tackle Few-Shot Class-Incremental Learning (FSCIL) from a new perspective, i.e., relation disentanglement, which means enhancing FSCIL via disentangling spurious relations between categories. The challenge of disentangling spurious correlations lies in the poor controllability of FSCIL. On one hand, an FSCIL model is required to be trained in an incremental manner and thus it is very hard to directly control relationships between categories of different sessions. On the other hand, training samples per novel category are only in the few-shot setting, which increases the difficulty of alleviating spurious relation issues as well. To overcome this challenge, in this paper, we propose a new simple-yet-effective method, called ConTrollable Relation-disentangLed Few-Shot Class-Incremental Learning (CTRL-FSCIL). Specifically, during the base session, we propose to anchor base category embeddings in feature space and construct disentanglement proxies to bridge gaps between the learning for category representations in different sessions, thereby making category relations controllable. During incremental learning, the parameters of the backbone network are frozen in order to relieve the negative impact of data scarcity. Moreover, a disentanglement loss is designed to effectively guide a relation disentanglement controller to disentangle spurious correlations between the embeddings encoded by the backbone. In this way, the spurious correlation issue in FSCIL can be suppressed.
Extensive experiments on CIFAR-100, mini-ImageNet, and CUB-200 datasets demonstrate the effectiveness of our CTRL-FSCIL method.

Submitted 16 March, 2024; originally announced March 2024.

arXiv:2403.07860 [pdf, other]

Bridging Different Language Models and Generative Vision Models for Text-to-Image Generation

Authors: Shihao Zhao, Shaozhe Hao, Bojia Zi, Huaizhe Xu, Kwan-Yee K. Wong

Abstract: Text-to-image generation has made significant advancements with the introduction of text-to-image diffusion models. These models typically consist of a language model that interprets user prompts and a vision model that generates corresponding images. As language and vision models continue to progress in their respective domains, there is great potential in exploring the replacement of components in text-to-image diffusion models with more advanced counterparts. A broader research objective would therefore be to investigate the integration of any two unrelated language and generative vision models for text-to-image generation. In this paper, we explore this objective and propose LaVi-Bridge, a pipeline that enables the integration of diverse pre-trained language models and generative vision models for text-to-image generation. By leveraging LoRA and adapters, LaVi-Bridge offers a flexible and plug-and-play approach without requiring modifications to the original weights of the language and vision models. Our pipeline is compatible with various language models and generative vision models, accommodating different structures. Within this framework, we demonstrate that incorporating superior modules, such as more advanced language models or generative vision models, results in notable improvements in capabilities like text alignment or image quality. Extensive evaluations have been conducted to verify the effectiveness of LaVi-Bridge. Code is available at https://github.com/ShihaoZhaoZSH/LaVi-Bridge.

Submitted 12 March, 2024; originally announced March 2024.

arXiv:2403.05086 [pdf, other]

UFORecon: Generalizable Sparse-View Surface Reconstruction from Arbitrary and UnFavOrable Sets

Authors: Youngju Na, Woo Jae Kim, Kyu Beom Han, Suhyeon Ha, Sung-eui Yoon

Abstract: Generalizable neural implicit surface reconstruction aims to obtain an accurate underlying geometry given a limited number of multi-view images from unseen scenes. However, existing methods select only informative and relevant views using predefined scores for training and testing phases. This constraint renders the model impractical in real-world scenarios, where the availability of favorable combinations cannot always be ensured. We introduce and validate a view-combination score to indicate the effectiveness of the input view combination. We observe that previous methods output degenerate solutions under arbitrary and unfavorable sets. Building upon this finding, we propose UFORecon, a robust view-combination generalizable surface reconstruction framework. To achieve this, we apply cross-view matching transformers to model interactions between source images and build correlation frustums to capture global correlations. Additionally, we explicitly encode pairwise feature similarities as view-consistent priors. Our proposed framework significantly outperforms previous methods in terms of view-combination generalizability and also in the conventional generalizable protocol trained with favorable view-combinations. The code is available at https://github.com/Youngju-Na/UFORecon.

Submitted 17 May, 2024; v1 submitted 8 March, 2024; originally announced March 2024.

Comments: Accepted at CVPR 2024. Project page: https://youngju-na.github.io/uforecon.github.io/

arXiv:2403.04918 [pdf, other]

Secure Information Embedding and Extraction in Forensic 3D Fingerprinting

Authors: Canran Wang, Jinwen Wang, Mi Zhou, Vinh Pham, Senyue Hao, Chao Zhou, Ning Zhang, Netanel Raviv

Abstract: The prevalence of 3D printing poses a significant risk to public safety, as any individual with internet access and a commodity printer is able to produce untraceable firearms, keys, counterfeit products, etc. To aid government authorities in combating these new security threats, several approaches have been taken to tag 3D-prints with identifying information. Known as fingerprints, this information is written into the object using various bit embedding techniques; examples include varying the height of the molten thermoplastic layers, and depositing metallic powder with different magnetic properties. Yet, the practicality of these techniques in real-world forensic settings is hindered by the adversarial nature of this problem. That is, the 3D-printing process is out of reach of any law enforcement agencies; it is the adversary who controls all aspects of printing and possesses the printed object. To combat these threats, law enforcement agencies can regulate the manufacturing of 3D printers, on which they may enforce a fingerprinting scheme, and collect adversarially tampered remains (e.g., fragments of a broken 3D-printed firearm) during forensic investigation. Therefore, it is important to devise fingerprinting techniques so that the fingerprint could be extracted even if printing is carried out by the adversary.
To this end, we present SIDE (Secure Information Embedding and Extraction), a fingerprinting framework that tackles the adversarial nature of forensic fingerprinting in 3D prints by offering both secure information embedding and secure information extraction.

Submitted 12 June, 2024; v1 submitted 7 March, 2024; originally announced March 2024.

arXiv:2403.03430 [pdf, other]

Discrete Consensus-Based Optimization

Authors: Junhyeok Byeon, Seung-Yeal Ha, Joong-Ho Won

Abstract: We propose Discrete Consensus-Based Optimization (DCBO), a fully discrete version of the Consensus-Based Optimization (CBO) framework. DCBO is a multi-agent method for the global optimization of possibly non-convex and non-differentiable functions. It aligns with the CBO paradigm, which promotes a consensus among agents towards a global optimum through simple stochastic dynamics amenable to rigorous mathematical analysis. Despite the promises, there has been a gap between the analysis of CBO and the actual behavior of the agents from its time-discrete implementation, as the former has focused on the system of continuous stochastic differential equations defining the model or its mean-field approximation. In particular, direct analysis of CBO-type algorithms with heterogeneous stochasticity is very challenging. DCBO distinguishes itself from these approaches in the sense that it has no continuous counterpart, thanks to the replacement of the "softmin" operator with the "hardmin" one, which is inherently discrete. Yet, it maintains the operational principles of CBO and allows for rich mathematical analysis. We present conditions, independent of the number of agents, for achieving a consensus or convergence and study the circumstances under which global optimization occurs. We test DCBO on a large number of benchmark functions to show its merits. We also demonstrate that DCBO is applicable to a diverse range of real-world problems, including neural network training, compressed sensing, and portfolio optimization, with competitive performance.

Submitted 16 April, 2024; v1 submitted 5 March, 2024; originally announced March 2024.

MSC Class: 37H10; 37N40; 65K10

arXiv:2402.10280 [pdf, other]

SusFL: Energy-Aware Federated Learning-based Monitoring for Sustainable Smart Farms

Authors: Dian Chen, Paul Yang, Ing-Ray Chen, Dong Sam Ha, Jin-Hee Cho

Abstract: We propose a novel energy-aware federated learning (FL)-based system, namely SusFL, for sustainable smart farming to address the challenge of inconsistent health monitoring due to fluctuating energy levels of solar sensors. This system equips animals, such as cattle, with solar sensors with computational capabilities, including Raspberry Pis, to train a local deep-learning model on health data. These sensors periodically update Long Range (LoRa) gateways, forming a wireless sensor network (WSN) to detect diseases like mastitis. Our proposed SusFL system incorporates mechanism design, a game theory concept, for intelligent client selection to optimize monitoring quality while minimizing energy use. This strategy ensures the system's sustainability and resilience against adversarial attacks, including data poisoning and privacy threats, that could disrupt FL operations. Through extensive comparative analysis using real-time datasets, we demonstrate that our FL-based monitoring system significantly outperforms existing methods in prediction accuracy, operational efficiency, system reliability (i.e., mean time between failures or MTBF), and social welfare maximization by the mechanism designer. Our findings validate the superiority of our system for effective and sustainable animal health monitoring in smart farms.
The experimental results show that SusFL significantly improves system performance, including a $10\%$ reduction in energy consumption, a $15\%$ increase in social welfare, and a $34\%$ rise in Mean Time Between Failures (MTBF), alongside a marginal increase in the global model's prediction accuracy.

Submitted 15 February, 2024; originally announced February 2024.

arXiv:2402.01787 [pdf, other]

Harm Amplification in Text-to-Image Models

Authors: Susan Hao, Renee Shelby, Yuchi Liu, Hansa Srinivasan, Mukul Bhutani, Burcu Karagol Ayan, Ryan Poplin, Shivani Poddar, Sarah Laszlo

Abstract: Text-to-image (T2I) models have emerged as a significant advancement in generative AI; however, there exist safety concerns regarding their potential to produce harmful image outputs even when users input seemingly safe prompts. This phenomenon, where T2I models generate harmful representations that were not explicit in the input, poses a potentially greater risk than adversarial prompts, leaving users unintentionally exposed to harms. Our paper addresses this issue by formalizing a definition for this phenomenon which we term harm amplification. We further contribute to the field by developing a framework of methodologies to quantify harm amplification in which we consider the harm of the model output in the context of user input. We then empirically examine how to apply these different methodologies to simulate real-world deployment scenarios including a quantification of disparate impacts across genders resulting from harm amplification. Together, our work aims to offer researchers tools to comprehensively address safety challenges in T2I systems and contribute to the responsible deployment of generative AI models.

Submitted 17 May, 2024; v1 submitted 1 February, 2024; originally announced February 2024.

arXiv:2402.01338 [pdf, other]

Inferring the Langevin Equation with Uncertainty via Bayesian Neural Networks

Authors: Youngkyoung Bae, Seungwoong Ha, Hawoong Jeong

Abstract: Pervasive across diverse domains, stochastic systems exhibit fluctuations in processes ranging from molecular dynamics to climate phenomena. The Langevin equation has served as a common mathematical model for studying such systems, enabling predictions of their temporal evolution and analyses of thermodynamic quantities, including absorbed heat, work done on the system, and entropy production. However, inferring the Langevin equation from observed trajectories remains challenging, particularly for nonlinear and high-dimensional systems. In this study, we present a comprehensive framework that employs Bayesian neural networks for inferring Langevin equations in both overdamped and underdamped regimes. Our framework first provides the drift force and diffusion matrix separately and then combines them to construct the Langevin equation. By providing a distribution of predictions instead of a single value, our approach allows us to assess prediction uncertainties, which can prevent potential misunderstandings and erroneous decisions about the system. We demonstrate the effectiveness of our framework in inferring Langevin equations for various scenarios including a neuron model and microscopic engine, highlighting its versatility and potential impact.

Submitted 2 February, 2024; originally announced February 2024.

Comments: 30 pages, 17 figures

arXiv:2401.06146 [pdf, other]

AAMDM: Accelerated Auto-regressive Motion Diffusion Model

Authors: Tianyu Li, Calvin Qiao, Guanqiao Ren, KangKang Yin, Sehoon Ha

Abstract: Interactive motion synthesis is essential in creating immersive experiences in entertainment applications, such as video games and virtual reality. However, generating animations that are both high-quality and contextually responsive remains a challenge. Traditional techniques in the game industry can produce high-fidelity animations but suffer from high computational costs and poor scalability. Trained neural network models alleviate the memory and speed issues, yet fall short on generating diverse motions. Diffusion models offer diverse motion synthesis with low memory usage, but require expensive reverse diffusion processes. This paper introduces the Accelerated Auto-regressive Motion Diffusion Model (AAMDM), a novel motion synthesis framework designed to achieve quality, diversity, and efficiency all together. AAMDM integrates Denoising Diffusion GANs as a fast Generation Module, and an Auto-regressive Diffusion Model as a Polishing Module. Furthermore, AAMDM operates in a lower-dimensional embedded space rather than the full-dimensional pose space, which reduces the training complexity as well as further improves the performance. We show that AAMDM outperforms existing methods in motion quality, diversity, and runtime efficiency, through comprehensive quantitative analyses and visual comparisons. We also demonstrate the effectiveness of each algorithmic component through ablation studies.

Submitted 2 December, 2023; originally announced January 2024.

arXiv:2401.01629 [pdf, ps, other]

Synthetic Data in AI: Challenges, Applications, and Ethical Implications

Authors: Shuang Hao, Wenfeng Han, Tao Jiang, Yiping Li, Haonan Wu, Chunlin Zhong, Zhangjun Zhou, He Tang

Abstract: In the rapidly evolving field of artificial intelligence, the creation and utilization of synthetic datasets have become increasingly significant. This report delves into the multifaceted aspects of synthetic data, particularly emphasizing the challenges and potential biases these datasets may harbor. It explores the methodologies behind synthetic data generation, spanning traditional statistical models to advanced deep learning techniques, and examines their applications across diverse domains. The report also critically addresses the ethical considerations and legal implications associated with synthetic datasets, highlighting the urgent need for mechanisms to ensure fairness, mitigate biases, and uphold ethical standards in AI development.

Submitted 3 January, 2024; originally announced January 2024.

arXiv:2312.11979 [pdf, other]

Origin of chirality in transition-metal dichalcogenides

Authors: Kwangrae Kim, Hyun-Woo J. Kim, Seunghyeok Ha, Hoon Kim, Jin-Kwang Kim, Jaehwon Kim, Hyunsung Kim, Junyoung Kwon, Jihoon Seol, Saegyeol Jung, Changyoung Kim, Ahmet Alatas, Ayman Said, Michael Merz, Matthieu Le Tacon, Jin Mo Bok, Ki-Seok Kim, B. J. Kim

Abstract: Chirality is a ubiquitous phenomenon in which a symmetry between left- and right-handed objects is broken, examples in nature ranging from subatomic particles and molecules to living organisms. In particle physics, the weak force is responsible for the symmetry breaking and parity violation in beta decay, but in condensed matter systems interactions that lead to chirality remain poorly understood. Here, we unravel the mechanism of chiral charge density wave formation in the transition-metal dichalcogenide 1T-TiSe2. Using representation analysis, we show that charge density modulations and ionic displacements, which transform as a continuous scalar field and a vector field on a discrete lattice, respectively, follow different irreducible representations of the space group, despite the fact that they propagate with the same wave-vectors and are strongly coupled to each other. This charge-lattice symmetry frustration is resolved by further breaking of all symmetries not common to both sectors through induced lattice distortions, thus leading to chirality. Our theory is verified using Raman spectroscopy and inelastic x-ray scattering, which reveal that all but translation symmetries are broken at a level not resolved by state-of-the-art diffraction techniques.

Submitted 19 December, 2023; originally announced December 2023.

Comments: 10 pages, 3 figures, 1 table

arXiv:2312.03275 [pdf, other]

VLFM: Vision-Language Frontier Maps for Zero-Shot Semantic Navigation

Authors: Naoki Yokoyama, Sehoon Ha, Dhruv Batra, Jiuguang Wang, Bernadette Bucher

Abstract: Understanding how humans leverage semantic knowledge to navigate unfamiliar environments and decide where to explore next is pivotal for developing robots capable of human-like search behaviors. We introduce a zero-shot navigation approach, Vision-Language Frontier Maps (VLFM), which is inspired by human reasoning and designed to navigate towards unseen semantic objects in novel environments. VLFM builds occupancy maps from depth observations to identify frontiers, and leverages RGB observations and a pre-trained vision-language model to generate a language-grounded value map. VLFM then uses this map to identify the most promising frontier to explore for finding an instance of a given target object category. We evaluate VLFM in photo-realistic environments from the Gibson, Habitat-Matterport 3D (HM3D), and Matterport 3D (MP3D) datasets within the Habitat simulator. Remarkably, VLFM achieves state-of-the-art results on all three datasets as measured by success weighted by path length (SPL) for the Object Goal Navigation task. Furthermore, we show that VLFM's zero-shot nature enables it to be readily deployed on real-world robots such as the Boston Dynamics Spot mobile manipulation platform. We deploy VLFM on Spot and demonstrate its capability to efficiently navigate to target objects within an office building in the real world, without any prior knowledge of the environment. The accomplishments of VLFM underscore the promising potential of vision-language models in advancing the field of semantic navigation. Videos of real-world deployment can be viewed at naoki.io/vlfm.

Submitted 5 December, 2023; originally announced December 2023.

arXiv:2311.15209 [pdf, other]

See and Think: Embodied Agent in Virtual Environment

Authors: Zhonghan Zhao, Wenhao Chai, Xuan Wang, Li Boyi, Shengyu Hao, Shidong Cao, Tian Ye, Gaoang Wang

Abstract: Large language models (LLMs) have achieved impressive progress on several open-world tasks. Recently, using LLMs to build embodied agents has been a hotspot. This paper proposes STEVE, a comprehensive and visionary embodied agent in the Minecraft virtual environment. STEVE comprises three key components: vision perception, language instruction, and code action. Vision perception involves interpreting visual information in the environment, which is then integrated into the LLMs component with agent state and task instruction. Language instruction is responsible for iterative reasoning and decomposing complex tasks into manageable guidelines. Code action generates executable skill actions based on retrieval in skill database, enabling the agent to interact effectively within the Minecraft environment. We also collect the STEVE-21K dataset, which includes 600+ vision-environment pairs, 20K knowledge question-answering pairs, and 200+ skill-code pairs. We conduct continuous block search, knowledge question and answering, and tech tree mastery to evaluate the performance. Extensive experiments show that STEVE achieves at most 1.5x faster unlocking key tech trees and 2.5x quicker in block search tasks.

Submitted 9 July, 2024; v1 submitted 26 November, 2023; originally announced November 2023.

Comments: ECCV 2024. First three authors contribute equally to this work. Project Website https://rese1f.github.io/STEVE/

arXiv:2311.12467 [pdf, other]

GLAD: Global-Local View Alignment and Background Debiasing for Unsupervised Video Domain Adaptation with Large Domain Gap

Authors: Hyogun Lee, Kyungho Bae, Seong Jong Ha, Yumin Ko, Gyeong-Moon Park, Jinwoo Choi

Abstract: In this work, we tackle the challenging problem of unsupervised video domain adaptation (UVDA) for action recognition. We specifically focus on scenarios with a substantial domain gap, in contrast to existing works that primarily deal with small domain gaps between labeled source domains and unlabeled target domains. To establish a more realistic setting, we introduce a novel UVDA scenario, denoted as Kinetics->BABEL, with a more considerable domain gap in terms of both temporal dynamics and background shifts. To tackle the temporal shift, i.e., action duration difference between the source and target domains, we propose a global-local view alignment approach. To mitigate the background shift, we propose to learn temporal order sensitive representations by temporal order learning and background invariant representations by background augmentation. We empirically validate that the proposed method shows significant improvement over the existing methods on the Kinetics->BABEL dataset with a large domain gap. The code is available at https://github.com/KHUVLL/GLAD.

Submitted 22 November, 2023; v1 submitted 21 November, 2023; originally announced November 2023.

Comments: This is an accepted WACV 2024 paper. Our code is available at https://github.com/KHUVLL/GLAD

arXiv:2311.10261 [pdf, other]

Vision meets mmWave Radar: 3D Object Perception Benchmark for Autonomous Driving

Authors: Yizhou Wang, Jen-Hao Cheng, Jui-Te Huang, Sheng-Yao Kuan, Qiqian Fu, Chiming Ni, Shengyu Hao, Gaoang Wang, Guanbin Xing, Hui Liu, Jenq-Neng Hwang

Abstract: Sensor fusion is crucial for an accurate and robust perception system on autonomous vehicles. Most existing datasets and perception solutions focus on fusing cameras and LiDAR. However, the collaboration between camera and radar is significantly under-exploited. The incorporation of rich semantic information from the camera, and reliable 3D information from the radar can potentially achieve an efficient, cheap, and portable solution for 3D object perception tasks. It can also be robust to different lighting or all-weather driving scenarios due to the capability of mmWave radars. In this paper, we introduce the CRUW3D dataset, including 66K synchronized and well-calibrated camera, radar, and LiDAR frames in various driving scenarios. Unlike other large-scale autonomous driving datasets, our radar data is in the format of radio frequency (RF) tensors that contain not only 3D location information but also spatio-temporal semantic information. This kind of radar format can enable machine learning models to generate more reliable object perception results after interacting and fusing the information or features between the camera and radar.

Submitted 16 November, 2023; originally announced November 2023.

arXiv:2311.02304 [pdf, other]

Imitating and Finetuning Model Predictive Control for Robust and Symmetric Quadrupedal Locomotion

Authors: Donghoon Youm, Hyunyoung Jung, Hyeongjun Kim, Jemin Hwangbo, Hae-Won Park, Sehoon Ha

Abstract: Control of legged robots is a challenging problem that has been investigated by different approaches, such as model-based control and learning algorithms. This work proposes a novel Imitating and Finetuning Model Predictive Control (IFM) framework to take the strengths of both approaches. Our framework first develops a conventional model predictive controller (MPC) using Differential Dynamic Programming and Raibert heuristic, which serves as an expert policy. Then we train a clone of the MPC using imitation learning to make the controller learnable. Finally, we leverage deep reinforcement learning with limited exploration for further finetuning the policy on more challenging terrains. By conducting comprehensive simulation and hardware experiments, we demonstrate that the proposed IFM framework can significantly improve the performance of the given MPC controller on rough, slippery, and conveyor terrains that require careful coordination of footsteps. We also showcase that IFM can efficiently produce more symmetric, periodic, and energy-efficient gaits compared to Vanilla RL with a minimal burden of reward shaping.

Submitted 3 November, 2023; originally announced November 2023.

arXiv:2310.15151 [pdf, other]

Verb Conjugation in Transformers Is Determined by Linear Encodings of Subject Number

Authors: Sophie Hao, Tal Linzen

Abstract: Deep architectures such as Transformers are sometimes criticized for having uninterpretable "black-box" representations. We use causal intervention analysis to show that, in fact, some linguistic features are represented in a linear, interpretable format. Specifically, we show that BERT's ability to conjugate verbs relies on a linear encoding of subject number that can be manipulated with predictable effects on conjugation accuracy. This encoding is found in the subject position at the first layer and the verb position at the last layer, but distributed across positions at middle layers, particularly when there are multiple cues to subject number.

Submitted 23 October, 2023; originally announced October 2023.

Comments: To appear in Findings of the Association for Computational Linguistics: EMNLP 2023

arXiv:2310.10606 [pdf, other]

BayRnTune: Adaptive Bayesian Domain Randomization via Strategic Fine-tuning

Authors: Tianle Huang, Nitish Sontakke, K. Niranjan Kumar, Irfan Essa, Stefanos Nikolaidis, Dennis W. Hong, Sehoon Ha

Abstract: Domain randomization (DR), which entails training a policy with randomized dynamics, has proven to be a simple yet effective algorithm for reducing the gap between simulation and the real world. However, DR often requires careful tuning of randomization parameters. Methods like Bayesian Domain Randomization (Bayesian DR) and Active Domain Randomization (Adaptive DR) address this issue by automating parameter range selection using real-world experience. While effective, these algorithms often require long computation time, as a new policy is trained from scratch every iteration. In this work, we propose Adaptive Bayesian Domain Randomization via Strategic Fine-tuning (BayRnTune), which inherits the spirit of BayRn but aims to significantly accelerate the learning processes by fine-tuning from previously learned policy. This idea leads to a critical question: which previous policy should we use as a prior during fine-tuning? We investigated four different fine-tuning strategies and compared them against baseline algorithms in five simulated environments, ranging from simple benchmark tasks to more complex legged robot environments. Our analysis demonstrates that our method yields better rewards in the same amount of timesteps compared to vanilla domain randomization or Bayesian DR.

Submitted 16 October, 2023; originally announced October 2023.

arXiv:2310.06226 [pdf, other]

Words into Action: Learning Diverse Humanoid Robot Behaviors using Language Guided Iterative Motion Refinement

Authors: K. Niranjan Kumar, Irfan Essa, Sehoon Ha

Abstract: Humanoid robots are well suited for human habitats due to their morphological similarity, but developing controllers for them is a challenging task that involves multiple sub-problems, such as control, planning and perception. In this paper, we introduce a method to simplify controller design by enabling users to train and fine-tune robot control policies using natural language commands. We first learn a neural network policy that generates behaviors given a natural language command, such as "walk forward", by combining Large Language Models (LLMs), motion retargeting, and motion imitation. Based on the synthesized motion, we iteratively fine-tune by updating the text prompt and querying LLMs to find the best checkpoint associated with the closest motion in history. We validate our approach using a simulated Digit humanoid robot and demonstrate learning of diverse motions, such as walking, hopping, and kicking, without the burden of complex reward engineering. In addition, we show that our iterative refinement enables us to learn 3x faster than a naive formulation that learns from scratch.

Submitted 9 October, 2023; originally announced October 2023.

arXiv:2310.02091 [pdf]

Dual-resonance nanostructures for colour down-conversion of colloidal quantum emitters

Authors: Son Tung Ha, Emmanuel Lassalle, Xiao Liang, Thi Thu Ha Do, Ian Foo, Sushant Shendre, Emek Goksu Durmusoglu, Vytautas Valuckas, Sourav Adhikary, Ramon Paniagua-Dominguez, Hilmi Volkan Demir, Arseniy Kuznetsov

Abstract: Linear colour conversion is a process where an emitter absorbs a photon and then emits another photon with either higher or lower energy, corresponding to up- or down conversion, respectively. In this regard, the presence of a volumetric cavity plays a crucial role in enhancing absorption and photoluminescence (PL), as it allows for large volumes of interaction between the exciting photons and the emissive materials, maximising the colour conversion efficiency. Here, we present a dual resonance nanostructure made of a titanium dioxide (TiO2) subwavelength grating to enhance the colour down-conversion efficiency of green light at ~530 nm emitted by gradient alloyed CdxZn1-xSeyS1-y colloidal quantum dots (QDs) when excited with a blue light at ~460 nm. A large mode volume can be created within the QD layer by the hybridisation of the grating resonances and waveguide modes. This allows increasing mode overlap between the resonances and the QDs, resulting in large absorption and tailored emission enhancements. Particularly, we achieved polarized light emission with maximum photoluminescence enhancement of ~140 times at a specific angular direction, and a total enhancement of ~34 times within 0.55 numerical aperture (NA) of the collecting objective. The enhancement encompasses absorption enhancement, Purcell enhancement and directionality enhancement (i.e., outcoupling). We achieved total absorption of 35% for green QDs with a remarkably thin colour conversion layer of ~400 nm (inclusive of the TiO2 layer). This work provides a guideline for designing large-volume cavities for practical application in absorption/fluorescence enhancement, such as down colour conversion in microLED displays, detectors or photovoltaics.

Submitted 3 October, 2023; originally announced October 2023.

Comments: 37 pages, 11 figures (4 in main text, 11 in SI)

arXiv:2310.01273 [pdf, other]

Learning manipulation of steep granular slopes for fast Mini Rover turning

Authors: Deniz Kerimoglu, Daniel Soto, Malone Lincoln Hemsley, Joseph Brunner, Sehoon Ha, Tingnan Zhang, Daniel I. Goldman

Abstract: Future planetary exploration missions will require reaching challenging regions such as craters and steep slopes. Such regions are ubiquitous and present science-rich targets potentially containing information regarding the planet's internal structure. Steep slopes consisting of low-cohesion regolith are prone to flow downward under small disturbances, making it very challenging for autonomous rovers to traverse. Moreover, the navigation trajectories of rovers are heavily limited by the terrain topology and future systems will need to maneuver on flowable surfaces without getting trapped, allowing them to further expand their reach and increase mission efficiency. In this work, we used a laboratory-scale rover robot and performed maneuvering experiments on a steep granular slope of poppy seeds to explore the rover's turning capabilities. The rover is capable of lifting, sweeping, and spinning its wheels, allowing it to execute leg-like gait patterns. The high-dimensional actuation capabilities of the rover facilitate effective manipulation of the underlying granular surface. We used Bayesian Optimization (BO) to gain insight into successful turning gaits in high dimensional search space and found strategies such as differential wheel spinning and pivoting around a single sweeping wheel. We then used these insights to further fine-tune the turning gait, enabling the rover to turn 90 degrees at just above 4 seconds with minimal slip. Combining gait optimization and human-tuning approaches, we found that fast turning is empowered by creating anisotropic torques with the sweeping wheel.

Submitted 2 October, 2023; originally announced October 2023.

Comments: 6 pages, 6 figures, conference paper submission for ICRA2024

arXiv:2310.00886 [pdf, other]

doi 10.1038/s41586-023-06829-4

Quantum spin nematic phase in a square-lattice iridate

Authors: Hoon Kim, Jin-Kwang Kim, Jimin Kim, Hyun-Woo J. Kim, Seunghyeok Ha, Kwangrae Kim, Wonjun Lee, Jonghwan Kim, Gil Young Cho, Hyeokjun Heo, Joonho Jang, J. Strempfer, G. Fabbris, Y. Choi, D. Haskel, Jungho Kim, J. -W. Kim, B. J. Kim

Abstract: Spin nematic (SN) is a magnetic analog of classical liquid crystals, a fourth state of matter exhibiting characteristics of both liquid and solid. Particularly intriguing is a valence-bond SN, in which spins are quantum entangled to form a multi-polar order without breaking time-reversal symmetry, but its unambiguous experimental realization remains elusive. Here, we establish a SN phase in the square-lattice iridate Sr$_2$IrO$_4$, which approximately realizes a pseudospin one-half Heisenberg antiferromagnet (AF) in the strong spin-orbit coupling limit. Upon cooling, the transition into the SN phase at T$_C$ $\approx$ 263 K is marked by a divergence in the static spin quadrupole susceptibility extracted from our Raman spectra, and concomitant emergence of a collective mode associated with the spontaneous breaking of rotational symmetries. The quadrupolar order persists in the antiferromagnetic (AF) phase below T$_N$ $\approx$ 230 K, and becomes directly observable through its interference with the AF order in resonant x-ray diffraction, which allows us to uniquely determine its spatial structure. Further, we find using resonant inelastic x-ray scattering a complete breakdown of coherent magnon excitations at short-wavelength scales, suggesting a resonating-valence-bond-like quantum entanglement in the AF state. Taken together, our results reveal a quantum order underlying the Néel AF that is widely believed to be intimately connected to the mechanism of high temperature superconductivity (HTSC).

Submitted 14 December, 2023; v1 submitted 2 October, 2023; originally announced October 2023.

Comments: Published in https://www.nature.com/articles/s41586-023-06829-4

arXiv:2309.17046 [pdf, other]

CrossLoco: Human Motion Driven Control of Legged Robots via Guided Unsupervised Reinforcement Learning

Authors: Tianyu Li, Hyunyoung Jung, Matthew Gombolay, Yong Kwon Cho, Sehoon Ha

Abstract: Human motion driven control (HMDC) is an effective approach for generating natural and compelling robot motions while preserving high-level semantics. However, establishing the correspondence between humans and robots with different body structures is not straightforward due to the mismatches in kinematics and dynamics properties, which causes intrinsic ambiguity to the problem. Many previous algorithms approach this motion retargeting problem with unsupervised learning, which requires the prerequisite skill sets. However, it will be extremely costly to learn all the skills without understanding the given human motions, particularly for high-dimensional robots. In this work, we introduce CrossLoco, a guided unsupervised reinforcement learning framework that simultaneously learns robot skills and their correspondence to human motions. Our key innovation is to introduce a cycle-consistency-based reward term designed to maximize the mutual information between human motions and robot states. We demonstrate that the proposed framework can generate compelling robot motions by translating diverse human motions, such as running, hopping, and dancing. We quantitatively compare our CrossLoco against the manually engineered and unsupervised baseline algorithms along with the ablated versions of our framework and demonstrate that our method translates human motions with better accuracy, diversity, and user preference. We also showcase its utility in other applications, such as synthesizing robot movements from language input and enabling interactive robot control.

Submitted 29 September, 2023; originally announced September 2023.

arXiv:2309.16538 [pdf, ps, other]

On the emergent dynamics of the infinite set of Kuramoto oscillators

Authors: Seung-Yeal Ha, Euntaek Lee, Woojoo Shim

Abstract: We propose an infinite Kuramoto model for a countably infinite set of Kuramoto oscillators and study its emergent dynamics for two classes of network topologies. For a class of symmetric and row (or column)-summable network topology, we show that a homogeneous ensemble exhibits complete synchronization, and the infinite Kuramoto model can be cast as a gradient flow, whereas we obtain a weak synchronization estimate, namely practical synchronization for a heterogeneous ensemble. Unlike with the finite Kuramoto model, phase diameter can be constant for some class of network topologies which is a novel feature of the infinite model. We also consider a second class of network topology (a so-called sender network) in which coupling strengths are proportional to a constant that depends only on sender's index number. For this network topology, we have a better control on emergent dynamics. For a homogeneous ensemble, there are only two possible asymptotic states, complete phase synchrony or bi-cluster configuration in any positive coupling strengths. In contrast, for a heterogeneous ensemble, complete synchronization occurs exponentially fast for a class of initial configuration confined in a quarter arc.

Submitted 3 October, 2023; v1 submitted 28 September, 2023; originally announced September 2023.

MSC Class: 34D05; 34G20; 70F45

arXiv:2309.13175 [pdf]

American Family Cohort, a data resource description

Authors: Deepa Balraj, Ayin Vala, Shiying Hao, Melanie Philofsky, Anna Tsvetkova, Elena Trach, Shravani Priya Narra, Oleg Zhuk, Mary Shamkhorskaya, Jim Singer, Joseph Mesterhazy, Somalee Datta, Isabella Chu, David Rehkopf

Abstract: This manuscript is a research resource description and presents a large and novel Electronic Health Records (EHR) data resource, American Family Cohort (AFC). The AFC data is derived from Centers for Medicare and Medicaid Services (CMS) certified American Board of Family Medicine (ABFM) PRIME registry. The PRIME registry is the largest national Qualified Clinical Data Registry (QCDR) for Primary Care. The data is converted to a popular common data model, the Observational Health Data Sciences and Informatics (OHDSI) Observational Medical Outcomes Partnership (OMOP) Common Data Model (CDM). The resource presents approximately 90 million encounters for 7.5 million patients. All 100% of the patients present age, gender, and address information, and 73% report race. Nearly 93% of patients have lab data in LOINC, 86% have medication data in RxNorm, 93% have diagnosis in SNOMED and ICD, 81% have procedures in HCPCS or CPT, and 61% have insurance information. The richness, breadth, and diversity of this research accessible and research ready data is expected to accelerate observational studies in many diverse areas. We expect this resource to facilitate research in many years to come.

Submitted 22 September, 2023; originally announced September 2023.

arXiv:2309.10436 [pdf, other]

LiDAR-Generated Images Derived Keypoints Assisted Point Cloud Registration Scheme in Odometry Estimation

Authors: Haizhou Zhang, Xianjia Yu, Sier Ha, Tomi Westerlund

Abstract: Keypoint detection and description play a pivotal role in various robotics and autonomous applications, including visual odometry (VO), visual navigation, and simultaneous localization and mapping (SLAM). While a myriad of keypoint detectors and descriptors have been extensively studied in conventional camera images, the effectiveness of these techniques in the context of LiDAR-generated images, i.e. reflectivity and range images, has not been assessed. These images have gained attention due to their resilience in adverse conditions such as rain or fog. Additionally, they contain significant textural information that supplements the geometric information provided by LiDAR point clouds in the point cloud registration phase, especially when reliant solely on LiDAR sensors. This addresses the challenge of drift encountered in LiDAR Odometry (LO) within geometrically identical scenarios or where not all the raw point cloud is informative and may even be misleading. This paper aims to analyze the applicability of conventional image keypoint extractors and descriptors on LiDAR-generated images via a comprehensive quantitative investigation. Moreover, we propose a novel approach to enhance the robustness and reliability of LO. After extracting keypoints, we proceed to downsample the point cloud, subsequently integrating it into the point cloud registration phase for the purpose of odometry estimation. Our experiment demonstrates that the proposed approach has comparable accuracy but reduced computational overhead, a higher odometry publishing rate, and even superior performance in scenarios prone to drift when using the raw point cloud. This, in turn, lays a foundation for subsequent investigations into the integration of LiDAR-generated images with LO. Our code is available on GitHub: https://github.com/TIERS/ws-lidar-as-camera-odom.

Submitted 19 September, 2023; originally announced September 2023.

arXiv:2309.08230 [pdf, other]

A Duty to Forget, a Right to be Assured? Exposing Vulnerabilities in Machine Unlearning Services

Authors: Hongsheng Hu, Shuo Wang, Jiamin Chang, Haonan Zhong, Ruoxi Sun, Shuang Hao, Haojin Zhu, Minhui Xue

Abstract: The right to be forgotten requires the removal or "unlearning" of a user's data from machine learning models. However, in the context of Machine Learning as a Service (MLaaS), retraining a model from scratch to fulfill the unlearning request is impractical due to the lack of training data on the service provider's side (the server). Furthermore, approximate unlearning involves a complex trade-off between utility (model performance) and privacy (unlearning performance). In this paper, we explore the potential threats posed by unlearning services in MLaaS, specifically over-unlearning, where more information is unlearned than expected. We propose two strategies that leverage over-unlearning to measure the impact on the trade-off balancing, under black-box access settings, in which the existing machine unlearning attacks are not applicable. The effectiveness of these strategies is evaluated through extensive experiments on benchmark datasets, across various model architectures and representative unlearning approaches. Results indicate significant potential for both strategies to undermine model efficacy in unlearning scenarios. This study uncovers an underexplored gap between unlearning and contemporary MLaaS, highlighting the need for careful considerations in balancing data unlearning, model utility, and security.

Submitted 15 January, 2024; v1 submitted 15 September, 2023; originally announced September 2023.

Comments: To Appear in the Network and Distributed System Security Symposium (NDSS) 2024, San Diego, CA, USA

Showing 1–50 of 398 results for author: Hao, S