Showing 1–50 of 398 results for author: Hao, S

  1. arXiv:2407.07077  [pdf, other]

    cs.CV cs.AI

    ConceptExpress: Harnessing Diffusion Models for Single-image Unsupervised Concept Extraction

    Authors: Shaozhe Hao, Kai Han, Zhengyao Lv, Shihao Zhao, Kwan-Yee K. Wong

    Abstract: While personalized text-to-image generation has enabled the learning of a single concept from multiple images, a more practical yet challenging scenario involves learning multiple concepts within a single image. However, existing works tackling this scenario heavily rely on extensive human annotations. In this paper, we introduce a novel task named Unsupervised Concept Extraction (UCE) that consid…

    Submitted 9 July, 2024; originally announced July 2024.

    Comments: ECCV 2024, Project page: https://haoosz.github.io/ConceptExpress/

  2. arXiv:2407.06780  [pdf, other]

    cs.CV

    CoLA: Conditional Dropout and Language-driven Robust Dual-modal Salient Object Detection

    Authors: Shuang Hao, Chunlin Zhong, He Tang

    Abstract: Depth/thermal information is beneficial for detecting salient objects with conventional RGB images. However, in dual-modal salient object detection (SOD) models, robustness against noisy inputs and missing modalities is crucial but rarely studied. To tackle this problem, we introduce the Conditional Dropout and LAnguage-driven (CoLA) framework comprising two core componen…

    Submitted 9 July, 2024; originally announced July 2024.

  3. arXiv:2407.04213  [pdf]

    cs.CR cs.NI

    Pathfinder: Exploring Path Diversity for Assessing Internet Censorship Inconsistency

    Authors: Xiaoqin Liang, Guannan Liu, Lin Jin, Shuai Hao, Haining Wang

    Abstract: Internet censorship is typically enforced by authorities to achieve information control for a certain group of Internet users. So far, existing censorship studies have primarily focused on country-level characterization because (1) in many cases, censorship is enabled by governments with nationwide policies and (2) it is usually hard to control how the probing packets are routed to trigger censorsh…

    Submitted 4 July, 2024; originally announced July 2024.

  4. SmartAxe: Detecting Cross-Chain Vulnerabilities in Bridge Smart Contracts via Fine-Grained Static Analysis

    Authors: Zeqin Liao, Yuhong Nan, Henglong Liang, Sicheng Hao, Juan Zhai, Jiajing Wu, Zibin Zheng

    Abstract: With the increasing popularity of blockchain, different blockchain platforms coexist in the ecosystem (e.g., Ethereum, BNB, EOSIO, etc.), which prompts high demand for cross-chain communication. A cross-chain bridge is a specific type of decentralized application for asset exchange across different blockchain platforms. Securing the smart contracts of cross-chain bridges is urgently needed, as th…

    Submitted 22 June, 2024; originally announced June 2024.

    Journal ref: The ACM International Conference on the Foundations of Software Engineering 2024

  5. SmartState: Detecting State-Reverting Vulnerabilities in Smart Contracts via Fine-Grained State-Dependency Analysis

    Authors: Zeqin Liao, Sicheng Hao, Yuhong Nan, Zibin Zheng

    Abstract: Smart contracts written in Solidity are widely used in different blockchain platforms such as Ethereum, TRON and BNB Chain. One of the unique designs in Solidity smart contracts is its state-reverting mechanism for error handling and access control. Unfortunately, a number of recent security incidents showed that adversaries also utilize this mechanism to manipulate critical states of smart contra…

    Submitted 22 June, 2024; originally announced June 2024.

    Comments: 12 pages, 10 figures

    Journal ref: ISSTA 2023

  6. arXiv:2406.09455  [pdf, other]

    cs.CV cs.AI cs.CL

    Pandora: Towards General World Model with Natural Language Actions and Video States

    Authors: Jiannan Xiang, Guangyi Liu, Yi Gu, Qiyue Gao, Yuting Ning, Yuheng Zha, Zeyu Feng, Tianhua Tao, Shibo Hao, Yemin Shi, Zhengzhong Liu, Eric P. Xing, Zhiting Hu

    Abstract: World models simulate future states of the world in response to different actions. They facilitate interactive content creation and provide a foundation for grounded, long-horizon reasoning. Current foundation models do not fully meet the capabilities of general world models: large language models (LLMs) are constrained by their reliance on language modality and their limited understanding of the…

    Submitted 12 June, 2024; originally announced June 2024.

    Comments: Website: https://world-model.maitrix.org/

  7. arXiv:2406.06615  [pdf, other]

    cs.CL cs.AI cs.LG cs.RO

    Language Guided Skill Discovery

    Authors: Seungeun Rho, Laura Smith, Tianyu Li, Sergey Levine, Xue Bin Peng, Sehoon Ha

    Abstract: Skill discovery methods enable agents to learn diverse emergent behaviors without explicit rewards. To make learned skills useful for unknown downstream tasks, obtaining a semantically diverse repertoire of skills is essential. While some approaches introduce a discriminator to distinguish skills and others aim to increase state coverage, no existing work directly addresses the "semantic diversity…

    Submitted 7 June, 2024; originally announced June 2024.

  8. arXiv:2406.05673  [pdf, other]

    cs.AI cs.CL

    Flow of Reasoning: Efficient Training of LLM Policy with Divergent Thinking

    Authors: Fangxu Yu, Lai Jiang, Haoqiang Kang, Shibo Hao, Lianhui Qin

    Abstract: Divergent thinking, the cognitive process of generating diverse solutions, is a hallmark of human creativity and problem-solving. For machines, sampling diverse solution trajectories in complex reasoning problems is crucial for robust outcomes, data augmentation, and enhanced model generalization. Large language models (LLMs) often struggle with generating high-quality, diverse reasoning. While su…

    Submitted 24 June, 2024; v1 submitted 9 June, 2024; originally announced June 2024.

  9. arXiv:2406.04983  [pdf, other]

    cs.CV

    CityCraft: A Real Crafter for 3D City Generation

    Authors: Jie Deng, Wenhao Chai, Junsheng Huang, Zhonghan Zhao, Qixuan Huang, Mingyan Gao, Jianshu Guo, Shengyu Hao, Wenhao Hu, Jenq-Neng Hwang, Xi Li, Gaoang Wang

    Abstract: City scene generation has gained significant attention in autonomous driving, smart city development, and traffic simulation. It helps enhance infrastructure planning and monitoring solutions. Existing methods have employed a two-stage process involving city layout generation, typically using Variational Autoencoders (VAEs), Generative Adversarial Networks (GANs), or Transformers, followed by neur…

    Submitted 7 June, 2024; originally announced June 2024.

    Comments: 20 pages, 9 figures

  10. arXiv:2406.01152  [pdf, other]

    cs.RO

    Learning-based legged locomotion; state of the art and future perspectives

    Authors: Sehoon Ha, Joonho Lee, Michiel van de Panne, Zhaoming Xie, Wenhao Yu, Majid Khadiv

    Abstract: Legged locomotion holds the promise of universal mobility, a critical capability for many real-world robotic applications. Both model-based and learning-based approaches have advanced the field of legged locomotion in the past three decades. In recent years, however, a number of factors have dramatically accelerated progress in learning-based methods, including the rise of deep learning, rapid pro…

    Submitted 3 June, 2024; originally announced June 2024.

  11. arXiv:2405.00223  [pdf, other]

    cs.HC

    ConFides: A Visual Analytics Solution for Automated Speech Recognition Analysis and Exploration

    Authors: Sunwoo Ha, Chaehun Lim, R. Jordan Crouser, Alvitta Ottley

    Abstract: Confidence scores of automatic speech recognition (ASR) outputs are often inadequately communicated, preventing their seamless integration into analytical workflows. In this paper, we introduce ConFides, a visual analytic system developed in collaboration with intelligence analysts to address this issue. ConFides aims to aid exploration and post-AI-transcription editing by visually representing the…

    Submitted 30 April, 2024; originally announced May 2024.

  12. arXiv:2404.17609  [pdf, other]

    cs.LG cs.AI cs.CL

    CoSD: Collaborative Stance Detection with Contrastive Heterogeneous Topic Graph Learning

    Authors: Yinghan Cheng, Qi Zhang, Chongyang Shi, Liang Xiao, Shufeng Hao, Liang Hu

    Abstract: Stance detection seeks to identify the viewpoints of individuals either in favor or against a given target or a controversial topic. Current advanced neural models for stance detection typically employ fully parametric softmax classifiers. However, these methods suffer from several limitations, including lack of explainability, insensitivity to the latent data structure, and unimodality, which gre…

    Submitted 19 June, 2024; v1 submitted 25 April, 2024; originally announced April 2024.

    Comments: 13 pages

  13. arXiv:2404.15778  [pdf, other]

    cs.LG cs.CL

    BASS: Batched Attention-optimized Speculative Sampling

    Authors: Haifeng Qian, Sujan Kumar Gonugondla, Sungsoo Ha, Mingyue Shang, Sanjay Krishna Gouda, Ramesh Nallapati, Sudipta Sengupta, Xiaofei Ma, Anoop Deoras

    Abstract: Speculative decoding has emerged as a powerful method to improve latency and throughput in hosting large language models. However, most existing implementations focus on generating a single sequence. Real-world generative AI applications often require multiple responses, and how to perform speculative decoding in a batched setting while preserving its latency benefits poses non-trivial challenges.…

    Submitted 26 June, 2024; v1 submitted 24 April, 2024; originally announced April 2024.

  14. arXiv:2404.14521  [pdf, other]

    cs.HC

    Guided By AI: Navigating Trust, Bias, and Data Exploration in AI-Guided Visual Analytics

    Authors: Sunwoo Ha, Shayan Monadjemi, Alvitta Ottley

    Abstract: The increasing integration of artificial intelligence (AI) in visual analytics (VA) tools raises vital questions about the behavior of users, their trust, and the potential of induced biases when provided with guidance during data exploration. We present an experiment where participants engaged in a visual data exploration task while receiving intelligent suggestions supplemented with four differe…

    Submitted 22 April, 2024; originally announced April 2024.

  15. arXiv:2404.14072  [pdf, other]

    math.AP

    Measure-valued death state and local sensitivity analysis for Winfree models with uncertain high-order couplings

    Authors: Seung-Yeal Ha, Myeongju Kang, Jaeyoung Yoon, Mattia Zanella

    Abstract: We study the measure-valued death state and local sensitivity analysis of the Winfree model and its mean-field counterpart with uncertain high-order couplings. The Winfree model is the first mathematical model for synchronization. It can be cast as an effective approximation of the pulse-coupled model for synchronization, and it exhibits diverse asymptotic patterns depending on system parameters…

    Submitted 22 April, 2024; originally announced April 2024.

  16. arXiv:2404.13256  [pdf]

    physics.optics

    Electrically generated exciton polaritons with spin on-demand

    Authors: Yutao Wang, Giorgio Adamo, Son Tung Ha, Jingyi Tian, Cesare Soci

    Abstract: Generation and manipulation of exciton polaritons with controllable spin could deeply impact spintronic applications, quantum simulations, and quantum information processing, but is inherently challenging due to the charge neutrality of the polariton and the device complexity it requires. In this work, we demonstrate electrical generation of spin-polarized exciton polaritons in a monolithic dielec…

    Submitted 19 April, 2024; originally announced April 2024.

  17. arXiv:2404.10933  [pdf, other]

    cs.AI cs.CL cs.LG

    LLMem: Estimating GPU Memory Usage for Fine-Tuning Pre-Trained LLMs

    Authors: Taeho Kim, Yanming Wang, Vatshank Chaturvedi, Lokesh Gupta, Seyeon Kim, Yongin Kwon, Sangtae Ha

    Abstract: Fine-tuning pre-trained large language models (LLMs) with limited hardware presents challenges due to GPU memory constraints. Various distributed fine-tuning methods have been proposed to alleviate memory constraints on GPUs. However, which method most effectively achieves rapid fine-tuning while preventing GPU out-of-memory issues in a given environment remains unclear. To address thi…

    Submitted 16 April, 2024; originally announced April 2024.

    Comments: 9 pages, 9 figures, accepted to IJCAI 2024

  18. arXiv:2404.10022  [pdf, other]

    eess.SY cs.CE physics.chem-ph

    COBRAPRO: A MATLAB toolbox for Physics-based Battery Modeling and Co-simulation Parameter Optimization

    Authors: Sara Ha, Simona Onori

    Abstract: COBRAPRO is a new open-source physics-based battery modeling software with the capability to conduct closed-loop parameter optimization using experimental data. Physics-based battery models require systematic parameter calibration to accurately predict battery behavior across different usage scenarios. While parameter calibration is essential to predict the dynamic behavior of batteries, many exis…

    Submitted 16 April, 2024; v1 submitted 15 April, 2024; originally announced April 2024.

  19. arXiv:2404.06611  [pdf, other]

    cs.HC cs.SI

    Modeling social interaction dynamics using temporal graph networks

    Authors: J. Taery Kim, Archit Naik, Isuru Jayarathne, Sehoon Ha, Jouh Yeong Chew

    Abstract: Integrating intelligent systems, such as robots, into dynamic group settings poses challenges due to the mutual influence of human behaviors and internal states. A robust representation of social interaction dynamics is essential for effective human-robot collaboration. Existing approaches often narrow their focus to facial expressions or speech, overlooking the broader context. We propose employi…

    Submitted 5 April, 2024; originally announced April 2024.

    Comments: 7 pages, 4 figures

    Journal ref: 33rd IEEE International Conference on Robot & Human Interactive Communication (RO-MAN 2024)

  20. arXiv:2404.05221  [pdf, other]

    cs.CL cs.AI

    LLM Reasoners: New Evaluation, Library, and Analysis of Step-by-Step Reasoning with Large Language Models

    Authors: Shibo Hao, Yi Gu, Haotian Luo, Tianyang Liu, Xiyan Shao, Xinyuan Wang, Shuhua Xie, Haodi Ma, Adithya Samavedhi, Qiyue Gao, Zhen Wang, Zhiting Hu

    Abstract: Generating accurate step-by-step reasoning is essential for Large Language Models (LLMs) to address complex problems and enhance robustness and interpretability. Despite the flux of research on developing advanced reasoning approaches, systematically analyzing the diverse LLMs and reasoning strategies in generating reasoning chains remains a significant challenge. The difficulties stem from the la…

    Submitted 8 April, 2024; originally announced April 2024.

    Comments: Project website: https://www.llm-reasoners.net/

  21. arXiv:2404.02550  [pdf, other]

    math.DS

    On the comparison between phenomenological and kinetic theories of gas mixtures with applications to flocking

    Authors: Gi-Chan Bae, Seung-Yeal Ha, Gyuyoung Hwang, Tommaso Ruggeri

    Abstract: We study the comparison between the phenomenological and kinetic models for a mixture of gases from the viewpoint of collective dynamics. In the case in which constituents are Eulerian gases, balance equations for mass, momentum, and energy are the same in the main differential part, but production terms due to the interchanges between constituents are different. They coincide only when the therm…

    Submitted 3 April, 2024; originally announced April 2024.

    Comments: 45 pages, 12 figures

    MSC Class: 34C60; 34E10; 35L65

  22. arXiv:2403.17158  [pdf, other]

    cs.CL

    Reflecting the Male Gaze: Quantifying Female Objectification in 19th and 20th Century Novels

    Authors: Kexin Luo, Yue Mao, Bei Zhang, Sophie Hao

    Abstract: Inspired by the concept of the male gaze (Mulvey, 1975) in literature and media studies, this paper proposes a framework for analyzing gender bias in terms of female objectification: the extent to which a text portrays female individuals as objects of visual pleasure. Our framework measures female objectification along two axes. First, we compute an agency bias score that indicates whether male en…

    Submitted 25 March, 2024; originally announced March 2024.

    Comments: To appear in LREC-COLING 2024

  23. arXiv:2403.12550  [pdf, other]

    cs.CV

    RGBD GS-ICP SLAM

    Authors: Seongbo Ha, Jiung Yeon, Hyeonwoo Yu

    Abstract: Simultaneous Localization and Mapping (SLAM) with dense representation plays a key role in robotics, Virtual Reality (VR), and Augmented Reality (AR) applications. Recent advancements in dense representation SLAM have highlighted the potential of leveraging neural scene representation and 3D Gaussian representation for high-fidelity spatial representation. In this paper, we propose a novel dense r…

    Submitted 22 March, 2024; v1 submitted 19 March, 2024; originally announced March 2024.

  24. arXiv:2403.11070  [pdf, other]

    cs.CV

    Controllable Relation Disentanglement for Few-Shot Class-Incremental Learning

    Authors: Yuan Zhou, Richang Hong, Yanrong Guo, Lin Liu, Shijie Hao, Hanwang Zhang

    Abstract: In this paper, we propose to tackle Few-Shot Class-Incremental Learning (FSCIL) from a new perspective, i.e., relation disentanglement, which means enhancing FSCIL via disentangling spurious relations between categories. The challenge of disentangling spurious correlations lies in the poor controllability of FSCIL. On one hand, an FSCIL model is required to be trained in an incremental manner and t…

    Submitted 16 March, 2024; originally announced March 2024.

  25. arXiv:2403.07860  [pdf, other]

    cs.CV

    Bridging Different Language Models and Generative Vision Models for Text-to-Image Generation

    Authors: Shihao Zhao, Shaozhe Hao, Bojia Zi, Huaizhe Xu, Kwan-Yee K. Wong

    Abstract: Text-to-image generation has made significant advancements with the introduction of text-to-image diffusion models. These models typically consist of a language model that interprets user prompts and a vision model that generates corresponding images. As language and vision models continue to progress in their respective domains, there is great potential in exploring the replacement of component…

    Submitted 12 March, 2024; originally announced March 2024.

  26. arXiv:2403.05086  [pdf, other]

    cs.CV

    UFORecon: Generalizable Sparse-View Surface Reconstruction from Arbitrary and UnFavOrable Sets

    Authors: Youngju Na, Woo Jae Kim, Kyu Beom Han, Suhyeon Ha, Sung-eui Yoon

    Abstract: Generalizable neural implicit surface reconstruction aims to obtain an accurate underlying geometry given a limited number of multi-view images from unseen scenes. However, existing methods select only informative and relevant views using predefined scores for training and testing phases. This constraint renders the model impractical in real-world scenarios, where the availability of favorable com…

    Submitted 17 May, 2024; v1 submitted 8 March, 2024; originally announced March 2024.

    Comments: Accepted at CVPR 2024. Project page: https://youngju-na.github.io/uforecon.github.io/

  27. arXiv:2403.04918  [pdf, other]

    cs.CR

    Secure Information Embedding and Extraction in Forensic 3D Fingerprinting

    Authors: Canran Wang, Jinwen Wang, Mi Zhou, Vinh Pham, Senyue Hao, Chao Zhou, Ning Zhang, Netanel Raviv

    Abstract: The prevalence of 3D printing poses a significant risk to public safety, as any individual with internet access and a commodity printer is able to produce untraceable firearms, keys, counterfeit products, etc. To aid government authorities in combating these new security threats, several approaches have been taken to tag 3D-prints with identifying information. Known as fingerprints, this informati…

    Submitted 12 June, 2024; v1 submitted 7 March, 2024; originally announced March 2024.

  28. arXiv:2403.03430  [pdf, other]

    math.OC

    Discrete Consensus-Based Optimization

    Authors: Junhyeok Byeon, Seung-Yeal Ha, Joong-Ho Won

    Abstract: We propose Discrete Consensus-Based Optimization (DCBO), a fully discrete version of the Consensus-Based Optimization (CBO) framework. DCBO is a multi-agent method for the global optimization of possibly non-convex and non-differentiable functions. It aligns with the CBO paradigm, which promotes a consensus among agents towards a global optimum through simple stochastic dynamics amenable to rigoro…

    Submitted 16 April, 2024; v1 submitted 5 March, 2024; originally announced March 2024.

    MSC Class: 37H10; 37N40; 65K10

  29. arXiv:2402.10280  [pdf, other]

    cs.LG

    SusFL: Energy-Aware Federated Learning-based Monitoring for Sustainable Smart Farms

    Authors: Dian Chen, Paul Yang, Ing-Ray Chen, Dong Sam Ha, Jin-Hee Cho

    Abstract: We propose a novel energy-aware federated learning (FL)-based system, namely SusFL, for sustainable smart farming to address the challenge of inconsistent health monitoring due to fluctuating energy levels of solar sensors. This system equips animals, such as cattle, with solar sensors with computational capabilities, including Raspberry Pis, to train a local deep-learning model on health data. Th…

    Submitted 15 February, 2024; originally announced February 2024.

  30. arXiv:2402.01787  [pdf, other]

    cs.CY cs.AI cs.LG

    Harm Amplification in Text-to-Image Models

    Authors: Susan Hao, Renee Shelby, Yuchi Liu, Hansa Srinivasan, Mukul Bhutani, Burcu Karagol Ayan, Ryan Poplin, Shivani Poddar, Sarah Laszlo

    Abstract: Text-to-image (T2I) models have emerged as a significant advancement in generative AI; however, there exist safety concerns regarding their potential to produce harmful image outputs even when users input seemingly safe prompts. This phenomenon, where T2I models generate harmful representations that were not explicit in the input, poses a potentially greater risk than adversarial prompts, leaving…

    Submitted 17 May, 2024; v1 submitted 1 February, 2024; originally announced February 2024.

  31. arXiv:2402.01338  [pdf, other]

    cond-mat.stat-mech cond-mat.soft cs.LG physics.bio-ph

    Inferring the Langevin Equation with Uncertainty via Bayesian Neural Networks

    Authors: Youngkyoung Bae, Seungwoong Ha, Hawoong Jeong

    Abstract: Pervasive across diverse domains, stochastic systems exhibit fluctuations in processes ranging from molecular dynamics to climate phenomena. The Langevin equation has served as a common mathematical model for studying such systems, enabling predictions of their temporal evolution and analyses of thermodynamic quantities, including absorbed heat, work done on the system, and entropy production. How…

    Submitted 2 February, 2024; originally announced February 2024.

    Comments: 30 pages, 17 figures

  32. arXiv:2401.06146  [pdf, other]

    cs.CV cs.GR

    AAMDM: Accelerated Auto-regressive Motion Diffusion Model

    Authors: Tianyu Li, Calvin Qiao, Guanqiao Ren, KangKang Yin, Sehoon Ha

    Abstract: Interactive motion synthesis is essential in creating immersive experiences in entertainment applications, such as video games and virtual reality. However, generating animations that are both high-quality and contextually responsive remains a challenge. Traditional techniques in the game industry can produce high-fidelity animations but suffer from high computational costs and poor scalability. T…

    Submitted 2 December, 2023; originally announced January 2024.

  33. arXiv:2401.01629  [pdf, ps, other]

    cs.LG cs.AI cs.CY

    Synthetic Data in AI: Challenges, Applications, and Ethical Implications

    Authors: Shuang Hao, Wenfeng Han, Tao Jiang, Yiping Li, Haonan Wu, Chunlin Zhong, Zhangjun Zhou, He Tang

    Abstract: In the rapidly evolving field of artificial intelligence, the creation and utilization of synthetic datasets have become increasingly significant. This report delves into the multifaceted aspects of synthetic data, particularly emphasizing the challenges and potential biases these datasets may harbor. It explores the methodologies behind synthetic data generation, spanning traditional statistical…

    Submitted 3 January, 2024; originally announced January 2024.

  34. arXiv:2312.11979  [pdf, other]

    cond-mat.str-el

    Origin of chirality in transition-metal dichalcogenides

    Authors: Kwangrae Kim, Hyun-Woo J. Kim, Seunghyeok Ha, Hoon Kim, Jin-Kwang Kim, Jaehwon Kim, Hyunsung Kim, Junyoung Kwon, Jihoon Seol, Saegyeol Jung, Changyoung Kim, Ahmet Alatas, Ayman Said, Michael Merz, Matthieu Le Tacon, Jin Mo Bok, Ki-Seok Kim, B. J. Kim

    Abstract: Chirality is a ubiquitous phenomenon in which a symmetry between left- and right-handed objects is broken, examples in nature ranging from subatomic particles and molecules to living organisms. In particle physics, the weak force is responsible for the symmetry breaking and parity violation in beta decay, but in condensed matter systems, interactions that lead to chirality remain poorly understood.…

    Submitted 19 December, 2023; originally announced December 2023.

    Comments: 10 pages, 3 figures, 1 table

  35. arXiv:2312.03275  [pdf, other]

    cs.RO cs.AI

    VLFM: Vision-Language Frontier Maps for Zero-Shot Semantic Navigation

    Authors: Naoki Yokoyama, Sehoon Ha, Dhruv Batra, Jiuguang Wang, Bernadette Bucher

    Abstract: Understanding how humans leverage semantic knowledge to navigate unfamiliar environments and decide where to explore next is pivotal for developing robots capable of human-like search behaviors. We introduce a zero-shot navigation approach, Vision-Language Frontier Maps (VLFM), which is inspired by human reasoning and designed to navigate towards unseen semantic objects in novel environments. VLFM…

    Submitted 5 December, 2023; originally announced December 2023.

  36. arXiv:2311.15209  [pdf, other]

    cs.AI

    See and Think: Embodied Agent in Virtual Environment

    Authors: Zhonghan Zhao, Wenhao Chai, Xuan Wang, Li Boyi, Shengyu Hao, Shidong Cao, Tian Ye, Gaoang Wang

    Abstract: Large language models (LLMs) have achieved impressive progress on several open-world tasks. Recently, using LLMs to build embodied agents has been a hotspot. This paper proposes STEVE, a comprehensive and visionary embodied agent in the Minecraft virtual environment. STEVE comprises three key components: vision perception, language instruction, and code action. Vision perception involves interpre…

    Submitted 9 July, 2024; v1 submitted 26 November, 2023; originally announced November 2023.

    Comments: ECCV 2024. First three authors contribute equally to this work. Project Website https://rese1f.github.io/STEVE/

  37. arXiv:2311.12467  [pdf, other]

    cs.CV

    GLAD: Global-Local View Alignment and Background Debiasing for Unsupervised Video Domain Adaptation with Large Domain Gap

    Authors: Hyogun Lee, Kyungho Bae, Seong Jong Ha, Yumin Ko, Gyeong-Moon Park, Jinwoo Choi

    Abstract: In this work, we tackle the challenging problem of unsupervised video domain adaptation (UVDA) for action recognition. We specifically focus on scenarios with a substantial domain gap, in contrast to existing works that primarily deal with small domain gaps between labeled source domains and unlabeled target domains. To establish a more realistic setting, we introduce a novel UVDA scenario, denoted as…

    Submitted 22 November, 2023; v1 submitted 21 November, 2023; originally announced November 2023.

    Comments: This is an accepted WACV 2024 paper. Our code is available at https://github.com/KHUVLL/GLAD

  38. arXiv:2311.10261  [pdf, other]

    cs.CV eess.SP

    Vision meets mmWave Radar: 3D Object Perception Benchmark for Autonomous Driving

    Authors: Yizhou Wang, Jen-Hao Cheng, Jui-Te Huang, Sheng-Yao Kuan, Qiqian Fu, Chiming Ni, Shengyu Hao, Gaoang Wang, Guanbin Xing, Hui Liu, Jenq-Neng Hwang

    Abstract: Sensor fusion is crucial for an accurate and robust perception system on autonomous vehicles. Most existing datasets and perception solutions focus on fusing cameras and LiDAR. However, the collaboration between camera and radar is significantly under-exploited. The incorporation of rich semantic information from the camera and reliable 3D information from the radar can potentially achieve an eff…

    Submitted 16 November, 2023; originally announced November 2023.

  39. arXiv:2311.02304  [pdf, other]

    cs.RO

    Imitating and Finetuning Model Predictive Control for Robust and Symmetric Quadrupedal Locomotion

    Authors: Donghoon Youm, Hyunyoung Jung, Hyeongjun Kim, Jemin Hwangbo, Hae-Won Park, Sehoon Ha

    Abstract: Control of legged robots is a challenging problem that has been investigated by different approaches, such as model-based control and learning algorithms. This work proposes a novel Imitating and Finetuning Model Predictive Control (IFM) framework to take the strengths of both approaches. Our framework first develops a conventional model predictive controller (MPC) using Differential Dynamic Progr…

    Submitted 3 November, 2023; originally announced November 2023.

  40. arXiv:2310.15151  [pdf, other]

    cs.CL cs.AI cs.LG

    Verb Conjugation in Transformers Is Determined by Linear Encodings of Subject Number

    Authors: Sophie Hao, Tal Linzen

    Abstract: Deep architectures such as Transformers are sometimes criticized for having uninterpretable "black-box" representations. We use causal intervention analysis to show that, in fact, some linguistic features are represented in a linear, interpretable format. Specifically, we show that BERT's ability to conjugate verbs relies on a linear encoding of subject number that can be manipulated with predicta…

    Submitted 23 October, 2023; originally announced October 2023.

    Comments: To appear in Findings of the Association for Computational Linguistics: EMNLP 2023

  41. arXiv:2310.10606  [pdf, other]

    cs.RO cs.LG

    BayRnTune: Adaptive Bayesian Domain Randomization via Strategic Fine-tuning

    Authors: Tianle Huang, Nitish Sontakke, K. Niranjan Kumar, Irfan Essa, Stefanos Nikolaidis, Dennis W. Hong, Sehoon Ha

    Abstract: Domain randomization (DR), which entails training a policy with randomized dynamics, has proven to be a simple yet effective algorithm for reducing the gap between simulation and the real world. However, DR often requires careful tuning of randomization parameters. Methods like Bayesian Domain Randomization (Bayesian DR) and Active Domain Randomization (Adaptive DR) address this issue by automatin…

    Submitted 16 October, 2023; originally announced October 2023.

  42. arXiv:2310.06226  [pdf, other]

    cs.RO

    Words into Action: Learning Diverse Humanoid Robot Behaviors using Language Guided Iterative Motion Refinement

    Authors: K. Niranjan Kumar, Irfan Essa, Sehoon Ha

    Abstract: Humanoid robots are well suited for human habitats due to their morphological similarity, but developing controllers for them is a challenging task that involves multiple sub-problems, such as control, planning and perception. In this paper, we introduce a method to simplify controller design by enabling users to train and fine-tune robot control policies using natural language commands. We first…

    Submitted 9 October, 2023; originally announced October 2023.

  43. arXiv:2310.02091  [pdf]

    physics.optics physics.app-ph

    Dual-resonance nanostructures for colour down-conversion of colloidal quantum emitters

    Authors: Son Tung Ha, Emmanuel Lassalle, Xiao Liang, Thi Thu Ha Do, Ian Foo, Sushant Shendre, Emek Goksu Durmusoglu, Vytautas Valuckas, Sourav Adhikary, Ramon Paniagua-Dominguez, Hilmi Volkan Demir, Arseniy Kuznetsov

    Abstract: Linear colour conversion is a process where an emitter absorbs a photon and then emits another photon with either higher or lower energy, corresponding to up- or down conversion, respectively. In this regard, the presence of a volumetric cavity plays a crucial role in enhancing absorption and photoluminescence (PL), as it allows for large volumes of interaction between the exciting photons and the…

    Submitted 3 October, 2023; originally announced October 2023.

    Comments: 37 pages, 11 figures (4 in main text, 11 in SI)

  44. arXiv:2310.01273  [pdf, other]

    cs.RO

    Learning manipulation of steep granular slopes for fast Mini Rover turning

    Authors: Deniz Kerimoglu, Daniel Soto, Malone Lincoln Hemsley, Joseph Brunner, Sehoon Ha, Tingnan Zhang, Daniel I. Goldman

    Abstract: Future planetary exploration missions will require reaching challenging regions such as craters and steep slopes. Such regions are ubiquitous and present science-rich targets potentially containing information regarding the planet's internal structure. Steep slopes consisting of low-cohesion regolith are prone to flow downward under small disturbances, making it very challenging for autonomous rov…

    Submitted 2 October, 2023; originally announced October 2023.

    Comments: 6 pages, 6 figures, conference paper submission for ICRA2024

  45. Quantum spin nematic phase in a square-lattice iridate

    Authors: Hoon Kim, Jin-Kwang Kim, Jimin Kim, Hyun-Woo J. Kim, Seunghyeok Ha, Kwangrae Kim, Wonjun Lee, Jonghwan Kim, Gil Young Cho, Hyeokjun Heo, Joonho Jang, J. Strempfer, G. Fabbris, Y. Choi, D. Haskel, Jungho Kim, J. -W. Kim, B. J. Kim

    Abstract: Spin nematic (SN) is a magnetic analog of classical liquid crystals, a fourth state of matter exhibiting characteristics of both liquid and solid. Particularly intriguing is a valence-bond SN, in which spins are quantum entangled to form a multi-polar order without breaking time-reversal symmetry, but its unambiguous experimental realization remains elusive. Here, we establish a SN phase in the sq…

    Submitted 14 December, 2023; v1 submitted 2 October, 2023; originally announced October 2023.

    Comments: Published in https://www.nature.com/articles/s41586-023-06829-4

  46. arXiv:2309.17046  [pdf, other]

    cs.RO

    CrossLoco: Human Motion Driven Control of Legged Robots via Guided Unsupervised Reinforcement Learning

    Authors: Tianyu Li, Hyunyoung Jung, Matthew Gombolay, Yong Kwon Cho, Sehoon Ha

    Abstract: Human motion driven control (HMDC) is an effective approach for generating natural and compelling robot motions while preserving high-level semantics. However, establishing the correspondence between humans and robots with different body structures is not straightforward due to the mismatches in kinematics and dynamics properties, which causes intrinsic ambiguity to the problem. Many previous algo…

    Submitted 29 September, 2023; originally announced September 2023.

  47. arXiv:2309.16538  [pdf, ps, other]

    math.DS

    On the emergent dynamics of the infinite set of Kuramoto oscillators

    Authors: Seung-Yeal Ha, Euntaek Lee, Woojoo Shim

    Abstract: We propose an infinite Kuramoto model for a countably infinite set of Kuramoto oscillators and study its emergent dynamics for two classes of network topologies. For a class of symmetric and row (or column)-summable network topology, we show that a homogeneous ensemble exhibits complete synchronization, and the infinite Kuramoto model can be cast as a gradient flow, whereas we obtain a weak synchroniz…

    Submitted 3 October, 2023; v1 submitted 28 September, 2023; originally announced September 2023.

    MSC Class: 34D05; 34G20; 70F45

  48. arXiv:2309.13175  [pdf]

    cs.IR

    American Family Cohort, a data resource description

    Authors: Deepa Balraj, Ayin Vala, Shiying Hao, Melanie Philofsky, Anna Tsvetkova, Elena Trach, Shravani Priya Narra, Oleg Zhuk, Mary Shamkhorskaya, Jim Singer, Joseph Mesterhazy, Somalee Datta, Isabella Chu, David Rehkopf

    Abstract: This manuscript is a research resource description and presents a large and novel Electronic Health Records (EHR) data resource, American Family Cohort (AFC). The AFC data is derived from Centers for Medicare and Medicaid Services (CMS) certified American Board of Family Medicine (ABFM) PRIME registry. The PRIME registry is the largest national Qualified Clinical Data Registry (QCDR) for Primary C…

    Submitted 22 September, 2023; originally announced September 2023.

  49. arXiv:2309.10436  [pdf, other]

    cs.RO

    LiDAR-Generated Images Derived Keypoints Assisted Point Cloud Registration Scheme in Odometry Estimation

    Authors: Haizhou Zhang, Xianjia Yu, Sier Ha, Tomi Westerlund

    Abstract: Keypoint detection and description play a pivotal role in various robotics and autonomous applications including visual odometry (VO), visual navigation, and Simultaneous localization and mapping (SLAM). While a myriad of keypoint detectors and descriptors have been extensively studied in conventional camera images, the effectiveness of these techniques in the context of LiDAR-generated images, i.…

    Submitted 19 September, 2023; originally announced September 2023.

  50. arXiv:2309.08230  [pdf, other]

    cs.CR

    A Duty to Forget, a Right to be Assured? Exposing Vulnerabilities in Machine Unlearning Services

    Authors: Hongsheng Hu, Shuo Wang, Jiamin Chang, Haonan Zhong, Ruoxi Sun, Shuang Hao, Haojin Zhu, Minhui Xue

    Abstract: The right to be forgotten requires the removal or "unlearning" of a user's data from machine learning models. However, in the context of Machine Learning as a Service (MLaaS), retraining a model from scratch to fulfill the unlearning request is impractical due to the lack of training data on the service provider's side (the server). Furthermore, approximate unlearning further embraces a complex tr…

    Submitted 15 January, 2024; v1 submitted 15 September, 2023; originally announced September 2023.

    Comments: To Appear in the Network and Distributed System Security Symposium (NDSS) 2024, San Diego, CA, USA