Showing 1–50 of 123 results for author: Han, G

  1. arXiv:2407.04064  [pdf, other]

    cs.RO

    Collision Avoidance for Multiple UAVs in Unknown Scenarios with Causal Representation Disentanglement

    Authors: Jiafan Zhuang, Zihao Xia, Gaofei Han, Boxi Wang, Wenji Li, Dongliang Wang, Zhifeng Hao, Ruichu Cai, Zhun Fan

    Abstract: Deep reinforcement learning (DRL) has achieved remarkable progress in online path planning tasks for multi-UAV systems. However, existing DRL-based methods often suffer from performance degradation when tackling unseen scenarios, since the non-causal factors in visual representations adversely affect policy learning. To address this issue, we propose a novel representation learning approach, i.e.,…

    Submitted 15 July, 2024; v1 submitted 4 July, 2024; originally announced July 2024.

  2. arXiv:2407.04056  [pdf, other]

    cs.RO

    Robust Policy Learning for Multi-UAV Collision Avoidance with Causal Feature Selection

    Authors: Jiafan Zhuang, Gaofei Han, Zihao Xia, Boxi Wang, Wenji Li, Dongliang Wang, Zhifeng Hao, Ruichu Cai, Zhun Fan

    Abstract: In unseen and complex outdoor environments, collision avoidance navigation for unmanned aerial vehicle (UAV) swarms presents a challenging problem. It requires UAVs to navigate through various obstacles and complex backgrounds. Existing collision avoidance navigation methods based on deep reinforcement learning show promising performance but suffer from poor generalization abilities, resulting in…

    Submitted 15 July, 2024; v1 submitted 4 July, 2024; originally announced July 2024.

  3. arXiv:2406.02013  [pdf, other]

    cs.LG

    Mamba as Decision Maker: Exploring Multi-scale Sequence Modeling in Offline Reinforcement Learning

    Authors: Jiahang Cao, Qiang Zhang, Ziqing Wang, Jiaxu Wang, Hao Cheng, Yecheng Shao, Wen Zhao, Gang Han, Yijie Guo, Renjing Xu

    Abstract: Sequential modeling has demonstrated remarkable capabilities in offline reinforcement learning (RL), with Decision Transformer (DT) being one of the most notable representatives, achieving significant success. However, RL trajectories possess unique properties to be distinguished from the conventional sequence (e.g., text or audio): (1) local correlation, where the next states in RL are theoretica…

    Submitted 4 June, 2024; originally announced June 2024.

    Comments: 16 pages, 5 figures

  4. arXiv:2405.18405  [pdf, other]

    cs.CV cs.AI

    WIDIn: Wording Image for Domain-Invariant Representation in Single-Source Domain Generalization

    Authors: Jiawei Ma, Yulei Niu, Shiyuan Huang, Guangxing Han, Shih-Fu Chang

    Abstract: Language has been useful in extending the vision encoder to data from diverse distributions without empirical discovery in training domains. However, as the image description is mostly at coarse-grained level and ignores visual details, the resulted embeddings are still ineffective in overcoming complexity of domains at inference time. We present a self-supervision framework WIDIn, Wording Images…

    Submitted 28 May, 2024; originally announced May 2024.

  5. arXiv:2405.07052  [pdf, other]

    cs.CL

    Length-Aware Multi-Kernel Transformer for Long Document Classification

    Authors: Guangzeng Han, Jack Tsao, Xiaolei Huang

    Abstract: Lengthy documents pose a unique challenge to neural language models due to substantial memory consumption. While existing state-of-the-art (SOTA) models segment long texts into equal-length snippets (e.g., 128 tokens per snippet) or deploy sparse attention networks, these methods have new challenges of context fragmentation and generalizability due to sentence boundaries and varying text lengths.…

    Submitted 11 May, 2024; originally announced May 2024.

    Comments: Accepted to SEM 2024

  6. arXiv:2405.06983  [pdf, other]

    cs.NI

    ISAC-Assisted Wireless Rechargeable Sensor Networks with Multiple Mobile Charging Vehicles

    Authors: Muhammad Umar Farooq Qaisar, Weijie Yuan, Paolo Bellavista, Guangjie Han, Adeel Ahmed

    Abstract: As IoT-based wireless sensor networks (WSNs) become more prevalent, the issue of energy shortages becomes more pressing. One potential solution is the use of wireless power transfer (WPT) technology, which is the key to building a new shape of wireless rechargeable sensor networks (WRSNs). However, efficient charging and scheduling are critical for WRSNs to function properly. Motivated by the fact…

    Submitted 11 May, 2024; originally announced May 2024.

    Comments: Accepted for publication in the Special Issue Q1'2024, "Integrating Sensing and Communication for Ubiquitous Internet of Things," IEEE Internet of Things Magazine

  7. arXiv:2404.13654  [pdf, other]

    cs.MA

    Multi-AUV Cooperative Underwater Multi-Target Tracking Based on Dynamic-Switching-enabled Multi-Agent Reinforcement Learning

    Authors: Shengbo Wang, Chuan Lin, Guangjie Han, Shengchao Zhu, Zhixian Li, Zhenyu Wang

    Abstract: With the rapid development of underwater communication, sensing, automation, and robot technologies, autonomous underwater vehicle (AUV) swarms are gradually becoming popular and have been widely promoted in ocean exploration and underwater tracking or surveillance. However, the complex underwater environment poses significant challenges for AUV swarm-based accurate tracking for the underwater mo…

    Submitted 22 April, 2024; v1 submitted 21 April, 2024; originally announced April 2024.

  8. arXiv:2404.04656  [pdf, other]

    cs.LG cs.AI cs.CL

    Binary Classifier Optimization for Large Language Model Alignment

    Authors: Seungjae Jung, Gunsoo Han, Daniel Wontae Nam, Kyoung-Woon On

    Abstract: Aligning Large Language Models (LLMs) to human preferences through preference optimization has been crucial but labor-intensive, necessitating for each prompt a comparison of both a chosen and a rejected text completion by evaluators. Recently, Kahneman-Tversky Optimization (KTO) has demonstrated that LLMs can be aligned using merely binary "thumbs-up" or "thumbs-down" signals on each prompt-compl…

    Submitted 6 April, 2024; originally announced April 2024.

    Comments: 18 pages, 9 figures

  9. arXiv:2404.02838  [pdf, other]

    cs.AI

    I-Design: Personalized LLM Interior Designer

    Authors: Ata Çelen, Guo Han, Konrad Schindler, Luc Van Gool, Iro Armeni, Anton Obukhov, Xi Wang

    Abstract: Interior design allows us to be who we are and live how we want - each design is as unique as our distinct personality. However, it is not trivial for non-professionals to express and materialize this since it requires aligning functional and visual expectations with the constraints of physical space; this renders interior design a luxury. To make it more accessible, we present I-Design, a persona…

    Submitted 3 April, 2024; originally announced April 2024.

  10. arXiv:2403.13786  [pdf, other]

    cs.CL

    Chain-of-Interaction: Enhancing Large Language Models for Psychiatric Behavior Understanding by Dyadic Contexts

    Authors: Guangzeng Han, Weisi Liu, Xiaolei Huang, Brian Borsari

    Abstract: Automatic coding of patient behaviors is essential to support decision making for psychotherapists during motivational interviewing (MI), a collaborative communication intervention approach to address psychiatric issues, such as alcohol and drug addiction. While the behavior coding task has rapidly adapted machine learning to predict patient states during the MI sessions, lacking of domain-specif…

    Submitted 23 March, 2024; v1 submitted 20 March, 2024; originally announced March 2024.

    Comments: Accepted to IEEE ICHI 2024

  11. arXiv:2403.10492  [pdf, other]

    cs.CV

    Mitigating Dialogue Hallucination for Large Vision Language Models via Adversarial Instruction Tuning

    Authors: Dongmin Park, Zhaofang Qian, Guangxing Han, Ser-Nam Lim

    Abstract: Mitigating hallucinations of Large Vision Language Models (LVLMs) is crucial to enhance their reliability for general-purpose assistants. This paper shows that such hallucinations of LVLMs can be significantly exacerbated by preceding user-system dialogues. To precisely measure this, we first present an evaluation benchmark by extending popular multi-modal benchmark datasets with prepended halluci…

    Submitted 25 May, 2024; v1 submitted 15 March, 2024; originally announced March 2024.

  12. arXiv:2402.10873  [pdf, ps, other]

    cs.NI eess.SP

    Probabilistic On-Demand Charging Scheduling for ISAC-Assisted WRSNs with Multiple Mobile Charging Vehicles

    Authors: Muhammad Umar Farooq Qaisar, Weijie Yuan, Paolo Bellavista, Guangjie Han, Rabiu Sale Zakariyya, Adeel Ahmed

    Abstract: The internet of things (IoT) based wireless sensor networks (WSNs) face an energy shortage challenge that could be overcome by the novel wireless power transfer (WPT) technology. The combination of WSNs and WPT is known as wireless rechargeable sensor networks (WRSNs), with the charging efficiency and charging scheduling being the primary concerns. Therefore, this paper proposes a probabilistic on…

    Submitted 16 February, 2024; originally announced February 2024.

    Comments: Accepted for publication at the IEEE Global Communications Conference (GLOBECOM) 2023

  13. arXiv:2401.08121  [pdf, other]

    cs.LG cs.AI eess.SY

    CycLight: learning traffic signal cooperation with a cycle-level strategy

    Authors: Gengyue Han, Xiaohan Liu, Xianyue Peng, Hao Wang, Yu Han

    Abstract: This study introduces CycLight, a novel cycle-level deep reinforcement learning (RL) approach for network-level adaptive traffic signal control (NATSC) systems. Unlike most traditional RL-based traffic controllers that focus on step-by-step decision making, CycLight adopts a cycle-level strategy, optimizing cycle length and splits simultaneously using Parameterized Deep Q-Networks (PDQN) algorithm…

    Submitted 16 January, 2024; originally announced January 2024.

  14. arXiv:2312.12423  [pdf, other]

    cs.CV cs.AI

    Jack of All Tasks, Master of Many: Designing General-purpose Coarse-to-Fine Vision-Language Model

    Authors: Shraman Pramanick, Guangxing Han, Rui Hou, Sayan Nag, Ser-Nam Lim, Nicolas Ballas, Qifan Wang, Rama Chellappa, Amjad Almahairi

    Abstract: The ability of large language models (LLMs) to process visual inputs has given rise to general-purpose vision systems, unifying various vision-language (VL) tasks by instruction tuning. However, due to the enormous diversity in input-output formats in the vision domain, existing general-purpose models fail to successfully integrate segmentation and multi-image inputs with coarse-level tasks into a…

    Submitted 19 June, 2024; v1 submitted 19 December, 2023; originally announced December 2023.

    Comments: CVPR 2024 Highlight

  15. arXiv:2312.12227  [pdf, other]

    cs.CV cs.AI

    HuTuMotion: Human-Tuned Navigation of Latent Motion Diffusion Models with Minimal Feedback

    Authors: Gaoge Han, Shaoli Huang, Mingming Gong, Jinglei Tang

    Abstract: We introduce HuTuMotion, an innovative approach for generating natural human motions that navigates latent motion diffusion models by leveraging few-shot human feedback. Unlike existing approaches that sample latent variables from a standard normal prior distribution, our method adapts the prior distribution to better suit the characteristics of the data, as indicated by human feedback, thus enhan…

    Submitted 19 December, 2023; originally announced December 2023.

    Comments: Accepted by AAAI 2024 Main Track

  16. arXiv:2311.01018  [pdf, other]

    cs.CV

    Expanding Expressiveness of Diffusion Models with Limited Data via Self-Distillation based Fine-Tuning

    Authors: Jiwan Hur, Jaehyun Choi, Gyojin Han, Dong-Jae Lee, Junmo Kim

    Abstract: Training diffusion models on limited datasets poses challenges in terms of limited generation capacity and expressiveness, leading to unsatisfactory results in various downstream tasks utilizing pretrained diffusion models, such as domain translation and text-guided image manipulation. In this paper, we propose Self-Distillation for Fine-Tuning diffusion models (SDFT), a methodology to address the…

    Submitted 2 November, 2023; originally announced November 2023.

    Comments: WACV 2024

  17. arXiv:2310.10856  [pdf]

    eess.SY cs.LG cs.MA

    Joint Optimization of Traffic Signal Control and Vehicle Routing in Signalized Road Networks using Multi-Agent Deep Reinforcement Learning

    Authors: Xianyue Peng, Hang Gao, Gengyue Han, Hao Wang, Michael Zhang

    Abstract: Urban traffic congestion is a critical predicament that plagues modern road networks. To alleviate this issue and enhance traffic efficiency, traffic signal control and vehicle routing have proven to be effective measures. In this paper, we propose a joint optimization approach for traffic signal control and vehicle routing in signalized road networks. The objective is to enhance network performan…

    Submitted 16 October, 2023; originally announced October 2023.

  18. arXiv:2310.06404  [pdf, other]

    cs.CL cs.AI cs.LG

    Hexa: Self-Improving for Knowledge-Grounded Dialogue System

    Authors: Daejin Jo, Daniel Wontae Nam, Gunsoo Han, Kyoung-Woon On, Taehwan Kwon, Seungeun Rho, Sungwoong Kim

    Abstract: A common practice in knowledge-grounded dialogue generation is to explicitly utilize intermediate steps (e.g., web-search, memory retrieval) with modular approaches. However, data for such steps are often inaccessible compared to those of dialogue responses as they are unobservable in an ordinary dialogue. To fill in the absence of these data, we develop a self-improving method to improve the gene…

    Submitted 2 April, 2024; v1 submitted 10 October, 2023; originally announced October 2023.

    Comments: This work has been submitted to the IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible

  19. arXiv:2309.03509  [pdf, other]

    cs.CV

    BroadCAM: Outcome-agnostic Class Activation Mapping for Small-scale Weakly Supervised Applications

    Authors: Jiatai Lin, Guoqiang Han, Xuemiao Xu, Changhong Liang, Tien-Tsin Wong, C. L. Philip Chen, Zaiyi Liu, Chu Han

    Abstract: Class activation mapping (CAM), a visualization technique for interpreting deep learning models, is now commonly used for weakly supervised semantic segmentation (WSSS) and object localization (WSOL). It is the weighted aggregation of the feature maps by activating the high class-relevance ones. Current CAM methods achieve it relying on the training outcomes, such as predicted scores (forward info…

    Submitted 7 September, 2023; originally announced September 2023.

  20. arXiv:2308.00783  [pdf, other]

    cs.CV

    Hybrid-SORT: Weak Cues Matter for Online Multi-Object Tracking

    Authors: Mingzhan Yang, Guangxin Han, Bin Yan, Wenhua Zhang, Jinqing Qi, Huchuan Lu, Dong Wang

    Abstract: Multi-Object Tracking (MOT) aims to detect and associate all desired objects across frames. Most methods accomplish the task by explicitly or implicitly leveraging strong cues (i.e., spatial and appearance information), which exhibit powerful instance-level discrimination. However, when object occlusion and clustering occur, spatial and appearance information will become ambiguous simultaneously d…

    Submitted 20 January, 2024; v1 submitted 1 August, 2023; originally announced August 2023.

    Comments: Accepted to AAAI 2024

  21. arXiv:2307.08671  [pdf, other]

    cs.CR cs.AI

    Deep Cross-Modal Steganography Using Neural Representations

    Authors: Gyojin Han, Dong-Jae Lee, Jiwan Hur, Jaehyun Choi, Junmo Kim

    Abstract: Steganography is the process of embedding secret data into another message or data, in such a way that it is not easily noticeable. With the advancement of deep learning, Deep Neural Networks (DNNs) have recently been utilized in steganography. However, existing deep steganography techniques are limited in scope, as they focus on specific data types and are not effective for cross-modal steganogra…

    Submitted 7 October, 2023; v1 submitted 2 July, 2023; originally announced July 2023.

    Comments: ICIP 2023 Oral

  22. arXiv:2307.05889  [pdf, other]

    cs.CV

    Rethinking Mitosis Detection: Towards Diverse Data and Feature Representation

    Authors: Hao Wang, Jiatai Lin, Danyi Li, Jing Wang, Bingchao Zhao, Zhenwei Shi, Xipeng Pan, Huadeng Wang, Bingbing Li, Changhong Liang, Guoqiang Han, Li Liang, Chu Han, Zaiyi Liu

    Abstract: Mitosis detection is one of the fundamental tasks in computational pathology, which is extremely challenging due to the heterogeneity of mitotic cell. Most of the current studies solve the heterogeneity in the technical aspect by increasing the model complexity. However, lacking consideration of the biological knowledge and the complex model design may lead to the overfitting problem while limited…

    Submitted 11 July, 2023; originally announced July 2023.

  23. M3PT: A Multi-Modal Model for POI Tagging

    Authors: Jingsong Yang, Guanzhou Han, Deqing Yang, Jingping Liu, Yanghua Xiao, Xiang Xu, Baohua Wu, Shenghua Ni

    Abstract: POI tagging aims to annotate a point of interest (POI) with some informative tags, which facilitates many services related to POIs, including search, recommendation, and so on. Most of the existing solutions neglect the significance of POI images and seldom fuse the textual and visual features of POIs, resulting in suboptimal tagging performance. In this paper, we propose a novel Multi-Modal Model…

    Submitted 16 June, 2023; originally announced June 2023.

    Comments: Accepted by KDD 2023

    ACM Class: H.3.0

  24. arXiv:2306.07289  [pdf, other]

    cs.HC

    Multi-Interactive-Modality based Modeling for Myopia Pro-Gression of Adolescent Student

    Authors: Xiangyu Yan, Gongen Han, Can Fang, Xuan Jing

    Abstract: Myopia is a common visual disorder that affects millions of people worldwide and its prevalence has been increasing in recent years. Environmental factors, such as reading time, viewing distance, and ambient lighting, have been identified as potential factors in the development of myopia. In this study, we investigated the relationship between three major factors and myopia in 120 adolescents. By…

    Submitted 8 June, 2023; originally announced June 2023.

    Comments: 9 pages, 5 figures

  25. arXiv:2306.02393  [pdf, other]

    cs.RO cs.CV

    Accessible Robot Control in Mixed Reality

    Authors: Ganlin Zhang, Deheng Zhang, Longteng Duan, Guo Han

    Abstract: A novel method to control the Spot robot of Boston Dynamics by Hololens 2 is proposed. This method is mainly designed for people with physical disabilities; users can control the robot's movement and robot arm without using their hands. The eye gaze tracking and head motion tracking technologies of Hololens 2 are utilized for sending control commands. The movement of the robot would follow the eye…

    Submitted 4 June, 2023; originally announced June 2023.

    Comments: Course Project of Mixed Reality at ETH Zurich

  26. arXiv:2305.13973  [pdf, other]

    cs.CL

    Effortless Integration of Memory Management into Open-Domain Conversation Systems

    Authors: Eunbi Choi, Kyoung-Woon On, Gunsoo Han, Sungwoong Kim, Daniel Wontae Nam, Daejin Jo, Seung Eun Rho, Taehwan Kwon, Minjoon Seo

    Abstract: Open-domain conversation systems integrate multiple conversation skills into a single system through a modular approach. One of the limitations of the system, however, is the absence of management capability for external memory. In this paper, we propose a simple method to improve BlenderBot3 by integrating memory management ability into it. Since no training data exists for this purpose, we propo…

    Submitted 23 May, 2023; originally announced May 2023.

  27. arXiv:2304.04625  [pdf, other]

    cs.LG cs.CR cs.CV

    Reinforcement Learning-Based Black-Box Model Inversion Attacks

    Authors: Gyojin Han, Jaehyun Choi, Haeil Lee, Junmo Kim

    Abstract: Model inversion attacks are a type of privacy attack that reconstructs private data used to train a machine learning model, solely by accessing the model. Recently, white-box model inversion attacks leveraging Generative Adversarial Networks (GANs) to distill knowledge from public datasets have been receiving great attention because of their excellent attack performance. On the other hand, current…

    Submitted 10 April, 2023; originally announced April 2023.

    Comments: CVPR 2023, Accepted

  28. arXiv:2303.15466  [pdf, other]

    cs.CV cs.AI

    Supervised Masked Knowledge Distillation for Few-Shot Transformers

    Authors: Han Lin, Guangxing Han, Jiawei Ma, Shiyuan Huang, Xudong Lin, Shih-Fu Chang

    Abstract: Vision Transformers (ViTs) emerge to achieve impressive performance on many data-abundant computer vision tasks by capturing long-range dependencies among local features. However, under few-shot learning (FSL) settings on small datasets with only a few labeled data, ViT tends to overfit and suffers from severe performance degradation due to its absence of CNN-alike inductive bias. Previous works i…

    Submitted 28 March, 2023; v1 submitted 24 March, 2023; originally announced March 2023.

    Comments: To appear in CVPR 2023

  29. arXiv:2303.09674  [pdf, other]

    cs.CV cs.AI

    DiGeo: Discriminative Geometry-Aware Learning for Generalized Few-Shot Object Detection

    Authors: Jiawei Ma, Yulei Niu, Jincheng Xu, Shiyuan Huang, Guangxing Han, Shih-Fu Chang

    Abstract: Generalized few-shot object detection aims to achieve precise detection on both base classes with abundant annotations and novel classes with limited training data. Existing approaches enhance few-shot generalization with the sacrifice of base-class performance, or maintain high precision in base-class detection with limited improvement in novel-class adaptation. In this paper, we point out the re…

    Submitted 16 March, 2023; originally announced March 2023.

    Comments: CVPR 2023 Camera Ready (Supp Attached). Code Link: https://github.com/Phoenix-V/DiGeo

  30. arXiv:2302.14139  [pdf, other]

    cs.LG cs.AI cs.SE

    Scalable End-to-End ML Platforms: from AutoML to Self-serve

    Authors: Igor L. Markov, Pavlos A. Apostolopoulos, Mia R. Garrard, Tanya Qie, Yin Huang, Tanvi Gupta, Anika Li, Cesar Cardoso, George Han, Ryan Maghsoudian, Norm Zhou

    Abstract: ML platforms help enable intelligent data-driven applications and maintain them with limited engineering effort. Upon sufficiently broad adoption, such platforms reach economies of scale that bring greater component reuse while improving efficiency of system development and maintenance. For an end-to-end ML platform with broad adoption, scaling relies on pervasive ML automation and system integrat…

    Submitted 3 March, 2023; v1 submitted 27 February, 2023; originally announced February 2023.

    Comments: 10 pages, 1 figure, 2 tables

  31. arXiv:2302.13073  [pdf, other]

    cs.IT

    Feedback Capacity of the Continuous-Time ARMA(1,1) Gaussian Channel

    Authors: Jun Su, Guangyue Han, Shlomo Shamai

    Abstract: We consider the continuous-time ARMA(1,1) Gaussian channel and derive its feedback capacity in closed form. More specifically, the channel is given by $\boldsymbol{y}(t) =\boldsymbol{x}(t) +\boldsymbol{z}(t)$, where the channel input $\{\boldsymbol{x}(t) \}$ satisfies average power constraint $P$ and the noise $\{\boldsymbol{z}(t)\}$ is a first-order autoregressive moving average (ARMA(1,1))…

    Submitted 10 April, 2024; v1 submitted 25 February, 2023; originally announced February 2023.

  32. arXiv:2302.12662  [pdf, other]

    eess.IV cs.CV

    FedDBL: Communication and Data Efficient Federated Deep-Broad Learning for Histopathological Tissue Classification

    Authors: Tianpeng Deng, Yanqi Huang, Guoqiang Han, Zhenwei Shi, Jiatai Lin, Qi Dou, Zaiyi Liu, Xiao-jing Guo, C. L. Philip Chen, Chu Han

    Abstract: Histopathological tissue classification is a fundamental task in computational pathology. Deep learning-based models have achieved superior performance but centralized training with data centralization suffers from the privacy leakage problem. Federated learning (FL) can safeguard privacy by keeping training samples locally, but existing FL-based frameworks require a large number of well-annotated…

    Submitted 17 December, 2023; v1 submitted 24 February, 2023; originally announced February 2023.

  33. arXiv:2212.13738  [pdf, other]

    cs.CV cs.CL

    TempCLR: Temporal Alignment Representation with Contrastive Learning

    Authors: Yuncong Yang, Jiawei Ma, Shiyuan Huang, Long Chen, Xudong Lin, Guangxing Han, Shih-Fu Chang

    Abstract: Video representation learning has been successful in video-text pre-training for zero-shot transfer, where each sentence is trained to be close to the paired video clips in a common feature space. For long videos, given a paragraph of description where the sentences describe different segments of the video, by matching all sentence-clip pairs, the paragraph and the full video are aligned implicitl…

    Submitted 29 March, 2023; v1 submitted 28 December, 2022; originally announced December 2022.

    Comments: ICLR 2023 Camera Ready. Code Link: https://github.com/yyuncong/TempCLR

  34. arXiv:2211.15875  [pdf, other]

    cs.LG cs.CR cs.CV

    Data Poisoning Attack Aiming the Vulnerability of Continual Learning

    Authors: Gyojin Han, Jaehyun Choi, Hyeong Gwon Hong, Junmo Kim

    Abstract: Generally, regularization-based continual learning models limit access to the previous task data to imitate the real-world constraints related to memory and privacy. However, this introduces a problem in these models by not being able to track the performance on each task. In essence, current continual learning methods are susceptible to attacks on previous tasks. We demonstrate the vulnerability…

    Submitted 3 July, 2023; v1 submitted 28 November, 2022; originally announced November 2022.

    Comments: ICIP 2023 (NeurIPS 2022 ML Safety Workshop accepted paper)

  35. arXiv:2210.12444  [pdf, other]

    cs.CV cs.AI cs.CL cs.MM

    Weakly-Supervised Temporal Article Grounding

    Authors: Long Chen, Yulei Niu, Brian Chen, Xudong Lin, Guangxing Han, Christopher Thomas, Hammad Ayyubi, Heng Ji, Shih-Fu Chang

    Abstract: Given a long untrimmed video and natural language queries, video grounding (VG) aims to temporally localize the semantically-aligned video segments. Almost all existing VG work holds two simple but unrealistic assumptions: 1) All query sentences can be grounded in the corresponding video. 2) All query sentences for the same video are always at the same semantic scale. Unfortunately, both assumptio…

    Submitted 23 February, 2023; v1 submitted 22 October, 2022; originally announced October 2022.

    Comments: EMNLP 2022, https://github.com/zjuchenlong/WSAG

  36. arXiv:2210.09198  [pdf, other]

    cs.CV

    Pixel-Aligned Non-parametric Hand Mesh Reconstruction

    Authors: Shijian Jiang, Guwen Han, Danhang Tang, Yang Zhou, Xiang Li, Jiming Chen, Qi Ye

    Abstract: Non-parametric mesh reconstruction has recently shown significant progress in 3D hand and body applications. In these methods, mesh vertices and edges are visible to neural networks, enabling the possibility to establish a direct mapping between 2D image pixels and 3D mesh vertices. In this paper, we seek to establish and exploit this mapping with a simple and compact architecture. The network is…

    Submitted 17 October, 2022; originally announced October 2022.

  37. arXiv:2208.06132  [pdf, ps, other]

    cs.IT eess.SP

    On the Physical Layer Security of Visible Light Communications Empowered by Gold Nanoparticles

    Authors: Geonho Han, Hyuckjin Choi, Ryeong Myeong Kim, Ki Tae Nam, Junil Choi, Theodoros A. Tsiftsis

    Abstract: Visible light is a proper spectrum for secure wireless communications because of its high directivity and impermeability in indoor scenarios. However, if an eavesdropper is located very close to a legitimate receiver, secure communications become highly risky. In this paper, to further increase the level of security of visible light communication (VLC) and increase its resilience against to malici…

    Submitted 7 June, 2024; v1 submitted 12 August, 2022; originally announced August 2022.

  38. Radar Imaging Based on IEEE 802.11ad Waveform in V2I Communications

    Authors: Geonho Han, Junil Choi, Robert W. Heath Jr

    Abstract: Since most of vehicular radar systems are already exploiting millimeter-wave (mmWave) spectra, it would become much more feasible to implement a joint radar and communication system by extending communication frequencies into the mmWave band. In this paper, an IEEE 802.11ad waveform-based radar imaging technique is proposed for vehicular settings. A roadside unit (RSU) transmits the IEEE 802.11ad…

    Submitted 4 August, 2022; originally announced August 2022.

  39. arXiv:2207.09625  [pdf, other]

    cs.CV

    Explicit Image Caption Editing

    Authors: Zhen Wang, Long Chen, Wenbo Ma, Guangxing Han, Yulei Niu, Jian Shao, Jun Xiao

    Abstract: Given an image and a reference caption, the image caption editing task aims to correct the misalignment errors and generate a refined caption. However, all existing caption editing works are implicit models, i.e., they directly produce the refined captions without explicit connections to the reference captions. In this paper, we introduce a new task: Explicit Caption Editing (ECE). ECE models explic…

    Submitted 19 July, 2022; originally announced July 2022.

    Comments: ECCV 2022, dataset and code are available at https://github.com/baaaad/ECE

  40. arXiv:2207.07554  [pdf, ps, other]

    cs.IT

    Renyi Entropy Rate of Stationary Ergodic Processes

    Authors: Chengyu Wu, Yonglong Li, Li Xu, Guangyue Han

    Abstract: In this paper, we examine the Renyi entropy rate of stationary ergodic processes. For a special class of stationary ergodic processes, we prove that the Renyi entropy rate always exists and can be polynomially approximated by its defining sequence; moreover, using the Markov approximation method, we show that the Renyi entropy rate can be exponentially approximated by that of the Markov approximat…

    Submitted 15 July, 2022; originally announced July 2022.

  41. arXiv:2207.07370  [pdf, other]

    eess.IV cs.CV

    CKD-TransBTS: Clinical Knowledge-Driven Hybrid Transformer with Modality-Correlated Cross-Attention for Brain Tumor Segmentation

    Authors: Jianwei Lin, Jiatai Lin, Cheng Lu, Hao Chen, Huan Lin, Bingchao Zhao, Zhenwei Shi, Bingjiang Qiu, Xipeng Pan, Zeyan Xu, Biao Huang, Changhong Liang, Guoqiang Han, Zaiyi Liu, Chu Han

    Abstract: Brain tumor segmentation (BTS) in magnetic resonance image (MRI) is crucial for brain tumor diagnosis, cancer management and research purposes. With the great success of the ten-year BraTS challenges as well as the advances of CNN and Transformer algorithms, a lot of outstanding BTS models have been proposed to tackle the difficulties of BTS in different technical aspects. However, existing studie…

    Submitted 15 July, 2022; originally announced July 2022.

  42. arXiv:2204.13065  [pdf]

    cs.HC cs.DB cs.LG

    Treating Crowdsourcing as Examination: How to Score Tasks and Online Workers?

    Authors: Guangyang Han, Sufang Li, Runmin Wang, Chunming Wu

    Abstract: Crowdsourcing is an online outsourcing mode that can meet current machine learning algorithms' urgent need for massive labeled data. A requester posts tasks on crowdsourcing platforms, which employ online workers over the Internet to complete the tasks, then aggregate and return the results to the requester. How to model the interaction between different types of workers and tasks is a hot spot. In this pap…

    Submitted 26 April, 2022; originally announced April 2022.

  43. arXiv:2204.07841  [pdf, other]

    cs.CV cs.AI cs.MM

    Multi-Modal Few-Shot Object Detection with Meta-Learning-Based Cross-Modal Prompting

    Authors: Guangxing Han, Long Chen, Jiawei Ma, Shiyuan Huang, Rama Chellappa, Shih-Fu Chang

    Abstract: We study multi-modal few-shot object detection (FSOD) in this paper, using both few-shot visual examples and class semantic information for detection, which are complementary to each other by definition. Most of the previous works on multi-modal FSOD are fine-tuning-based which are inefficient for online applications. Moreover, these methods usually require expertise like class names to extract cl…

    Submitted 27 March, 2023; v1 submitted 16 April, 2022; originally announced April 2022.

    Comments: 17 pages

  44. arXiv:2204.06455  [pdf, other]

    eess.IV cs.CV

    WSSS4LUAD: Grand Challenge on Weakly-supervised Tissue Semantic Segmentation for Lung Adenocarcinoma

    Authors: Chu Han, Xipeng Pan, Lixu Yan, Huan Lin, Bingbing Li, Su Yao, Shanshan Lv, Zhenwei Shi, Jinhai Mai, Jiatai Lin, Bingchao Zhao, Zeyan Xu, Zhizhen Wang, Yumeng Wang, Yuan Zhang, Huihui Wang, Chao Zhu, Chunhui Lin, Lijian Mao, Min Wu, Luwen Duan, Jingsong Zhu, Dong Hu, Zijie Fang, Yang Chen, et al. (18 additional authors not shown)

    Abstract: Lung cancer is the leading cause of cancer death worldwide, and adenocarcinoma (LUAD) is the most common subtype. Exploiting the potential value of the histopathology images can promote precision medicine in oncology. Tissue segmentation is the basic upstream task of histopathology image analysis. Existing deep learning models have achieved superior segmentation performance but require sufficient…

    Submitted 13 April, 2022; v1 submitted 13 April, 2022; originally announced April 2022.

  45. arXiv:2204.03873  [pdf, other]

    cs.CV

    Spatial Transformer Network on Skeleton-based Gait Recognition

    Authors: Cun Zhang, Xing-Peng Chen, Guo-Qiang Han, Xiang-Jie Liu

    Abstract: Skeleton-based gait recognition models usually suffer from the robustness problem, as the Rank-1 accuracy varies from 90\% in normal walking cases to 70\% in walking with coats cases. In this work, we propose a state-of-the-art robust skeleton-based gait recognition model called Gait-TR, which is based on the combination of spatial transformer frameworks and temporal convolutional networks. Gait-T…

    Submitted 8 April, 2022; originally announced April 2022.

  46. arXiv:2203.15021  [pdf, other]

    cs.CV cs.AI cs.MM

    Few-Shot Object Detection with Fully Cross-Transformer

    Authors: Guangxing Han, Jiawei Ma, Shiyuan Huang, Long Chen, Shih-Fu Chang

    Abstract: Few-shot object detection (FSOD), with the aim to detect novel objects using very few training examples, has recently attracted great research interest in the community. Metric-learning based methods have been demonstrated to be effective for this task using a two-branch based siamese network, and calculate the similarity between image regions and few-shot examples for detection. However, in previ…

    Submitted 29 September, 2022; v1 submitted 28 March, 2022; originally announced March 2022.

    Comments: CVPR 2022 (Oral). Code is available at https://github.com/GuangxingHan/FCT

  47. arXiv:2202.01747  [pdf, other]

    cs.CV

    The Met Dataset: Instance-level Recognition for Artworks

    Authors: Nikolaos-Antonios Ypsilantis, Noa Garcia, Guangxing Han, Sarah Ibrahimi, Nanne Van Noord, Giorgos Tolias

    Abstract: This work introduces a dataset for large-scale instance-level recognition in the domain of artworks. The proposed benchmark exhibits a number of different challenges such as large inter-class similarity, long tail distribution, and many classes. We rely on the open access collection of The Met museum to form a large training set of about 224k classes, where each class corresponds to a museum exhib…

    Submitted 3 February, 2022; originally announced February 2022.

  48. arXiv:2202.01551  [pdf, ps, other]

    cs.IT

    Isometries and MacWilliams Extension Property for Weighted Poset Metric

    Authors: Yang Xu, Haibin Kan, Guangyue Han

    Abstract: Let $\mathbf{H}$ be the cartesian product of a family of left modules over a ring $S$, indexed by a finite set $Ω$. We are concerned with the $(\mathbf{P},ω)$-weight on $\mathbf{H}$, where $\mathbf{P}=(Ω,\preccurlyeq_{\mathbf{P}})$ is a poset and $ω:Ω\longrightarrow\mathbb{R}^{+}$ is a weight function. We characterize the group of $(\mathbf{P},ω)$-weight isometries of $\mathbf{H}$, and give a cano…

    Submitted 20 July, 2022; v1 submitted 3 February, 2022; originally announced February 2022.

    Comments: arXiv admin note: text overlap with arXiv:2201.10828

  49. arXiv:2201.10828  [pdf, ps, other]

    cs.IT

    Reflexivity of Partitions Induced by Weighted Poset Metric and Combinatorial Metric

    Authors: Yang Xu, Haibin Kan, Guangyue Han

    Abstract: Let $\mathbf{H}$ be the Cartesian product of a family of finite abelian groups. Via a polynomial approach, we give sufficient conditions for a partition of $\mathbf{H}$ induced by weighted poset metric to be reflexive, which also become necessary for some special cases. Moreover, by examining the roots of the Krawtchouk polynomials, we establish non-reflexive partitions of $\mathbf{H}$ induced by…

    Submitted 20 July, 2022; v1 submitted 26 January, 2022; originally announced January 2022.

  50. arXiv:2201.05277  [pdf, other]

    cs.CV

    Boundary-aware Self-supervised Learning for Video Scene Segmentation

    Authors: Jonghwan Mun, Minchul Shin, Gunsoo Han, Sangho Lee, Seongsu Ha, Joonseok Lee, Eun-Sol Kim

    Abstract: Self-supervised learning has drawn attention through its effectiveness in learning in-domain representations with no ground-truth annotations; in particular, it is shown that properly designed pretext tasks (e.g., contrastive prediction task) bring significant performance gains for downstream tasks (e.g., classification task). Inspired by this, we tackle video scene segmentation, which is a task…

    Submitted 13 January, 2022; originally announced January 2022.

    Comments: The code is available at https://github.com/kakaobrain/bassl