Skip to main content

Showing 1–50 of 1,571 results for author: Wang, T

  1. arXiv:2407.02639  [pdf, other

    cs.CV

    Holistically-Nested Structure-Aware Graph Neural Network for Road Extraction

    Authors: Tinghuai Wang, Guangming Wang, Kuan Eeik Tan

    Abstract: Convolutional neural networks (CNN) have made significant advances in detecting roads from satellite images. However, existing CNN approaches are generally repurposed semantic segmentation architectures and suffer from the poor delineation of long and curved regions. Lack of overall road topology and structure information further deteriorates their performance on challenging remote sensing images.… ▽ More

    Submitted 2 July, 2024; originally announced July 2024.

  2. arXiv:2407.02616  [pdf

    eess.IV cs.CV

    Deep Learning Based Apparent Diffusion Coefficient Map Generation1 from Multi-parametric MR Images for Patients with Diffuse Gliomas

    Authors: Zach Eidex, Mojtaba Safari, Jacob Wynne, Richard L. J. Qiu, Tonghe Wang, David Viar Hernandez, Hui-Kuo Shu, Hui Mao, Xiaofeng Yang

    Abstract: Purpose: Apparent diffusion coefficient (ADC) maps derived from diffusion weighted (DWI) MRI provides functional measurements about the water molecules in tissues. However, DWI is time consuming and very susceptible to image artifacts, leading to inaccurate ADC measurements. This study aims to develop a deep learning framework to synthesize ADC maps from multi-parametric MR images. Methods: We pro… ▽ More

    Submitted 2 July, 2024; originally announced July 2024.

    Comments: arXiv admin note: text overlap with arXiv:2311.15044

  3. arXiv:2407.01925  [pdf, other

    cs.CV

    Looking From the Future: Multi-order Iterations Can Enhance Adversarial Attack Transferability

    Authors: Zijian Ying, Qianmu Li, Tao Wang, Zhichao Lian, Shunmei Meng, Xuyun Zhang

    Abstract: Various methods try to enhance adversarial transferability by improving the generalization from different perspectives. In this paper, we rethink the optimization process and propose a novel sequence optimization concept, which is named Looking From the Future (LFF). LFF makes use of the original optimization process to refine the very first local optimization choice. Adapting the LFF concept to t… ▽ More

    Submitted 1 July, 2024; originally announced July 2024.

  4. arXiv:2407.01733  [pdf, other

    cs.RO

    AquaMILR: Mechanical intelligence simplifies control of undulatory robots in cluttered fluid environments

    Authors: Tianyu Wang, Nishanth Mankame, Matthew Fernandez, Velin Kojouharov, Daniel I. Goldman

    Abstract: While undulatory swimming of elongate limbless robots has been extensively studied in open hydrodynamic environments, less research has been focused on limbless locomotion in complex, cluttered aquatic environments. Motivated by the concept of mechanical intelligence, where controls for obstacle navigation can be offloaded to passive body mechanics in terrestrial limbless locomotion, we hypothesiz… ▽ More

    Submitted 1 July, 2024; originally announced July 2024.

  5. arXiv:2407.01525  [pdf, other

    cs.CV cs.AI cs.CL

    Empowering 3D Visual Grounding with Reasoning Capabilities

    Authors: Chenming Zhu, Tai Wang, Wenwei Zhang, Kai Chen, Xihui Liu

    Abstract: Although great progress has been made in 3D visual grounding, current models still rely on explicit textual descriptions for grounding and lack the ability to reason human intentions from implicit instructions. We propose a new task called 3D reasoning grounding and introduce a new benchmark ScanReason which provides over 10K question-answer-location pairs from five reasoning types that require th… ▽ More

    Submitted 2 July, 2024; v1 submitted 1 July, 2024; originally announced July 2024.

    Comments: Accepted by ECCV 2024. A comprehensive and hierarchical 3D reasoning grounding benchmark in the era of foundation models. Project page: https://zcmax.github.io/projects/ScanReason

  6. arXiv:2407.01094  [pdf, other

    cs.CV

    Evaluation of Text-to-Video Generation Models: A Dynamics Perspective

    Authors: Mingxiang Liao, Hannan Lu, Xinyu Zhang, Fang Wan, Tianyu Wang, Yuzhong Zhao, Wangmeng Zuo, Qixiang Ye, Jingdong Wang

    Abstract: Comprehensive and constructive evaluation protocols play an important role in the development of sophisticated text-to-video (T2V) generation models. Existing evaluation protocols primarily focus on temporal consistency and content continuity, yet largely ignore the dynamics of video content. Dynamics are an essential dimension for measuring the visual vividness and the honesty of video content to… ▽ More

    Submitted 1 July, 2024; originally announced July 2024.

  7. arXiv:2406.20053  [pdf, other

    cs.CR cs.AI cs.CL cs.LG

    Covert Malicious Finetuning: Challenges in Safeguarding LLM Adaptation

    Authors: Danny Halawi, Alexander Wei, Eric Wallace, Tony T. Wang, Nika Haghtalab, Jacob Steinhardt

    Abstract: Black-box finetuning is an emerging interface for adapting state-of-the-art language models to user needs. However, such access may also let malicious actors undermine model safety. To demonstrate the challenge of defending finetuning interfaces, we introduce covert malicious finetuning, a method to compromise model safety via finetuning while evading detection. Our method constructs a malicious d… ▽ More

    Submitted 28 June, 2024; originally announced June 2024.

    Comments: 22 pages

  8. arXiv:2406.19311  [pdf, other

    cs.CR cs.SD eess.AS

    Zero-Query Adversarial Attack on Black-box Automatic Speech Recognition Systems

    Authors: Zheng Fang, Tao Wang, Lingchen Zhao, Shenyi Zhang, Bowen Li, Yunjie Ge, Qi Li, Chao Shen, Qian Wang

    Abstract: In recent years, extensive research has been conducted on the vulnerability of ASR systems, revealing that black-box adversarial example attacks pose significant threats to real-world ASR systems. However, most existing black-box attacks rely on queries to the target ASRs, which is impractical when queries are not permitted. In this paper, we propose ZQ-Attack, a transfer-based adversarial attack… ▽ More

    Submitted 27 June, 2024; originally announced June 2024.

    Comments: To appear in the Proceedings of The ACM Conference on Computer and Communications Security (CCS), 2024

  9. arXiv:2406.18558  [pdf, other

    cs.CV eess.IV

    BAISeg: Boundary Assisted Weakly Supervised Instance Segmentation

    Authors: Tengbo Wang, Yu Bai

    Abstract: How to extract instance-level masks without instance-level supervision is the main challenge of weakly supervised instance segmentation (WSIS). Popular WSIS methods estimate a displacement field (DF) via learning inter-pixel relations and perform clustering to identify instances. However, the resulting instance centroids are inherently unstable and vary significantly across different clustering al… ▽ More

    Submitted 27 May, 2024; originally announced June 2024.

  10. arXiv:2406.18532  [pdf, other

    cs.CL cs.AI cs.LG

    Symbolic Learning Enables Self-Evolving Agents

    Authors: Wangchunshu Zhou, Yixin Ou, Shengwei Ding, Long Li, Jialong Wu, Tiannan Wang, Jiamin Chen, Shuai Wang, Xiaohua Xu, Ningyu Zhang, Huajun Chen, Yuchen Eleanor Jiang

    Abstract: The AI community has been exploring a pathway to artificial general intelligence (AGI) by developing "language agents", which are complex large language models (LLMs) pipelines involving both prompting techniques and tool usage methods. While language agents have demonstrated impressive capabilities for many real-world tasks, a fundamental limitation of current language agents research is that the… ▽ More

    Submitted 26 June, 2024; originally announced June 2024.

    Comments: Code available at https://github.com/aiwaves-cn/agents

  11. arXiv:2406.18050  [pdf, other

    cs.CV

    A Multi-Stage Goal-Driven Network for Pedestrian Trajectory Prediction

    Authors: Xiuen Wu, Tao Wang, Yuanzheng Cai, Lingyu Liang, George Papageorgiou

    Abstract: Pedestrian trajectory prediction plays a pivotal role in ensuring the safety and efficiency of various applications, including autonomous vehicles and traffic management systems. This paper proposes a novel method for pedestrian trajectory prediction, called multi-stage goal-driven network (MGNet). Diverging from prior approaches relying on stepwise recursive prediction and the singular forecastin… ▽ More

    Submitted 25 June, 2024; originally announced June 2024.

    Comments: Paper accepted by 5th International Conference on Computer Vision, Image and Deep Learning (CVIDL 2024)

  12. arXiv:2406.17697  [pdf, other

    cs.LG cs.AI cs.CV

    HGTDP-DTA: Hybrid Graph-Transformer with Dynamic Prompt for Drug-Target Binding Affinity Prediction

    Authors: Xi Xiao, Wentao Wang, Jiacheng Xie, Lijing Zhu, Gaofei Chen, Zhengji Li, Tianyang Wang, Min Xu

    Abstract: Drug target binding affinity (DTA) is a key criterion for drug screening. Existing experimental methods are time-consuming and rely on limited structural and domain information. While learning-based methods can model sequence and structural information, they struggle to integrate contextual data and often lack comprehensive modeling of drug-target interactions. In this study, we propose a novel DT… ▽ More

    Submitted 25 June, 2024; originally announced June 2024.

  13. Joint Admission Control and Resource Allocation of Virtual Network Embedding via Hierarchical Deep Reinforcement Learning

    Authors: Tianfu Wang, Li Shen, Qilin Fan, Tong Xu, Tongliang Liu, Hui Xiong

    Abstract: As an essential resource management problem in network virtualization, virtual network embedding (VNE) aims to allocate the finite resources of physical network to sequentially arriving virtual network requests (VNRs) with different resource demands. Since this is an NP-hard combinatorial optimization problem, many efforts have been made to provide viable solutions. However, most existing approach… ▽ More

    Submitted 25 June, 2024; originally announced June 2024.

    Comments: Accepted by IEEE Transactions on Services Computing (TSC)

    Journal ref: IEEE Transactions on Services Computing ( Volume: 17, Issue: 3, May-June 2024)

  14. arXiv:2406.16707  [pdf, other

    cs.LG cs.AI

    Probabilistic Subgoal Representations for Hierarchical Reinforcement learning

    Authors: Vivienne Huiling Wang, Tinghuai Wang, Wenyan Yang, Joni-Kristian Kämäräinen, Joni Pajarinen

    Abstract: In goal-conditioned hierarchical reinforcement learning (HRL), a high-level policy specifies a subgoal for the low-level policy to reach. Effective HRL hinges on a suitable subgoal represen tation function, abstracting state space into latent subgoal space and inducing varied low-level behaviors. Existing methods adopt a subgoal representation that provides a deterministic mapping from state space… ▽ More

    Submitted 24 June, 2024; originally announced June 2024.

  15. arXiv:2406.16698  [pdf, other

    cs.LG cs.CY

    Learning Interpretable Fair Representations

    Authors: Tianhao Wang, Zana Buçinca, Zilin Ma

    Abstract: Numerous approaches have been recently proposed for learning fair representations that mitigate unfair outcomes in prediction tasks. A key motivation for these methods is that the representations can be used by third parties with unknown objectives. However, because current fair representations are generally not interpretable, the third party cannot use these fair representations for exploration,… ▽ More

    Submitted 24 June, 2024; originally announced June 2024.

  16. arXiv:2406.14558  [pdf, other

    cs.RO cs.AI

    CooHOI: Learning Cooperative Human-Object Interaction with Manipulated Object Dynamics

    Authors: Jiawei Gao, Ziqin Wang, Zeqi Xiao, Jingbo Wang, Tai Wang, Jinkun Cao, Xiaolin Hu, Si Liu, Jifeng Dai, Jiangmiao Pang

    Abstract: Recent years have seen significant advancements in humanoid control, largely due to the availability of large-scale motion capture data and the application of reinforcement learning methodologies. However, many real-world tasks, such as moving large and heavy furniture, require multi-character collaboration. Given the scarcity of data on multi-character collaboration and the efficiency challenges… ▽ More

    Submitted 20 June, 2024; originally announced June 2024.

  17. arXiv:2406.14500  [pdf, other

    cs.CL

    Improving Expert Radiology Report Summarization by Prompting Large Language Models with a Layperson Summary

    Authors: Xingmeng Zhao, Tongnian Wang, Anthony Rios

    Abstract: Radiology report summarization (RRS) is crucial for patient care, requiring concise "Impressions" from detailed "Findings." This paper introduces a novel prompting strategy to enhance RRS by first generating a layperson summary. This approach normalizes key observations and simplifies complex information using non-expert communication techniques inspired by doctor-patient interactions. Combined wi… ▽ More

    Submitted 20 June, 2024; originally announced June 2024.

  18. arXiv:2406.13923  [pdf, other

    cs.AI cs.CL cs.CV cs.MM

    PIN: A Knowledge-Intensive Dataset for Paired and Interleaved Multimodal Documents

    Authors: Junjie Wang, Yin Zhang, Yatai Ji, Yuxiang Zhang, Chunyang Jiang, Yubo Wang, Kang Zhu, Zekun Wang, Tiezhen Wang, Wenhao Huang, Jie Fu, Bei Chen, Qunshu Lin, Minghao Liu, Ge Zhang, Wenhu Chen

    Abstract: Recent advancements in Large Multimodal Models (LMMs) have leveraged extensive multimodal datasets to enhance capabilities in complex knowledge-driven tasks. However, persistent challenges in perceptual and reasoning errors limit their efficacy, particularly in interpreting intricate visual data and deducing multimodal relationships. Addressing these issues, we introduce a novel dataset format, PI… ▽ More

    Submitted 19 June, 2024; originally announced June 2024.

  19. arXiv:2406.13763  [pdf, other

    cs.CV cs.AI

    Through the Theory of Mind's Eye: Reading Minds with Multimodal Video Large Language Models

    Authors: Zhawnen Chen, Tianchun Wang, Yizhou Wang, Michal Kosinski, Xiang Zhang, Yun Fu, Sheng Li

    Abstract: Can large multimodal models have a human-like ability for emotional and social reasoning, and if so, how does it work? Recent research has discovered emergent theory-of-mind (ToM) reasoning capabilities in large language models (LLMs). LLMs can reason about people's mental states by solving various text-based ToM tasks that ask questions about the actors' ToM (e.g., human belief, desire, intention… ▽ More

    Submitted 19 June, 2024; originally announced June 2024.

  20. arXiv:2406.13025  [pdf, other

    cs.LG cs.RO eess.SY

    ABNet: Attention BarrierNet for Safe and Scalable Robot Learning

    Authors: Wei Xiao, Tsun-Hsuan Wang, Daniela Rus

    Abstract: Safe learning is central to AI-enabled robots where a single failure may lead to catastrophic results. Barrier-based method is one of the dominant approaches for safe robot learning. However, this method is not scalable, hard to train, and tends to generate unstable signals under noisy inputs that are challenging to be deployed for robots. To address these challenges, we propose a novel Attentio… ▽ More

    Submitted 18 June, 2024; originally announced June 2024.

    Comments: 18 pages

  21. arXiv:2406.12843  [pdf, other

    cs.LG cs.AI stat.ML

    Can Go AIs be adversarially robust?

    Authors: Tom Tseng, Euan McLean, Kellin Pelrine, Tony T. Wang, Adam Gleave

    Abstract: Prior work found that superhuman Go AIs like KataGo can be defeated by simple adversarial strategies. In this paper, we study if simple defenses can improve KataGo's worst-case performance. We test three natural defenses: adversarial training on hand-constructed positions, iterated adversarial training, and changing the network architecture. We find that some of these defenses are able to protect… ▽ More

    Submitted 18 June, 2024; originally announced June 2024.

    Comments: 67 pages

  22. arXiv:2406.12723  [pdf, other

    cs.LG

    BIOSCAN-5M: A Multimodal Dataset for Insect Biodiversity

    Authors: Zahra Gharaee, Scott C. Lowe, ZeMing Gong, Pablo Millan Arias, Nicholas Pellegrino, Austin T. Wang, Joakim Bruslund Haurum, Iuliia Zarubiieva, Lila Kari, Dirk Steinke, Graham W. Taylor, Paul Fieguth, Angel X. Chang

    Abstract: As part of an ongoing worldwide effort to comprehend and monitor insect biodiversity, this paper presents the BIOSCAN-5M Insect dataset to the machine learning community and establish several benchmark tasks. BIOSCAN-5M is a comprehensive dataset containing multi-modal information for over 5 million insect specimens, and it significantly expands existing image-based biological datasets by includin… ▽ More

    Submitted 24 June, 2024; v1 submitted 18 June, 2024; originally announced June 2024.

  23. arXiv:2406.12195  [pdf, other

    quant-ph cs.LG

    Quantum Compiling with Reinforcement Learning on a Superconducting Processor

    Authors: Z. T. Wang, Qiuhao Chen, Yuxuan Du, Z. H. Yang, Xiaoxia Cai, Kaixuan Huang, Jingning Zhang, Kai Xu, Jun Du, Yinan Li, Yuling Jiao, Xingyao Wu, Wu Liu, Xiliang Lu, Huikai Xu, Yirong Jin, Ruixia Wang, Haifeng Yu, S. P. Zhao

    Abstract: To effectively implement quantum algorithms on noisy intermediate-scale quantum (NISQ) processors is a central task in modern quantum technology. NISQ processors feature tens to a few hundreds of noisy qubits with limited coherence times and gate operations with errors, so NISQ algorithms naturally require employing circuits of short lengths via quantum compilation. Here, we develop a reinforcemen… ▽ More

    Submitted 17 June, 2024; originally announced June 2024.

  24. arXiv:2406.11202  [pdf, other

    cs.CV cs.GR

    Consistency^2: Consistent and Fast 3D Painting with Latent Consistency Models

    Authors: Tianfu Wang, Anton Obukhov, Konrad Schindler

    Abstract: Generative 3D Painting is among the top productivity boosters in high-resolution 3D asset management and recycling. Ever since text-to-image models became accessible for inference on consumer hardware, the performance of 3D Painting methods has consistently improved and is currently close to plateauing. At the core of most such models lies denoising diffusion in the latent space, an inherently tim… ▽ More

    Submitted 17 June, 2024; originally announced June 2024.

  25. arXiv:2406.11011  [pdf, other

    cs.LG cs.CL stat.ML

    Data Shapley in One Training Run

    Authors: Jiachen T. Wang, Prateek Mittal, Dawn Song, Ruoxi Jia

    Abstract: Data Shapley provides a principled framework for attributing data's contribution within machine learning contexts. However, existing approaches require re-training models on different data subsets, which is computationally intensive, foreclosing their application to large-scale models. Furthermore, they produce the same attribution score for any models produced by running the learning algorithm, m… ▽ More

    Submitted 29 June, 2024; v1 submitted 16 June, 2024; originally announced June 2024.

  26. arXiv:2406.10591  [pdf, other

    eess.AS cs.AI cs.CV cs.MM cs.SD

    MINT: a Multi-modal Image and Narrative Text Dubbing Dataset for Foley Audio Content Planning and Generation

    Authors: Ruibo Fu, Shuchen Shi, Hongming Guo, Tao Wang, Chunyu Qiang, Zhengqi Wen, Jianhua Tao, Xin Qi, Yi Lu, Xiaopeng Wang, Zhiyong Wang, Yukun Liu, Xuefei Liu, Shuai Zhang, Guanjun Li

    Abstract: Foley audio, critical for enhancing the immersive experience in multimedia content, faces significant challenges in the AI-generated content (AIGC) landscape. Despite advancements in AIGC technologies for text and image generation, the foley audio dubbing remains rudimentary due to difficulties in cross-modal scene matching and content correlation. Current text-to-audio technology, which relies on… ▽ More

    Submitted 15 June, 2024; originally announced June 2024.

  27. arXiv:2406.10160  [pdf, other

    cs.SD cs.AI eess.AS

    One-pass Multiple Conformer and Foundation Speech Systems Compression and Quantization Using An All-in-one Neural Model

    Authors: Zhaoqing Li, Haoning Xu, Tianzi Wang, Shoukang Hu, Zengrui Jin, Shujie Hu, Jiajun Deng, Mingyu Cui, Mengzhe Geng, Xunying Liu

    Abstract: We propose a novel one-pass multiple ASR systems joint compression and quantization approach using an all-in-one neural model. A single compression cycle allows multiple nested systems with varying Encoder depths, widths, and quantization precision settings to be simultaneously constructed without the need to train and store individual target systems separately. Experiments consistently demonstrat… ▽ More

    Submitted 14 June, 2024; originally announced June 2024.

    Comments: Accepted by Interspeech 2024

  28. arXiv:2406.10152  [pdf, other

    cs.SD eess.AS

    Joint Speaker Features Learning for Audio-visual Multichannel Speech Separation and Recognition

    Authors: Guinan Li, Jiajun Deng, Youjun Chen, Mengzhe Geng, Shujie Hu, Zhe Li, Zengrui Jin, Tianzi Wang, Xurong Xie, Helen Meng, Xunying Liu

    Abstract: This paper proposes joint speaker feature learning methods for zero-shot adaptation of audio-visual multichannel speech separation and recognition systems. xVector and ECAPA-TDNN speaker encoders are connected using purpose-built fusion blocks and tightly integrated with the complete system training. Experiments conducted on LRS3-TED data simulated multichannel overlapped speech suggest that joint… ▽ More

    Submitted 14 June, 2024; originally announced June 2024.

    Comments: Accepted by Interspeech 2024

  29. arXiv:2406.10100  [pdf, other

    cs.CV cs.AI

    SkySenseGPT: A Fine-Grained Instruction Tuning Dataset and Model for Remote Sensing Vision-Language Understanding

    Authors: Junwei Luo, Zhen Pang, Yongjun Zhang, Tingzhu Wang, Linlin Wang, Bo Dang, Jiangwei Lao, Jian Wang, Jingdong Chen, Yihua Tan, Yansheng Li

    Abstract: Remote Sensing Large Multi-Modal Models (RSLMMs) are developing rapidly and showcase significant capabilities in remote sensing imagery (RSI) comprehension. However, due to the limitations of existing datasets, RSLMMs have shortcomings in understanding the rich semantic relations among objects in complex remote sensing scenes. To unlock RSLMMs' complex comprehension ability, we propose a large-sca… ▽ More

    Submitted 14 June, 2024; originally announced June 2024.

    Comments: 30 pages, 5 figures, 19 tables, dataset and code see https://github.com/Luo-Z13/SkySenseGPT

  30. arXiv:2406.10034  [pdf, other

    cs.SD cs.AI eess.AS

    Towards Effective and Efficient Non-autoregressive Decoding Using Block-based Attention Mask

    Authors: Tianzi Wang, Xurong Xie, Zhaoqing Li, Shoukang Hu, Zengrui Jing, Jiajun Deng, Mingyu Cui, Shujie Hu, Mengzhe Geng, Guinan Li, Helen Meng, Xunying Liu

    Abstract: This paper proposes a novel non-autoregressive (NAR) block-based Attention Mask Decoder (AMD) that flexibly balances performance-efficiency trade-offs for Conformer ASR systems. AMD performs parallel NAR inference within contiguous blocks of output labels that are concealed using attention masks, while conducting left-to-right AR prediction and history context amalgamation between blocks. A beam s… ▽ More

    Submitted 14 June, 2024; originally announced June 2024.

    Comments: 5 pages, 2 figures, 2 tables, Interspeech24 conference

  31. arXiv:2406.09873  [pdf, other

    eess.AS cs.AI cs.SD

    Perceiver-Prompt: Flexible Speaker Adaptation in Whisper for Chinese Disordered Speech Recognition

    Authors: Yicong Jiang, Tianzi Wang, Xurong Xie, Juan Liu, Wei Sun, Nan Yan, Hui Chen, Lan Wang, Xunying Liu, Feng Tian

    Abstract: Disordered speech recognition profound implications for improving the quality of life for individuals afflicted with, for example, dysarthria. Dysarthric speech recognition encounters challenges including limited data, substantial dissimilarities between dysarthric and non-dysarthric speakers, and significant speaker variations stemming from the disorder. This paper introduces Perceiver-Prompt, a… ▽ More

    Submitted 14 June, 2024; originally announced June 2024.

    Comments: Accepted by interspeech 2024

  32. arXiv:2406.09669  [pdf, other

    cs.CR

    Watch the Watcher! Backdoor Attacks on Security-Enhancing Diffusion Models

    Authors: Changjiang Li, Ren Pang, Bochuan Cao, Jinghui Chen, Fenglong Ma, Shouling Ji, Ting Wang

    Abstract: Thanks to their remarkable denoising capabilities, diffusion models are increasingly being employed as defensive tools to reinforce the security of other models, notably in purifying adversarial examples and certifying adversarial robustness. However, the security risks of these practices themselves remain largely unexplored, which is highly concerning. To bridge this gap, this work investigates t… ▽ More

    Submitted 13 June, 2024; originally announced June 2024.

  33. arXiv:2406.09410  [pdf, other

    cs.CV cs.AI

    STAR: A First-Ever Dataset and A Large-Scale Benchmark for Scene Graph Generation in Large-Size Satellite Imagery

    Authors: Yansheng Li, Linlin Wang, Tingzhu Wang, Xue Yang, Junwei Luo, Qi Wang, Youming Deng, Wenbin Wang, Xian Sun, Haifeng Li, Bo Dang, Yongjun Zhang, Yi Yu, Junchi Yan

    Abstract: Scene graph generation (SGG) in satellite imagery (SAI) benefits promoting understanding of geospatial scenarios from perception to cognition. In SAI, objects exhibit great variations in scales and aspect ratios, and there exist rich relationships between objects (even between spatially disjoint objects), which makes it attractive to holistically conduct SGG in large-size very-high-resolution (VHR… ▽ More

    Submitted 3 July, 2024; v1 submitted 13 June, 2024; originally announced June 2024.

    Comments: 18 pages, 11 figures

  34. arXiv:2406.09401  [pdf, other

    cs.CV cs.AI cs.RO

    MMScan: A Multi-Modal 3D Scene Dataset with Hierarchical Grounded Language Annotations

    Authors: Ruiyuan Lyu, Tai Wang, Jingli Lin, Shuai Yang, Xiaohan Mao, Yilun Chen, Runsen Xu, Haifeng Huang, Chenming Zhu, Dahua Lin, Jiangmiao Pang

    Abstract: With the emergence of LLMs and their integration with other data modalities, multi-modal 3D perception attracts more attention due to its connectivity to the physical world and makes rapid progress. However, limited by existing datasets, previous works mainly focus on understanding object properties or inter-object spatial relationships in a 3D scene. To tackle this problem, this paper builds the… ▽ More

    Submitted 13 June, 2024; originally announced June 2024.

    Comments: Follow-up of EmbodiedScan. A multi-modal 3D dataset with the most-ever comprehensive language annotations for 3D-LLMs. Project page: https://tai-wang.github.io/mmscan/

  35. arXiv:2406.07843  [pdf, other

    cs.CV q-bio.NC

    Incremental Learning and Self-Attention Mechanisms Improve Neural System Identification

    Authors: Isaac Lin, Tianye Wang, Shang Gao, Shiming Tang, Tai Sing Lee

    Abstract: Convolutional neural networks (CNNs) have been shown to be the state-of-the-art approach for modeling the transfer functions of visual cortical neurons. Cortical neurons in the primary visual cortex are are sensitive to contextual information mediated by extensive horizontal and feedback connections. Standard CNNs can integrate global spatial image information to model such contextual modulation v… ▽ More

    Submitted 11 June, 2024; originally announced June 2024.

    Comments: Preprint NeurIPS 2024

  36. arXiv:2406.07832  [pdf, other

    cs.SD eess.AS

    SE/BN Adapter: Parametric Efficient Domain Adaptation for Speaker Recognition

    Authors: Tianhao Wang, Lantian Li, Dong Wang

    Abstract: Deploying a well-optimized pre-trained speaker recognition model in a new domain often leads to a significant decline in performance. While fine-tuning is a commonly employed solution, it demands ample adaptation data and suffers from parameter inefficiency, rendering it impractical for real-world applications with limited data available for model adaptation. Drawing inspiration from the success o… ▽ More

    Submitted 11 June, 2024; originally announced June 2024.

    Comments: to be published in INTERSPEECH 2024

  37. arXiv:2406.07647  [pdf, other

    cs.CR

    FP-Inconsistent: Detecting Evasive Bots using Browser Fingerprint Inconsistencies

    Authors: Hari Venugopalan, Shaoor Munir, Shuaib Ahmed, Tangbaihe Wang, Samuel T. King, Zubair Shafiq

    Abstract: As browser fingerprinting is increasingly being used for bot detection, bots have started altering their fingerprints for evasion. We conduct the first large-scale evaluation of evasive bots to investigate whether and how altering fingerprints helps bots evade detection. To systematically investigate evasive bots, we deploy a honey site incorporating two anti-bot services (DataDome and BotD) and s… ▽ More

    Submitted 11 June, 2024; originally announced June 2024.

  38. arXiv:2406.07428  [pdf, other

    cs.GT cs.AI cs.LG

    GemNet: Menu-Based, Strategy-Proof Multi-Bidder Auctions Through Deep Learning

    Authors: Tonghan Wang, Yanchen Jiang, David C. Parkes

    Abstract: Differentiable economics uses deep learning for automated mechanism design. Despite strong progress, it has remained an open problem to learn multi-bidder, general, and fully strategy-proof (SP) auctions. We introduce GEneral Menu-based NETwork (GemNet), which significantly extends the menu-based approach of RochetNet [Dütting et al., 2023] to the multi-bidder setting. The challenge in achieving S… ▽ More

    Submitted 11 June, 2024; originally announced June 2024.

  39. arXiv:2406.05814  [pdf, other

    cs.CV cs.AI cs.CL cs.LG cs.MM

    Unified Text-to-Image Generation and Retrieval

    Authors: Leigang Qu, Haochuan Li, Tan Wang, Wenjie Wang, Yongqi Li, Liqiang Nie, Tat-Seng Chua

    Abstract: How humans can efficiently and effectively acquire images has always been a perennial question. A typical solution is text-to-image retrieval from an existing database given the text query; however, the limited database typically lacks creativity. By contrast, recent breakthroughs in text-to-image generation have made it possible to produce fancy and diverse visual content, but it faces challenges… ▽ More

    Submitted 9 June, 2024; originally announced June 2024.

  40. arXiv:2406.05773  [pdf, other

    cs.CV

    CorrMAE: Pre-training Correspondence Transformers with Masked Autoencoder

    Authors: Tangfei Liao, Xiaoqin Zhang, Guobao Xiao, Min Li, Tao Wang, Mang Ye

    Abstract: Pre-training has emerged as a simple yet powerful methodology for representation learning across various domains. However, due to the expensive training cost and limited data, pre-training has not yet been extensively studied in correspondence pruning. To tackle these challenges, we propose a pre-training method to acquire a generic inliers-consistent representation by reconstructing masked corres… ▽ More

    Submitted 9 June, 2024; originally announced June 2024.

  41. arXiv:2406.04840  [pdf, other

    cs.SD eess.AS

    TraceableSpeech: Towards Proactively Traceable Text-to-Speech with Watermarking

    Authors: Junzuo Zhou, Jiangyan Yi, Tao Wang, Jianhua Tao, Ye Bai, Chu Yuan Zhang, Yong Ren, Zhengqi Wen

    Abstract: Various threats posed by the progress in text-to-speech (TTS) have prompted the need to reliably trace synthesized speech. However, contemporary approaches to this task involve adding watermarks to the audio separately after generation, a process that hurts both speech quality and watermark imperceptibility. In addition, these approaches are limited in robustness and flexibility. To address these… ▽ More

    Submitted 7 June, 2024; originally announced June 2024.

    Comments: acceped by interspeech 2024

  42. arXiv:2406.04683  [pdf, other

    cs.SD eess.AS

    PPPR: Portable Plug-in Prompt Refiner for Text to Audio Generation

    Authors: Shuchen Shi, Ruibo Fu, Zhengqi Wen, Jianhua Tao, Tao Wang, Chunyu Qiang, Yi Lu, Xin Qi, Xuefei Liu, Yukun Liu, Yongwei Li, Zhiyong Wang, Xiaopeng Wang

    Abstract: Text-to-Audio (TTA) aims to generate audio that corresponds to the given text description, playing a crucial role in media production. The text descriptions in TTA datasets lack rich variations and diversity, resulting in a drop in TTA model performance when faced with complex text. To address this issue, we propose a method called Portable Plug-in Prompt Refiner, which utilizes rich knowledge abo… ▽ More

    Submitted 7 June, 2024; originally announced June 2024.

    Comments: accepted by INTERSPEECH2024

  43. arXiv:2406.04478  [pdf, other

    cs.CL cs.LG

    PromptFix: Few-shot Backdoor Removal via Adversarial Prompt Tuning

    Authors: Tianrong Zhang, Zhaohan Xi, Ting Wang, Prasenjit Mitra, Jinghui Chen

    Abstract: Pre-trained language models (PLMs) have attracted enormous attention over the past few years with their unparalleled performances. Meanwhile, the soaring cost to train PLMs as well as their amazing generalizability have jointly contributed to few-shot fine-tuning and prompting as the most popular training paradigms for natural language processing (NLP) models. Nevertheless, existing studies have s… ▽ More

    Submitted 6 June, 2024; originally announced June 2024.

    Comments: NAACL 2024

  44. arXiv:2406.04300  [pdf, other

    cs.RO

    Text-to-Drive: Diverse Driving Behavior Synthesis via Large Language Models

    Authors: Phat Nguyen, Tsun-Hsuan Wang, Zhang-Wei Hong, Sertac Karaman, Daniela Rus

    Abstract: Generating varied scenarios through simulation is crucial for training and evaluating safety-critical systems, such as autonomous vehicles. Yet, the task of modeling the trajectories of other vehicles to simulate diverse and meaningful close interactions remains prohibitively costly. Adopting language descriptions to generate driving behaviors emerges as a promising strategy, offering a scalable a… ▽ More

    Submitted 6 June, 2024; originally announced June 2024.

    Comments: 14 pages, 7 figures

  45. arXiv:2406.03923  [pdf, other

    cs.LG math.NA

    Latent Neural Operator for Solving Forward and Inverse PDE Problems

    Authors: Tian Wang, Chuang Wang

    Abstract: Neural operators effectively solve PDE problems from data without knowing the explicit equations, which learn the map from the input sequences of observed samples to the predicted values. Most existed works build the model in the original geometric space, leading to high computational costs when the number of sample points is large. We present the Latent Neural Operator (LNO) solving PDEs in the l… ▽ More

    Submitted 9 June, 2024; v1 submitted 6 June, 2024; originally announced June 2024.

  46. arXiv:2406.02428  [pdf, other

    cs.LG

    Harnessing Neural Unit Dynamics for Effective and Scalable Class-Incremental Learning

    Authors: Depeng Li, Tianqi Wang, Junwei Chen, Wei Dai, Zhigang Zeng

    Abstract: Class-incremental learning (CIL) aims to train a model to learn new classes from non-stationary data streams without forgetting old ones. In this paper, we propose a new kind of connectionist model by tailoring neural unit dynamics that adapt the behavior of neural networks for CIL. In each training session, it introduces a supervisory mechanism to guide network expansion whose growth size is comp… ▽ More

    Submitted 4 June, 2024; originally announced June 2024.

    Comments: Accepted to ICML 2024

  47. arXiv:2406.02426  [pdf, other

    math.OC cs.LG

    Contextual Optimization under Covariate Shift: A Robust Approach by Intersecting Wasserstein Balls

    Authors: Tianyu Wang, Ningyuan Chen, Chun Wang

    Abstract: In contextual optimization, a decision-maker observes historical samples of uncertain variables and associated concurrent covariates, without knowing their joint distribution. Given an additional covariate observation, the goal is to choose a decision that minimizes some operational costs. A prevalent issue here is covariate shift, where the marginal distribution of the new covariate differs from… ▽ More

    Submitted 4 June, 2024; originally announced June 2024.

  48. arXiv:2406.02291  [pdf, other

    cs.NI eess.SP

    A deep-learning-based MAC for integrating channel access, rate adaptation and channel switch

    Authors: Jiantao Xin, Wei Xu, Bin Cao, Taotao Wang, Shengli Zhang

    Abstract: With increasing density and heterogeneity in unlicensed wireless networks, traditional MAC protocols, such as carrier-sense multiple access with collision avoidance (CSMA/CA) in Wi-Fi networks, are experiencing performance degradation. This is manifested in increased collisions and extended backoff times, leading to diminished spectrum efficiency and protocol coordination. Addressing these issues,… ▽ More

    Submitted 4 June, 2024; originally announced June 2024.

  49. arXiv:2406.02239  [pdf, other

    cs.NI

    Decentralized Physical Infrastructure Network (DePIN): Challenges and Opportunities

    Authors: Zhibin Lin, Taotao Wang, Long Shi, Shengli Zhang, Bin Cao

    Abstract: The widespread use of the Internet has posed challenges to existing centralized physical infrastructure networks. Issues such as data privacy risks, service disruptions, and substantial expansion costs have emerged. To address these challenges, an innovative network architecture called Decentralized Physical Infrastructure Network (DePIN) has emerged. DePIN leverages blockchain technology to decen… ▽ More

    Submitted 4 June, 2024; originally announced June 2024.

  50. arXiv:2406.02176  [pdf, other

    cs.LG

    AROMA: Preserving Spatial Structure for Latent PDE Modeling with Local Neural Fields

    Authors: Louis Serrano, Thomas X Wang, Etienne Le Naour, Jean-Noël Vittaut, Patrick Gallinari

    Abstract: We present AROMA (Attentive Reduced Order Model with Attention), a framework designed to enhance the modeling of partial differential equations (PDEs) using local neural fields. Our flexible encoder-decoder architecture can obtain smooth latent representations of spatial physical fields from a variety of data types, including irregular-grid inputs and point clouds. This versatility eliminates the… ▽ More

    Submitted 5 June, 2024; v1 submitted 4 June, 2024; originally announced June 2024.