Showing 1–50 of 212 results for author: Ma, B

  1. arXiv:2407.09053  [pdf, other]

    cs.RO

    Navi2Gaze: Leveraging Foundation Models for Navigation and Target Gazing

    Authors: Jun Zhu, Zihao Du, Haotian Xu, Fengbo Lan, Zilong Zheng, Bo Ma, Shengjie Wang, Tao Zhang

    Abstract: Task-aware navigation continues to be a challenging area of research, especially in scenarios involving open vocabulary. Previous studies primarily focus on finding suitable locations for task completion, often overlooking the importance of the robot's pose. However, the robot's orientation is crucial for successfully completing tasks because of how objects are arranged (e.g., to open a refrigerat…

    Submitted 12 July, 2024; originally announced July 2024.

  2. arXiv:2407.07345  [pdf, other]

    cs.CV

    Micro-Expression Recognition by Motion Feature Extraction based on Pre-training

    Authors: Ruolin Li, Lu Wang, Tingting Yang, Lisheng Xu, Bingyang Ma, Yongchun Li, Hongchao Wei

    Abstract: Micro-expressions (MEs) are spontaneous, unconscious facial expressions that have promising applications in various fields such as psychotherapy and national security. Thus, micro-expression recognition (MER) has attracted growing attention from researchers. Although various MER methods have emerged, especially with the development of deep learning techniques, the task still faces several cha…

    Submitted 9 July, 2024; originally announced July 2024.

  3. arXiv:2407.05112  [pdf, other]

    cs.CR cs.AI

    Releasing Malevolence from Benevolence: The Menace of Benign Data on Machine Unlearning

    Authors: Binhao Ma, Tianhang Zheng, Hongsheng Hu, Di Wang, Shuo Wang, Zhongjie Ba, Zhan Qin, Kui Ren

    Abstract: Machine learning models trained on vast amounts of real or synthetic data often achieve outstanding predictive performance across various domains. However, this utility comes with increasing concerns about privacy, as the training data may include sensitive information. To address these concerns, machine unlearning has been proposed to erase specific data samples from models. While some unlearning…

    Submitted 6 July, 2024; originally announced July 2024.

  4. arXiv:2407.04051  [pdf, other]

    cs.SD cs.AI eess.AS

    FunAudioLLM: Voice Understanding and Generation Foundation Models for Natural Interaction Between Humans and LLMs

    Authors: Keyu An, Qian Chen, Chong Deng, Zhihao Du, Changfeng Gao, Zhifu Gao, Yue Gu, Ting He, Hangrui Hu, Kai Hu, Shengpeng Ji, Yabin Li, Zerui Li, Heng Lu, Haoneng Luo, Xiang Lv, Bin Ma, Ziyang Ma, Chongjia Ni, Changhe Song, Jiaqi Shi, Xian Shi, Hao Wang, Wen Wang, Yuxuan Wang , et al. (8 additional authors not shown)

    Abstract: This report introduces FunAudioLLM, a model family designed to enhance natural voice interactions between humans and large language models (LLMs). At its core are two innovative models: SenseVoice, which handles multilingual speech recognition, emotion recognition, and audio event detection; and CosyVoice, which facilitates natural speech generation with control over multiple languages, timbre, sp…

    Submitted 10 July, 2024; v1 submitted 4 July, 2024; originally announced July 2024.

    Comments: Work in progress. Authors are listed in alphabetical order by family name

  5. arXiv:2406.12434  [pdf, other]

    cs.SD cs.LG eess.AS

    Towards Audio Codec-based Speech Separation

    Authors: Jia Qi Yip, Shengkui Zhao, Dianwen Ng, Eng Siong Chng, Bin Ma

    Abstract: Recent improvements in neural audio codec (NAC) models have generated interest in adopting pre-trained codecs for a variety of speech processing applications to take advantage of the efficiencies gained from high compression, but these have yet to be applied to the speech separation (SS) task. SS can benefit from high compression because the compute required for traditional SS models makes them imp…

    Submitted 5 July, 2024; v1 submitted 18 June, 2024; originally announced June 2024.

    Comments: This paper was accepted by Interspeech 2024

  6. arXiv:2406.11831  [pdf, other]

    cs.CV

    Exploring the Role of Large Language Models in Prompt Encoding for Diffusion Models

    Authors: Bingqi Ma, Zhuofan Zong, Guanglu Song, Hongsheng Li, Yu Liu

    Abstract: Large language models (LLMs) based on decoder-only transformers have demonstrated superior text understanding capabilities compared to CLIP and T5-series models. However, the paradigm for utilizing current advanced LLMs in text-to-image diffusion models remains to be explored. We observed an unusual phenomenon: directly using a large language model as the prompt encoder significantly degrades the…

    Submitted 21 June, 2024; v1 submitted 17 June, 2024; originally announced June 2024.

  7. arXiv:2406.11096  [pdf, other]

    cs.CL

    The Potential and Challenges of Evaluating Attitudes, Opinions, and Values in Large Language Models

    Authors: Bolei Ma, Xinpeng Wang, Tiancheng Hu, Anna-Carolina Haensch, Michael A. Hedderich, Barbara Plank, Frauke Kreuter

    Abstract: Recent advances in Large Language Models (LLMs) have sparked wide interest in validating and comprehending the human-like cognitive-behavioral traits LLMs may have. These cognitive-behavioral traits typically include Attitudes, Opinions, and Values (AOV). However, measuring AOV embedded within LLMs remains opaque, and different evaluation methods may yield different results. This has led to a lack of…

    Submitted 1 July, 2024; v1 submitted 16 June, 2024; originally announced June 2024.

  8. arXiv:2406.10961  [pdf, other]

    cs.CV cs.AI cs.CY

    Open-Vocabulary X-ray Prohibited Item Detection via Fine-tuning CLIP

    Authors: Shuyang Lin, Tong Jia, Hao Wang, Bowen Ma, Mingyuan Li, Dongyue Chen

    Abstract: X-ray prohibited item detection is an essential component of security checks, and the categories of prohibited items are continuously increasing in accordance with the latest laws. Previous works all focus on closed-set scenarios, which can only recognize known categories used for training and often require time-consuming and labor-intensive annotations when learning novel categories, resulting in…

    Submitted 16 June, 2024; originally announced June 2024.

  9. arXiv:2406.08897  [pdf, other]

    cs.LG

    Motif-driven Subgraph Structure Learning for Graph Classification

    Authors: Zhiyao Zhou, Sheng Zhou, Bochao Mao, Jiawei Chen, Qingyun Sun, Yan Feng, Chun Chen, Can Wang

    Abstract: To mitigate the suboptimal nature of graph structure, Graph Structure Learning (GSL) has emerged as a promising approach to improve graph structure and boost performance in downstream tasks. Despite the proposal of numerous GSL methods, progress in this field has mostly concentrated on node-level tasks, while graph-level tasks (e.g., graph classification) remain largely unexplored. Notably, appl…

    Submitted 13 June, 2024; originally announced June 2024.

    Comments: 16 pages, 8 figures

  10. arXiv:2406.06960  [pdf, ps, other]

    cs.LG

    Low Rank Multi-Dictionary Selection at Scale

    Authors: Boya Ma, Maxwell McNeil, Abram Magner, Petko Bogdanov

    Abstract: The sparse dictionary coding framework represents signals as a linear combination of a few predefined dictionary atoms. It has been employed for images, time series, graph signals and recently for 2-way (or 2D) spatio-temporal data employing jointly temporal and spatial dictionaries. Large and over-complete dictionaries enable high-quality models, but also pose scalability challenges which are exa…

    Submitted 11 June, 2024; originally announced June 2024.

    Comments: Proceedings of the 30th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, August 25--29, 2024, Barcelona, Spain

  11. arXiv:2406.03176  [pdf, other]

    cs.CV

    MMCL: Boosting Deformable DETR-Based Detectors with Multi-Class Min-Margin Contrastive Learning for Superior Prohibited Item Detection

    Authors: Mingyuan Li, Tong Jia, Hui Lu, Bowen Ma, Hao Wang, Dongyue Chen

    Abstract: Prohibited item detection in X-ray images is one of the most effective security inspection methods. However, differing from natural light images, the unique overlapping phenomena in X-ray images lead to the coupling of foreground and background features, thereby lowering the accuracy of general object detectors. Therefore, we propose a Multi-Class Min-Margin Contrastive Learning (MMCL) method that,…

    Submitted 5 June, 2024; originally announced June 2024.

    Comments: 14 pages, 6 figures

  12. arXiv:2406.02273  [pdf, ps, other]

    math.OC cs.LG

    A KL-based Analysis Framework with Applications to Non-Descent Optimization Methods

    Authors: Junwen Qiu, Bohao Ma, Xiao Li, Andre Milzarek

    Abstract: We propose a novel analysis framework for non-descent-type optimization methodologies in nonconvex scenarios based on the Kurdyka-Lojasiewicz property. Our framework allows covering a broad class of algorithms, including those commonly employed in stochastic and distributed optimization. Specifically, it enables the analysis of first-order methods that lack a sufficient descent property and do not…

    Submitted 4 June, 2024; originally announced June 2024.

    Comments: 29 pages

    MSC Class: 90C06; 90C26; 90C30

  13. arXiv:2406.02009  [pdf, other]

    eess.AS cs.CL cs.SD

    Phonetic Enhanced Language Modeling for Text-to-Speech Synthesis

    Authors: Kun Zhou, Shengkui Zhao, Yukun Ma, Chong Zhang, Hao Wang, Dianwen Ng, Chongjia Ni, Nguyen Trung Hieu, Jia Qi Yip, Bin Ma

    Abstract: Recent language model-based text-to-speech (TTS) frameworks demonstrate scalability and in-context learning capabilities. However, they suffer from robustness issues due to the accumulation of errors in speech unit predictions during autoregressive language modeling. In this paper, we propose a phonetic enhanced language modeling method to improve the performance of TTS models. We leverage self-su…

    Submitted 11 June, 2024; v1 submitted 4 June, 2024; originally announced June 2024.

    Comments: Accepted by Interspeech 2024

  14. arXiv:2406.00943  [pdf, other]

    cs.LG cs.AI

    State Space Models on Temporal Graphs: A First-Principles Study

    Authors: Jintang Li, Ruofan Wu, Xinzhou Jin, Boqun Ma, Liang Chen, Zibin Zheng

    Abstract: Over the past few years, research on deep graph learning has shifted from static graphs to temporal graphs in response to real-world complex systems that exhibit dynamic behaviors. In practice, temporal graphs are formalized as an ordered sequence of static graph snapshots observed at discrete time points. Sequence models such as RNNs or Transformers have long been the predominant backbone network…

    Submitted 2 June, 2024; originally announced June 2024.

    Comments: Preprint; Code will be made available at https://github.com/EdisonLeeeee/GraphSSM

  15. arXiv:2405.20596  [pdf, other]

    cs.CV cs.LG

    Generalized Semi-Supervised Learning via Self-Supervised Feature Adaptation

    Authors: Jiachen Liang, Ruibing Hou, Hong Chang, Bingpeng Ma, Shiguang Shan, Xilin Chen

    Abstract: Traditional semi-supervised learning (SSL) assumes that the feature distributions of labeled and unlabeled data are consistent, which rarely holds in realistic scenarios. In this paper, we propose a novel SSL setting, where unlabeled samples are drawn from a mixed distribution that deviates from the feature distribution of labeled samples. Under this setting, previous SSL methods tend to predict wr…

    Submitted 30 May, 2024; originally announced May 2024.

    Comments: 10 pages; Accepted by NeurIPS 2023

  16. arXiv:2405.16954  [pdf, ps, other]

    math.OC cs.LG

    Convergence of SGD with momentum in the nonconvex case: A time window-based analysis

    Authors: Junwen Qiu, Bohao Ma, Andre Milzarek

    Abstract: We propose a novel time window-based analysis technique to investigate the convergence properties of the stochastic gradient descent method with momentum (SGDM) in nonconvex settings. Despite its popularity, the convergence behavior of SGDM remains less understood in nonconvex scenarios. This is primarily due to the absence of a sufficient descent property and challenges in simultaneously controll…

    Submitted 23 June, 2024; v1 submitted 27 May, 2024; originally announced May 2024.

    Comments: 25 pages

  17. arXiv:2405.11468  [pdf, other]

    cs.CV

    Emphasizing Crucial Features for Efficient Image Restoration

    Authors: Hu Gao, Bowen Ma, Ying Zhang, Jingfan Yang, Jing Yang, Depeng Dang

    Abstract: Image restoration is a challenging ill-posed problem that estimates a latent sharp image from its degraded counterpart. Although existing methods have achieved promising performance by designing novel module architectures, they ignore the fact that different regions in a corrupted image undergo varying degrees of degradation. In this paper, we propose an efficient and effective framework to…

    Submitted 19 May, 2024; originally announced May 2024.

  18. arXiv:2405.08427  [pdf, other]

    cs.CL cs.AI

    Impact of Stickers on Multimodal Chat Sentiment Analysis and Intent Recognition: A New Task, Dataset and Baseline

    Authors: Yuanchen Shi, Biao Ma, Fang Kong

    Abstract: Stickers are increasingly used in social media to express sentiment and intent. When finding typing troublesome, people often use a sticker instead. Despite the significant impact of stickers on sentiment analysis and intent recognition, little research has been conducted. To address this gap, we propose a new task: Multimodal chat Sentiment Analysis and Intent Recognition involving Stickers (MSAI…

    Submitted 14 May, 2024; originally announced May 2024.

    Comments: 10 pages, 6 figures

  19. arXiv:2405.02208  [pdf, other]

    eess.IV cs.CV

    Reference-Free Image Quality Metric for Degradation and Reconstruction Artifacts

    Authors: Han Cui, Alfredo De Goyeneche, Efrat Shimron, Boyuan Ma, Michael Lustig

    Abstract: Image Quality Assessment (IQA) is essential in various Computer Vision tasks such as image deblurring and super-resolution. However, most IQA methods require reference images, which are not always available. While there are some reference-free IQA metrics, they have limitations in simulating human perception and discerning subtle image quality variations. We hypothesize that the JPEG quality facto…

    Submitted 1 May, 2024; originally announced May 2024.

  20. arXiv:2405.00393  [pdf, other]

    cs.CR

    Inferring State Machine from the Protocol Implementation via Large Language Model

    Authors: Haiyang Wei, Zhengjie Du, Haohui Huang, Yue Liu, Guang Cheng, Linzhang Wang, Bing Mao

    Abstract: State machines play a pivotal role in augmenting the efficacy of protocol analysis to unveil more vulnerabilities. However, the task of inferring state machines from network protocol implementations presents significant challenges. Traditional methods based on dynamic analysis often overlook crucial state transitions due to limited coverage, while static analysis faces difficulties with complex c…

    Submitted 14 June, 2024; v1 submitted 1 May, 2024; originally announced May 2024.

  21. arXiv:2404.18539  [pdf, other]

    cs.CV cs.AI

    Enhancing Boundary Segmentation for Topological Accuracy with Skeleton-based Methods

    Authors: Chuni Liu, Boyuan Ma, Xiaojuan Ban, Yujie Xie, Hao Wang, Weihua Xue, Jingchao Ma, Ke Xu

    Abstract: Topological consistency plays a crucial role in the task of boundary segmentation for reticular images, such as cell membrane segmentation in neuron electron microscopic images, grain boundary segmentation in material microscopic images and road segmentation in aerial images. In these fields, topological changes in segmentation results have a serious impact on the downstream tasks, which can even…

    Submitted 7 May, 2024; v1 submitted 29 April, 2024; originally announced April 2024.

  22. arXiv:2404.13046  [pdf, other]

    cs.CV

    MoVA: Adapting Mixture of Vision Experts to Multimodal Context

    Authors: Zhuofan Zong, Bingqi Ma, Dazhong Shen, Guanglu Song, Hao Shao, Dongzhi Jiang, Hongsheng Li, Yu Liu

    Abstract: As the key component in multimodal large language models (MLLMs), the ability of the visual encoder greatly affects an MLLM's understanding of diverse image content. Although some large-scale pretrained vision encoders such as vision encoders in CLIP and DINOv2 have brought promising performance, we found that there is still no single vision encoder that can dominate various image content understandi…

    Submitted 19 April, 2024; originally announced April 2024.

  23. arXiv:2404.09507  [pdf, other]

    cs.CV

    Clothes-Changing Person Re-Identification with Feasibility-Aware Intermediary Matching

    Authors: Jiahe Zhao, Ruibing Hou, Hong Chang, Xinqian Gu, Bingpeng Ma, Shiguang Shan, Xilin Chen

    Abstract: Current clothes-changing person re-identification (re-id) approaches usually perform retrieval based on clothes-irrelevant features, while neglecting the potential of clothes-relevant features. However, we observe that relying solely on clothes-irrelevant features for clothes-changing re-id is limited, since they often lack adequate identity information and suffer from large intra-class variations…

    Submitted 15 April, 2024; originally announced April 2024.

  24. arXiv:2404.08450  [pdf, other]

    cs.CV

    Joint Physical-Digital Facial Attack Detection Via Simulating Spoofing Clues

    Authors: Xianhua He, Dashuang Liang, Song Yang, Zhanlong Hao, Hui Ma, Binjie Mao, Xi Li, Yao Wang, Pengfei Yan, Ajian Liu

    Abstract: Face recognition systems are frequently subjected to a variety of physical and digital attacks of different types. Previous methods have achieved satisfactory performance in scenarios that address physical attacks and digital attacks, respectively. However, few methods integrate a single model that simultaneously addresses both physical and digital attacks, implying the necessity to dev…

    Submitted 12 April, 2024; originally announced April 2024.

    Comments: 10 pages with 6 figures, Accepted by CVPRW 2024

  25. arXiv:2404.08382  [pdf, other]

    cs.CL cs.AI

    Look at the Text: Instruction-Tuned Language Models are More Robust Multiple Choice Selectors than You Think

    Authors: Xinpeng Wang, Chengzhi Hu, Bolei Ma, Paul Röttger, Barbara Plank

    Abstract: Multiple choice questions (MCQs) are commonly used to evaluate the capabilities of large language models (LLMs). One common way to evaluate the model response is to rank the candidate answers based on the log probability of the first token prediction. An alternative way is to examine the text output. Prior work has shown that first token probabilities lack robustness to changes in MCQ phrasing, an…

    Submitted 12 April, 2024; originally announced April 2024.

  26. arXiv:2404.06851  [pdf, other]

    cs.CV

    UDiFF: Generating Conditional Unsigned Distance Fields with Optimal Wavelet Diffusion

    Authors: Junsheng Zhou, Weiqi Zhang, Baorui Ma, Kanle Shi, Yu-Shen Liu, Zhizhong Han

    Abstract: Diffusion models have shown remarkable results for image generation, editing and inpainting. Recent works explore diffusion models for 3D shape generation with neural implicit functions, i.e., signed distance function and occupancy function. However, they are limited to shapes with closed surfaces, which prevents them from generating diverse 3D real-world content containing open surfaces. In this…

    Submitted 10 April, 2024; originally announced April 2024.

    Comments: To appear at CVPR2024. Project page: https://weiqi-zhang.github.io/UDiFF

  27. arXiv:2403.15472  [pdf, other]

    cs.CY cs.AI cs.PL

    Enhancing Programming Education with ChatGPT: A Case Study on Student Perceptions and Interactions in a Python Course

    Authors: Boxaun Ma, Li Chen, Shin'ichi Konomi

    Abstract: The integration of ChatGPT as a supportive tool in education, notably in programming courses, addresses the unique challenges of programming education by providing assistance with debugging, code generation, and explanations. Despite existing research validating ChatGPT's effectiveness, its application in university-level programming education and a detailed understanding of student interactions a…

    Submitted 5 April, 2024; v1 submitted 20 March, 2024; originally announced March 2024.

  28. arXiv:2403.11679  [pdf, ps, other]

    cs.CV cs.RO

    NEDS-SLAM: A Novel Neural Explicit Dense Semantic SLAM Framework using 3D Gaussian Splatting

    Authors: Yiming Ji, Yang Liu, Guanghu Xie, Boyu Ma, Zongwu Xie

    Abstract: We propose NEDS-SLAM, an Explicit Dense semantic SLAM system based on 3D Gaussian representation, that enables robust 3D semantic mapping, accurate camera tracking, and high-quality rendering in real-time. In the system, we propose a Spatially Consistent Feature Fusion model to reduce the effect of erroneous estimates from a pre-trained segmentation head on semantic reconstruction, achieving robust…

    Submitted 1 April, 2024; v1 submitted 18 March, 2024; originally announced March 2024.

  29. arXiv:2403.10065  [pdf, other]

    cs.CL

    Triple GNNs: Introducing Syntactic and Semantic Information for Conversational Aspect-Based Quadruple Sentiment Analysis

    Authors: Binbin Li, Yuqing Li, Siyu Jia, Bingnan Ma, Yu Ding, Zisen Qi, Xingbang Tan, Menghan Guo, Shenghui Liu

    Abstract: Conversational Aspect-Based Sentiment Analysis (DiaASQ) aims to detect quadruples {target, aspect, opinion, sentiment polarity} from given dialogues. In DiaASQ, elements constituting these quadruples are not necessarily confined to individual sentences but may span across multiple utterances within a dialogue. This necessitates a dual focus on both the syntactic information of individual utteran…

    Submitted 15 March, 2024; originally announced March 2024.

    Comments: Accepted by CSCWD2024

  30. arXiv:2403.04626  [pdf, other]

    eess.IV cs.CL cs.CV cs.LG

    MedFLIP: Medical Vision-and-Language Self-supervised Fast Pre-Training with Masked Autoencoder

    Authors: Lei Li, Tianfang Zhang, Xinglin Zhang, Jiaqi Liu, Bingqi Ma, Yan Luo, Tao Chen

    Abstract: Within the domain of medical analysis, extensive research has explored the potential of mutual learning between Masked Autoencoders (MAEs) and multimodal data. However, the impact of MAEs on intermodality remains a key challenge. We introduce MedFLIP, a Fast Language-Image Pre-training method for Medical analysis. We explore MAEs for zero-shot learning with crossed domains, which enhances the model…

    Submitted 30 May, 2024; v1 submitted 7 March, 2024; originally announced March 2024.

  31. arXiv:2403.04309  [pdf, other]

    cs.CV cs.AI

    AO-DETR: Anti-Overlapping DETR for X-Ray Prohibited Items Detection

    Authors: Mingyuan Li, Tong Jia, Hao Wang, Bowen Ma, Shuyang Lin, Da Cai, Dongyue Chen

    Abstract: Prohibited item detection in X-ray images is one of the most essential and highly effective methods widely employed in various security inspection scenarios. Considering the significant overlapping phenomenon in X-ray prohibited item images, we propose an Anti-Overlapping DETR (AO-DETR) based on one of the state-of-the-art general object detectors, DINO. Specifically, to address the feature coupli…

    Submitted 7 March, 2024; originally announced March 2024.

  32. arXiv:2403.03535  [pdf, other]

    cs.CV cs.LG

    Task Attribute Distance for Few-Shot Learning: Theoretical Analysis and Applications

    Authors: Minyang Hu, Hong Chang, Zong Guo, Bingpeng Ma, Shiguan Shan, Xilin Chen

    Abstract: Few-shot learning (FSL) aims to learn novel tasks with very few labeled samples by leveraging experience from related training tasks. In this paper, we try to understand FSL by delving into two key questions: (1) How to quantify the relationship between training and novel tasks? (2) How does the relationship affect the adaptation difficulty on novel tasks for different…

    Submitted 6 March, 2024; originally announced March 2024.

  33. arXiv:2402.18397  [pdf, other]

    cs.CL

    Decomposed Prompting: Unveiling Multilingual Linguistic Structure Knowledge in English-Centric Large Language Models

    Authors: Ercong Nie, Shuzhou Yuan, Bolei Ma, Helmut Schmid, Michael Färber, Frauke Kreuter, Hinrich Schütze

    Abstract: Despite the predominance of English in their training data, English-centric Large Language Models (LLMs) like GPT-3 and LLaMA display a remarkable ability to perform multilingual tasks, raising questions about the depth and nature of their cross-lingual capabilities. This paper introduces the decomposed prompting approach to probe the linguistic structure understanding of these LLMs in sequence la…

    Submitted 28 February, 2024; originally announced February 2024.

    Comments: 18 pages, 7 figures

  34. arXiv:2402.14499  [pdf, other]

    cs.CL

    "My Answer is C": First-Token Probabilities Do Not Match Text Answers in Instruction-Tuned Language Models

    Authors: Xinpeng Wang, Bolei Ma, Chengzhi Hu, Leon Weber-Genzel, Paul Röttger, Frauke Kreuter, Dirk Hovy, Barbara Plank

    Abstract: The open-ended nature of language generation makes the evaluation of autoregressive large language models (LLMs) challenging. One common evaluation approach uses multiple-choice questions (MCQ) to limit the response space. The model is then evaluated by ranking the candidate answers by the log probability of the first token prediction. However, first tokens may not consistently reflect the final r…

    Submitted 4 July, 2024; v1 submitted 22 February, 2024; originally announced February 2024.

    Comments: ACL 2024 Findings

  35. arXiv:2402.11700  [pdf, other]

    cs.CL

    Why Lift so Heavy? Slimming Large Language Models by Cutting Off the Layers

    Authors: Shuzhou Yuan, Ercong Nie, Bolei Ma, Michael Färber

    Abstract: Large Language Models (LLMs) possess outstanding capabilities in addressing various natural language processing (NLP) tasks. However, the sheer size of these models poses challenges in terms of storage, training and inference due to the inclusion of billions of parameters through layer stacking. While traditional approaches such as model pruning or distillation offer ways for reducing model size,…

    Submitted 18 February, 2024; originally announced February 2024.

    Comments: 6 pages, 2 figures

  36. arXiv:2401.16589  [pdf, other]

    cs.CL

    ToPro: Token-Level Prompt Decomposition for Cross-Lingual Sequence Labeling Tasks

    Authors: Bolei Ma, Ercong Nie, Shuzhou Yuan, Helmut Schmid, Michael Färber, Frauke Kreuter, Hinrich Schütze

    Abstract: Prompt-based methods have been successfully applied to multilingual pretrained language models for zero-shot cross-lingual understanding. However, most previous studies primarily focused on sentence-level classification tasks, and only a few considered token-level labeling tasks such as Named Entity Recognition (NER) and Part-of-Speech (POS) tagging. In this paper, we propose Token-Level Prompt De…

    Submitted 13 March, 2024; v1 submitted 29 January, 2024; originally announced January 2024.

    Comments: EACL 2024

  37. arXiv:2401.06139  [pdf, other]

    q-fin.TR cs.LG

    Stockformer: A Price-Volume Factor Stock Selection Model Based on Wavelet Transform and Multi-Task Self-Attention Networks

    Authors: Bohan Ma, Yushan Xue, Yuan Lu, Jing Chen

    Abstract: As the Chinese stock market continues to evolve and its market structure grows increasingly complex, traditional quantitative trading methods are facing escalating challenges. Particularly, due to policy uncertainty and the frequent market fluctuations triggered by sudden economic events, existing models often struggle to accurately predict market dynamics. To address these challenges, this paper…

    Submitted 17 June, 2024; v1 submitted 22 November, 2023; originally announced January 2024.

    Comments: Currently under consideration for publication in the Expert Systems With Applications

  38. arXiv:2401.01568  [pdf, other]

    cs.CR cs.NI

    A Survey of Protocol Fuzzing

    Authors: Xiaohan Zhang, Cen Zhang, Xinghua Li, Zhengjie Du, Yuekang Li, Yaowen Zheng, Yeting Li, Bing Mao, Yang Liu, Robert H. Deng

    Abstract: Communication protocols form the bedrock of our interconnected world, yet vulnerabilities within their implementations pose significant security threats. Recent developments have seen a surge in fuzzing-based research dedicated to uncovering these vulnerabilities within protocol implementations. However, there is still no systematic overview of protocol fuzzing for answering the essential questi…

    Submitted 3 January, 2024; v1 submitted 3 January, 2024; originally announced January 2024.

  39. arXiv:2401.01207  [pdf, other]

    cs.CV

    Towards a Simultaneous and Granular Identity-Expression Control in Personalized Face Generation

    Authors: Renshuai Liu, Bowen Ma, Wei Zhang, Zhipeng Hu, Changjie Fan, Tangjie Lv, Yu Ding, Xuan Cheng

    Abstract: In human-centric content generation, the pre-trained text-to-image models struggle to produce user-wanted portrait images, which retain the identity of individuals while exhibiting diverse expressions. This paper introduces our efforts towards personalized face generation. To this end, we propose a novel multi-modal face generation framework, capable of simultaneous identity-expression control and…

    Submitted 6 April, 2024; v1 submitted 2 January, 2024; originally announced January 2024.

  40. arXiv:2312.15133  [pdf, other]

    cs.CV

    Learning Continuous Implicit Field with Local Distance Indicator for Arbitrary-Scale Point Cloud Upsampling

    Authors: Shujuan Li, Junsheng Zhou, Baorui Ma, Yu-Shen Liu, Zhizhong Han

    Abstract: Point cloud upsampling aims to generate dense and uniformly distributed point sets from a sparse point cloud, which plays a critical role in 3D computer vision. Previous methods typically split a sparse point cloud into several local patches, upsample patch points, and merge all upsampled patches. However, these methods often produce holes, outliers or nonuniformity due to the splitting and mergin…

    Submitted 22 December, 2023; originally announced December 2023.

    Comments: Accepted by AAAI 2024. Project page: https://lisj575.github.io/APU-LDI

  41. arXiv:2312.14834  [pdf, other]

    cs.CV

    Prototype-Guided Text-based Person Search based on Rich Chinese Descriptions

    Authors: Ziqiang Wu, Bingpeng Ma

    Abstract: Text-based person search aims to simultaneously localize and identify the target person based on query text from uncropped scene images, which can be regarded as a unified task of person detection and text-based person retrieval. In this work, we propose a large-scale benchmark dataset named PRW-TPS-CN based on the widely used person search dataset PRW. Our dataset contains 47,102 sentences…

    Submitted 22 December, 2023; originally announced December 2023.

    Comments: 11 pages, 5 figures

  42. arXiv:2312.11825  [pdf, other]

    cs.SD eess.AS

    MossFormer2: Combining Transformer and RNN-Free Recurrent Network for Enhanced Time-Domain Monaural Speech Separation

    Authors: Shengkui Zhao, Yukun Ma, Chongjia Ni, Chong Zhang, Hao Wang, Trung Hieu Nguyen, Kun Zhou, Jiaqi Yip, Dianwen Ng, Bin Ma

    Abstract: Our previously proposed MossFormer has achieved promising performance in monaural speech separation. However, it predominantly adopts a self-attention-based MossFormer module, which tends to emphasize longer-range, coarser-scale dependencies, with a deficiency in effectively modelling finer-scale recurrent patterns. In this paper, we introduce a novel hybrid model that provides the capabilities to…

    Submitted 18 December, 2023; originally announced December 2023.

    Comments: 5 pages, 3 figures, accepted by ICASSP 2024

  43. arXiv:2312.07254  [pdf, other]

    cs.CL

    The GUA-Speech System Description for CNVSRC Challenge 2023

    Authors: Shengqiang Li, Chao Lei, Baozhong Ma, Binbin Zhang, Fuping Pan

    Abstract: This study describes our system for Task 1 Single-speaker Visual Speech Recognition (VSR) fixed track in the Chinese Continuous Visual Speech Recognition Challenge (CNVSRC) 2023. Specifically, we use intermediate connectionist temporal classification (Inter CTC) residual modules to relax the conditional independence assumption of CTC in our model. Then we use a bi-transformer decoder to enable the…

    Submitted 12 December, 2023; originally announced December 2023.

    Comments: CNVSRC 2023 Challenge

  44. arXiv:2312.04060  [pdf, other]

    cs.CV

    Differentiable Registration of Images and LiDAR Point Clouds with VoxelPoint-to-Pixel Matching

    Authors: Junsheng Zhou, Baorui Ma, Wenyuan Zhang, Yi Fang, Yu-Shen Liu, Zhizhong Han

    Abstract: Cross-modality registration between 2D images from cameras and 3D point clouds from LiDARs is a crucial task in computer vision and robotics. Previous methods estimate 2D-3D correspondences by matching point and pixel patterns learned by neural networks, and use Perspective-n-Points (PnP) to estimate rigid transformation during post-processing. However, these methods struggle to map points and pixe…

    Submitted 7 December, 2023; originally announced December 2023.

    Comments: To appear at NeurIPS2023 (Spotlight). Code is available at https://github.com/junshengzhou/VP2P-Match

  45. arXiv:2312.03017  [pdf, other]

    cs.LG physics.optics

    AI-driven emergence of frequency information non-uniform distribution via THz metasurface spectrum prediction

    Authors: Xiaohua Xing, Yuqi Ren, Die Zou, Qiankun Zhang, Bingxuan Mao, Jianquan Yao, Deyi Xiong, Shuang Zhang, Liang Wu

    Abstract: Recently, artificial intelligence has been extensively deployed across various scientific disciplines, optimizing and guiding the progression of experiments through the integration of abundant datasets, whilst continuously probing the vast theoretical space encapsulated within the data. Particularly, deep learning models, due to their end-to-end adaptive learning capabilities, are capable of auton…

    Submitted 4 December, 2023; originally announced December 2023.

    Comments: 11 pages, 4 figures

  46. arXiv:2312.02567  [pdf, other]

    cs.CV

    Think Twice Before Selection: Federated Evidential Active Learning for Medical Image Analysis with Domain Shifts

    Authors: Jiayi Chen, Benteng Ma, Hengfei Cui, Yong Xia

    Abstract: Federated learning facilitates the collaborative learning of a global model across multiple distributed medical institutions without centralizing data. Nevertheless, the expensive cost of annotation on local clients remains an obstacle to effectively utilizing local data. To mitigate this issue, federated active learning methods suggest leveraging local and global model predictions to select a rel…

    Submitted 22 April, 2024; v1 submitted 5 December, 2023; originally announced December 2023.

    Comments: Accepted by CVPR 2024

  47. arXiv:2311.17971  [pdf, other]

    cs.CV

    GeoDream: Disentangling 2D and Geometric Priors for High-Fidelity and Consistent 3D Generation

    Authors: Baorui Ma, Haoge Deng, Junsheng Zhou, Yu-Shen Liu, Tiejun Huang, Xinlong Wang

    Abstract: Text-to-3D generation by distilling pretrained large-scale text-to-image diffusion models has shown great promise but still suffers from inconsistent 3D geometric structures (Janus problems) and severe artifacts. The aforementioned problems mainly stem from 2D diffusion models lacking 3D awareness during the lifting. In this work, we present GeoDream, a novel method that incorporates explicit gene…

    Submitted 30 November, 2023; v1 submitted 29 November, 2023; originally announced November 2023.

    Comments: Code and Demo: https://github.com/baaivision/GeoDream

  48. arXiv:2311.14212  [pdf, other]

    stat.ML cs.CL cs.LG stat.ME

    Annotation Sensitivity: Training Data Collection Methods Affect Model Performance

    Authors: Christoph Kern, Stephanie Eckman, Jacob Beck, Rob Chew, Bolei Ma, Frauke Kreuter

    Abstract: When training data are collected from human annotators, the design of the annotation instrument, the instructions given to annotators, the characteristics of the annotators, and their interactions can impact training data. This study demonstrates that design choices made when creating an annotation instrument also impact the models trained on the resulting annotations. We introduce the term annota…

    Submitted 22 January, 2024; v1 submitted 23 November, 2023; originally announced November 2023.

    Comments: EMNLP 2023 Findings: https://aclanthology.org/2023.findings-emnlp.992/

  49. arXiv:2311.01357  [pdf, other]

    cs.CV

    Robust Identity Perceptual Watermark Against Deepfake Face Swapping

    Authors: Tianyi Wang, Mengxiao Huang, Harry Cheng, Bin Ma, Yinglong Wang

    Abstract: Notwithstanding offering convenience and entertainment to society, Deepfake face swapping has caused critical privacy issues with the rapid development of deep generative models. Due to imperceptible artifacts in high-quality synthetic images, passive detection models against face swapping in recent years usually suffer performance damping regarding the generalizability issue. Therefore, several s…

    Submitted 15 March, 2024; v1 submitted 2 November, 2023; originally announced November 2023.

    Comments: In peer review

  50. arXiv:2310.16364  [pdf, other]

    cs.CV

    Towards Large-scale Masked Face Recognition

    Authors: Manyuan Zhang, Bingqi Ma, Guanglu Song, Yunxiao Wang, Hongsheng Li, Yu Liu

    Abstract: During the COVID-19 coronavirus epidemic, almost everyone is wearing masks, which poses a huge challenge for deep learning-based face recognition algorithms. In this paper, we will present our championship solutions in ICCV MFR WebFace260M and InsightFace unconstrained tracks. We will focus on four challenges in large-scale masked face recognition, i.e., super-large scale training, data n…

    Submitted 25 October, 2023; originally announced October 2023.

    Comments: the top1 solution for ICCV2021-MFR challenge