Showing 1–50 of 78 results for author: Jiao, M

  1. arXiv:2407.08515  [pdf, other]

    cs.CV cs.AI

    15M Multimodal Facial Image-Text Dataset

    Authors: Dawei Dai, YuTang Li, YingGe Liu, Mingming Jia, Zhang YuanHui, Guoyin Wang

    Abstract: Currently, image-text-driven multi-modal deep learning models have demonstrated their outstanding potential in many fields. In practice, tasks centered around facial images have broad application prospects. This paper presents FaceCaption-15M, a large-scale, diverse, and high-quality dataset of facial images accompanied by their natural language descriptions (facial image-to-text). This d…

    Submitted 11 July, 2024; v1 submitted 11 July, 2024; originally announced July 2024.

    Comments: 15 pages, 8 figures

  2. arXiv:2407.02501  [pdf, other]

    cs.LG cs.CE eess.SY stat.AP

    Data-driven Power Flow Linearization: Theory

    Authors: Mengshuo Jia, Gabriela Hug, Ning Zhang, Zhaojian Wang, Yi Wang, Chongqing Kang

    Abstract: This two-part tutorial dives into the field of data-driven power flow linearization (DPFL), a domain gaining increased attention. DPFL stands out for its higher approximation accuracy, wide adaptability, and better ability to implicitly incorporate the latest system attributes. This renders DPFL a potentially superior option for managing the significant fluctuations from renewable energy sources,…

    Submitted 10 June, 2024; originally announced July 2024.

    Comments: 20 pages

  3. arXiv:2406.17215  [pdf, other]

    eess.SY cs.AI

    Enabling Large Language Models to Perform Power System Simulations with Previously Unseen Tools: A Case of Daline

    Authors: Mengshuo Jia, Zeyu Cui, Gabriela Hug

    Abstract: The integration of experiment technologies with large language models (LLMs) is transforming scientific research, offering AI capabilities beyond specialized problem-solving to becoming research assistants for human scientists. In power systems, simulations are essential for research. However, LLMs face significant challenges in power system simulations due to limited pre-existing knowledge and th…

    Submitted 26 June, 2024; v1 submitted 24 June, 2024; originally announced June 2024.

    Comments: All the supplementary files mentioned in the manuscript will be open-source upon acceptance

  4. arXiv:2406.15677  [pdf, other]

    cs.RO

    Open-vocabulary Pick and Place via Patch-level Semantic Maps

    Authors: Mingxi Jia, Haojie Huang, Zhewen Zhang, Chenghao Wang, Linfeng Zhao, Dian Wang, Jason Xinyu Liu, Robin Walters, Robert Platt, Stefanie Tellex

    Abstract: Controlling robots through natural language instructions in open-vocabulary scenarios is pivotal for enhancing human-robot collaboration and complex robot behavior synthesis. However, achieving this capability poses significant challenges due to the need for a system that can generalize from limited data to a wide range of tasks and environments. Existing methods rely on large, costly datasets and…

    Submitted 21 June, 2024; originally announced June 2024.

  5. arXiv:2406.14326  [pdf, other]

    cs.CL

    medIKAL: Integrating Knowledge Graphs as Assistants of LLMs for Enhanced Clinical Diagnosis on EMRs

    Authors: Mingyi Jia, Junwen Duan, Yan Song, Jianxin Wang

    Abstract: Electronic Medical Records (EMRs), while integral to modern healthcare, present challenges for clinical reasoning and diagnosis due to their complexity and information redundancy. To address this, we proposed medIKAL (Integrating Knowledge Graphs as Assistants of LLMs), a framework that combines Large Language Models (LLMs) with knowledge graphs (KGs) to enhance diagnostic capabilities. medIKAL as…

    Submitted 20 June, 2024; originally announced June 2024.

  6. arXiv:2406.12050  [pdf, other]

    cs.CL

    Learn Beyond The Answer: Training Language Models with Reflection for Mathematical Reasoning

    Authors: Zhihan Zhang, Zhenwen Liang, Wenhao Yu, Dian Yu, Mengzhao Jia, Dong Yu, Meng Jiang

    Abstract: Supervised fine-tuning enhances the problem-solving abilities of language models across various mathematical reasoning tasks. To maximize such benefits, existing research focuses on broadening the training set with various data augmentation techniques, which is effective for standard single-round question-answering settings. Our work introduces a novel technique aimed at cultivating a deeper under…

    Submitted 17 June, 2024; originally announced June 2024.

  7. arXiv:2406.11740  [pdf, other]

    cs.RO cs.AI cs.LG

    Imagination Policy: Using Generative Point Cloud Models for Learning Manipulation Policies

    Authors: Haojie Huang, Karl Schmeckpeper, Dian Wang, Ondrej Biza, Yaoyao Qian, Haotian Liu, Mingxi Jia, Robert Platt, Robin Walters

    Abstract: Humans can imagine goal states during planning and perform actions to match those goals. In this work, we propose Imagination Policy, a novel multi-task key-frame policy network for solving high-precision pick and place tasks. Instead of learning actions directly, Imagination Policy generates point clouds to imagine desired states which are then translated to actions using rigid action estimation.…

    Submitted 17 June, 2024; originally announced June 2024.

  8. arXiv:2406.11213  [pdf, other]

    cs.SE

    A Survey of AIOps for Failure Management in the Era of Large Language Models

    Authors: Lingzhe Zhang, Tong Jia, Mengxi Jia, Yifan Wu, Aiwei Liu, Yong Yang, Zhonghai Wu, Xuming Hu, Philip S. Yu, Ying Li

    Abstract: As software systems grow increasingly intricate, Artificial Intelligence for IT Operations (AIOps) methods have been widely used in software system failure management to ensure the high availability and reliability of large-scale distributed software systems. However, these methods still face several challenges, such as lack of cross-platform generality and cross-task flexibility. Fortunately, rec…

    Submitted 23 June, 2024; v1 submitted 17 June, 2024; originally announced June 2024.

    Comments: 35 pages

  9. Multivariate Log-based Anomaly Detection for Distributed Database

    Authors: Lingzhe Zhang, Tong Jia, Mengxi Jia, Ying Li, Yong Yang, Zhonghai Wu

    Abstract: Distributed databases are fundamental infrastructures of today's large-scale software systems such as cloud systems. Detecting anomalies in distributed databases is essential for maintaining software availability. Existing approaches, predominantly developed using Loghub (a comprehensive collection of log datasets from various systems), lack datasets specifically tailored to distributed databases, wh…

    Submitted 12 June, 2024; originally announced June 2024.

    Comments: Accepted by KDD'24

  10. arXiv:2406.06852  [pdf, other]

    cs.CR cs.AI cs.CL

    A Survey of Backdoor Attacks and Defenses on Large Language Models: Implications for Security Measures

    Authors: Shuai Zhao, Meihuizi Jia, Zhongliang Guo, Leilei Gan, Jie Fu, Yichao Feng, Fengjun Pan, Luu Anh Tuan

    Abstract: The large language models (LLMs), which bridge the gap between human language understanding and complex problem-solving, achieve state-of-the-art performance on several NLP tasks, particularly in few-shot and zero-shot settings. Despite the demonstrable efficacy of LLMs, due to constraints on computational resources, users have to engage with open-source language models or outsource the entire tra…

    Submitted 13 June, 2024; v1 submitted 10 June, 2024; originally announced June 2024.

  11. arXiv:2405.18085  [pdf, other]

    cs.SI cs.MA eess.SY

    Network Diffusion -- Framework to Simulate Spreading Processes in Complex Networks

    Authors: Michał Czuba, Mateusz Nurek, Damian Serwata, Yu-Xuan Qiu, Mingshan Jia, Katarzyna Musial, Radosław Michalski, Piotr Bródka

    Abstract: With the advancement of computational network science, its research scope has significantly expanded beyond static graphs to encompass more complex structures. The introduction of streaming, temporal, multilayer, and hypernetwork approaches has brought new possibilities and imposed additional requirements. For instance, by utilising these advancements, one can model structures such as social netwo…

    Submitted 28 May, 2024; originally announced May 2024.

    Comments: To be published in: Big Data Mining and Analytics (https://ieeexplore.ieee.org/xpl/RecentIssue.jsp?punumber=8254253)

  12. arXiv:2405.07283  [pdf, other]

    cs.RO cs.CV

    BeautyMap: Binary-Encoded Adaptable Ground Matrix for Dynamic Points Removal in Global Maps

    Authors: Mingkai Jia, Qingwen Zhang, Bowen Yang, Jin Wu, Ming Liu, Patric Jensfelt

    Abstract: Global point clouds that correctly represent the static environment features can facilitate accurate localization and robust path planning. However, dynamic objects introduce undesired ghost tracks that are mixed up with the static environment. Existing dynamic removal methods normally fail to balance the performance in computational efficiency and accuracy. In response, we present BeautyMap to ef…

    Submitted 12 May, 2024; originally announced May 2024.

    Comments: The first two authors are co-first authors. 8 pages, accepted by RA-L

  13. arXiv:2404.14604  [pdf, other]

    cs.CL

    Describe-then-Reason: Improving Multimodal Mathematical Reasoning through Visual Comprehension Training

    Authors: Mengzhao Jia, Zhihan Zhang, Wenhao Yu, Fangkai Jiao, Meng Jiang

    Abstract: Open-source multimodal large language models (MLLMs) excel in various tasks involving textual and visual inputs but still struggle with complex multimodal mathematical reasoning, lagging behind proprietary models like GPT-4V(ision) and Gemini-Pro. Although fine-tuning with intermediate steps (i.e., rationales) elicits some mathematical reasoning skills, the resulting models still fall short in vis…

    Submitted 25 April, 2024; v1 submitted 22 April, 2024; originally announced April 2024.

  14. arXiv:2404.11576  [pdf, other]

    cs.CV

    State-space Decomposition Model for Video Prediction Considering Long-term Motion Trend

    Authors: Fei Cui, Jiaojiao Fang, Xiaojiang Wu, Zelong Lai, Mengke Yang, Menghan Jia, Guizhong Liu

    Abstract: Stochastic video prediction enables the consideration of uncertainty in future motion, thereby providing a better reflection of the dynamic nature of the environment. Stochastic video prediction methods based on image auto-regressive recurrent models need to feed their predictions back into the latent space. Conversely, the state-space models, which decouple frame synthesis and temporal prediction…

    Submitted 17 April, 2024; originally announced April 2024.

  15. arXiv:2404.09408  [pdf, other]

    cs.NI

    A Distributed Scalable Cross-chain State Channel Scheme Based on Recursive State Synchronization

    Authors: Xinyu Liang, Ruiying Du, Jing Chen, Yu Zhang, Meng Jia, Shuangxi Cao, Yufeng Wei, Shixiong Yao

    Abstract: As cross-chain technology continues to advance, the scale of cross-chain transactions is experiencing significant expansion. To improve scalability, researchers have turned to the study of cross-chain state channels. However, most of the existing schemes rely on trusted parties to support channel operations. To address this issue, we present Interpipe: a distributed cross-chain state channel schem…

    Submitted 14 April, 2024; originally announced April 2024.

  16. arXiv:2404.05726  [pdf, other]

    cs.CV

    MA-LMM: Memory-Augmented Large Multimodal Model for Long-Term Video Understanding

    Authors: Bo He, Hengduo Li, Young Kyun Jang, Menglin Jia, Xuefei Cao, Ashish Shah, Abhinav Shrivastava, Ser-Nam Lim

    Abstract: With the success of large language models (LLMs), integrating the vision model into LLMs to build vision-language foundation models has gained much more interest recently. However, existing LLM-based large multimodal models (e.g., Video-LLaMA, VideoChat) can only take in a limited number of frames for short video understanding. In this study, we mainly focus on designing an efficient and effective…

    Submitted 24 April, 2024; v1 submitted 8 April, 2024; originally announced April 2024.

    Comments: Accepted at CVPR 2024. Project Page https://boheumd.github.io/MA-LMM/

  17. arXiv:2404.02439  [pdf, other]

    cs.HC

    A neuroergonomics model to evaluating nuclear power plants operators' performance under heat stress driven by ECG time-frequency spectrums and fNIRS prefrontal cortex network: a CNN-GAT fusion model

    Authors: Yan Zhang, Ming Jia, Meng Li, JianYu Wang, XiangMin Hu, ZhiHui Xu, Tao Chen

    Abstract: Operators experience complicated physiological and psychological states when exposed to extreme heat stress, which can impair cognitive function and decrease performance significantly, ultimately leading to severe secondary disasters. Therefore, there is an urgent need for a feasible technique to identify their abnormal states to enhance the reliability of human-cybernetics systems. With the advan…

    Submitted 2 April, 2024; originally announced April 2024.

  18. arXiv:2403.01449  [pdf, other]

    cs.RO cs.CV

    DUFOMap: Efficient Dynamic Awareness Mapping

    Authors: Daniel Duberg, Qingwen Zhang, MingKai Jia, Patric Jensfelt

    Abstract: The dynamic nature of the real world is one of the main challenges in robotics. The first step in dealing with it is to detect which parts of the world are dynamic. A typical benchmark task is to create a map that contains only the static part of the world to support, for example, localization and planning. Current solutions are often applied in post-processing, where parameter tuning allows the u…

    Submitted 12 April, 2024; v1 submitted 3 March, 2024; originally announced March 2024.

    Comments: The first two authors hold equal contribution. 8 pages, 7 figures, project page https://kth-rpl.github.io/dufomap

  19. arXiv:2402.18107  [pdf, other]

    cs.MM

    Multimodal Interaction Modeling via Self-Supervised Multi-Task Learning for Review Helpfulness Prediction

    Authors: HongLin Gong, Mengzhao Jia, Liqiang Jing

    Abstract: In line with the latest research, the task of identifying helpful reviews from a vast pool of user-generated textual and visual data has become a prominent area of study. Effective modal representations are expected to possess two key attributes: consistency and differentiation. Current methods designed for Multimodal Review Helpfulness Prediction (MRHP) face limitations in capturing distinctive i…

    Submitted 25 March, 2024; v1 submitted 28 February, 2024; originally announced February 2024.

    Comments: 10 pages, 4 figures, 4 tables

  20. arXiv:2402.12168  [pdf, other]

    cs.CR cs.AI cs.CL

    Defending Against Weight-Poisoning Backdoor Attacks for Parameter-Efficient Fine-Tuning

    Authors: Shuai Zhao, Leilei Gan, Luu Anh Tuan, Jie Fu, Lingjuan Lyu, Meihuizi Jia, Jinming Wen

    Abstract: Recently, various parameter-efficient fine-tuning (PEFT) strategies for application to language models have been proposed and successfully implemented. However, this raises the question of whether PEFT, which only updates a limited set of model parameters, constitutes security vulnerabilities when confronted with weight-poisoning backdoor attacks. In this study, we show that PEFT is more susceptib…

    Submitted 29 March, 2024; v1 submitted 19 February, 2024; originally announced February 2024.

    Comments: NAACL Findings 2024

  21. arXiv:2401.06789  [pdf]

    cs.IR cs.AI cs.CL cs.LG

    Information Retrieval and Classification of Real-Time Multi-Source Hurricane Evacuation Notices

    Authors: Tingting Zhao, Shubo Tian, Jordan Daly, Melissa Geiger, Minna Jia, Jinfeng Zhang

    Abstract: For an approaching disaster, the tracking of time-sensitive critical information such as hurricane evacuation notices is challenging in the United States. These notices are issued and distributed rapidly by numerous local authorities that may spread across multiple states. They often undergo frequent updates and are distributed through diverse online portals lacking standard formats. In this study…

    Submitted 7 January, 2024; originally announced January 2024.

  22. arXiv:2401.05949  [pdf, other]

    cs.CL cs.AI cs.CR

    Universal Vulnerabilities in Large Language Models: Backdoor Attacks for In-context Learning

    Authors: Shuai Zhao, Meihuizi Jia, Luu Anh Tuan, Fengjun Pan, Jinming Wen

    Abstract: In-context learning, a paradigm bridging the gap between pre-training and fine-tuning, has demonstrated high efficacy in several NLP tasks, especially in few-shot settings. Despite being widely applied, in-context learning is vulnerable to malicious attacks. In this work, we raise security concerns regarding this paradigm. Our studies demonstrate that an attacker can manipulate the behavior of lar…

    Submitted 16 February, 2024; v1 submitted 11 January, 2024; originally announced January 2024.

  23. arXiv:2312.15162  [pdf, other]

    cs.CV

    Cycle-Consistency Learning for Captioning and Grounding

    Authors: Ning Wang, Jiajun Deng, Mingbo Jia

    Abstract: We present that visual grounding and image captioning, which perform as two mutually inverse processes, can be bridged together for collaborative training by careful designs. By consolidating this idea, we introduce CyCo, a cyclic-consistent learning framework to ameliorate the independent training pipelines of visual grounding and image captioning. The proposed framework (1) allows the semi-weakl…

    Submitted 22 December, 2023; originally announced December 2023.

    Comments: To appear in AAAI 2024

  24. arXiv:2312.10493  [pdf, other]

    cs.CL cs.MM

    Debiasing Multimodal Sarcasm Detection with Contrastive Learning

    Authors: Mengzhao Jia, Can Xie, Liqiang Jing

    Abstract: Despite commendable achievements made by existing work, prevailing multimodal sarcasm detection studies rely more on textual content over visual information. It unavoidably induces spurious correlations between textual words and labels, thereby significantly hindering the models' generalization capability. To address this problem, we define the task of out-of-distribution (OOD) multimodal sarcasm…

    Submitted 19 December, 2023; v1 submitted 16 December, 2023; originally announced December 2023.

  25. arXiv:2311.08711  [pdf, other]

    cs.CL

    PLUG: Leveraging Pivot Language in Cross-Lingual Instruction Tuning

    Authors: Zhihan Zhang, Dong-Ho Lee, Yuwei Fang, Wenhao Yu, Mengzhao Jia, Meng Jiang, Francesco Barbieri

    Abstract: Instruction tuning has remarkably advanced large language models (LLMs) in understanding and responding to diverse human instructions. Despite the success in high-resource languages, its application in lower-resource ones faces challenges due to the imbalanced foundational abilities of LLMs across different languages, stemming from the uneven language distribution in their pre-training data. To ta…

    Submitted 11 February, 2024; v1 submitted 15 November, 2023; originally announced November 2023.

  26. arXiv:2311.01477  [pdf, other]

    cs.CV

    FAITHSCORE: Evaluating Hallucinations in Large Vision-Language Models

    Authors: Liqiang Jing, Ruosen Li, Yunmo Chen, Mengzhao Jia, Xinya Du

    Abstract: We introduce FAITHSCORE (Faithfulness to Atomic Image Facts Score), a reference-free and fine-grained evaluation metric that measures the faithfulness of the generated free-form answers from large vision-language models (LVLMs). The FAITHSCORE evaluation first identifies sub-sentences containing descriptive statements that need to be verified, then extracts a comprehensive list of atomic facts fro…

    Submitted 1 November, 2023; originally announced November 2023.

  27. arXiv:2310.09987  [pdf, other]

    cs.SI

    Network Disruption via Continuous Batch Removal: The Case of Sicilian Mafia

    Authors: Mingshan Jia, Pasquale De Meo, Bogdan Gabrys, Katarzyna Musial

    Abstract: Network disruption is pivotal in understanding the robustness and vulnerability of complex networks, which is instrumental in devising strategies for infrastructure protection, epidemic control, cybersecurity, and combating crime. In this paper, with a particular focus on disrupting criminal networks, we proposed to impose a within-the-largest-connected-component constraint in a continuous batch r…

    Submitted 15 October, 2023; originally announced October 2023.

  28. arXiv:2310.07700  [pdf, other]

    cs.CL

    Knowledge-enhanced Memory Model for Emotional Support Conversation

    Authors: Mengzhao Jia, Qianglong Chen, Liqiang Jing, Dawei Fu, Renyu Li

    Abstract: The prevalence of mental disorders has become a significant issue, leading to the increased focus on Emotional Support Conversation as an effective supplement for mental health support. Existing methods have achieved compelling results; however, they still face three challenges: 1) variability of emotions, 2) practicality of the response, and 3) intricate strategy modeling. To address these challe…

    Submitted 11 October, 2023; originally announced October 2023.

  29. arXiv:2309.13504  [pdf, other]

    eess.AS cs.SD

    Attention Is All You Need For Blind Room Volume Estimation

    Authors: Chunxi Wang, Maoshen Jia, Meiran Li, Changchun Bao, Wenyu Jin

    Abstract: In recent years, dynamic parameterization of acoustic environments has attracted increasing attention in the field of audio processing. One of the key parameters that characterize the local room acoustics in isolation from orientation and directivity of sources and receivers is the geometric room volume. Convolutional neural networks (CNNs) have been widely selected as the main models for conducting…

    Submitted 27 December, 2023; v1 submitted 23 September, 2023; originally announced September 2023.

    Comments: 5 pages, 4 figures, to be published in proceedings of ICASSP 2024

  30. arXiv:2309.11653  [pdf, other]

    cs.HC cs.AI cs.CR

    "It's a Fair Game", or Is It? Examining How Users Navigate Disclosure Risks and Benefits When Using LLM-Based Conversational Agents

    Authors: Zhiping Zhang, Michelle Jia, Hao-Ping Lee, Bingsheng Yao, Sauvik Das, Ada Lerner, Dakuo Wang, Tianshi Li

    Abstract: The widespread use of Large Language Model (LLM)-based conversational agents (CAs), especially in high-stakes domains, raises many privacy concerns. Building ethical LLM-based CAs that respect user privacy requires an in-depth understanding of the privacy risks that concern users the most. However, existing research, primarily model-centered, does not provide insight into users' perspectives. To b…

    Submitted 1 April, 2024; v1 submitted 20 September, 2023; originally announced September 2023.

    Comments: 26 pages, 5 figures

  31. arXiv:2309.08396  [pdf, other]

    cs.IT

    Resource Optimization Using A Step-by-step Scheme in Wireless Sensing and Localization Networks

    Authors: Ruihang Zhang, Jiayan Yang, Mu Jia, Tingting Zhang

    Abstract: Due to the lack of wireless spectrum resources, people are focusing on the versatile wireless networks. Wireless localization and target sensing both rely on precise extraction of parameters such as signal amplitude, propagation delay and Doppler shift from the received signals. Due to the high multi-path resolution and strong penetration of UWB signals, both localization and sensing can be achiev…

    Submitted 15 September, 2023; originally announced September 2023.

    Comments: 28 pages, 12 figures

  32. Robust Transceiver Design for Covert Integrated Sensing and Communications With Imperfect CSI

    Authors: Yuchen Zhang, Wanli Ni, Jianquan Wang, Wanbin Tang, Min Jia, Yonina C. Eldar, Dusit Niyato

    Abstract: We propose a robust transceiver design for a covert integrated sensing and communications (ISAC) system with imperfect channel state information (CSI). Considering both bounded and probabilistic CSI error models, we formulate worst-case and outage-constrained robust optimization problems of joint transceiver beamforming and radar waveform design to balance the radar performance of multiple targets…

    Submitted 28 November, 2023; v1 submitted 29 August, 2023; originally announced August 2023.

    Comments: This work has been submitted to IEEE journal for publication

    Journal ref: IEEE Transactions on Communications, 2024

  33. arXiv:2308.08961  [pdf]

    cs.SE

    On the Evaluation of Neural Code Translation: Taxonomy and Benchmark

    Authors: Mingsheng Jiao, Tingrui Yu, Xuan Li, Guanjie Qiu, Xiaodong Gu, Beijun Shen

    Abstract: In recent years, neural code translation has gained increasing attention. While most of the research focuses on improving model architectures and training processes, we notice that the evaluation process and benchmark for code translation models are severely limited: they primarily treat source code as natural languages and provide a holistic accuracy score while disregarding the full spectrum of…

    Submitted 17 August, 2023; originally announced August 2023.

    Comments: accepted by ASE2023

  34. arXiv:2307.07260  [pdf, other]

    cs.RO cs.AI

    A Dynamic Points Removal Benchmark in Point Cloud Maps

    Authors: Qingwen Zhang, Daniel Duberg, Ruoyu Geng, Mingkai Jia, Lujia Wang, Patric Jensfelt

    Abstract: In the field of robotics, the point cloud has become an essential map representation. From the perspective of downstream tasks like localization and global path planning, points corresponding to dynamic objects will adversely affect their performance. Existing methods for removing dynamic points in point clouds often lack clarity in comparative evaluations and comprehensive analysis. Therefore, we…

    Submitted 14 July, 2023; originally announced July 2023.

    Comments: Code check https://github.com/KTH-RPL/DynamicMap_Benchmark.git , 7 pages, accepted by ITSC 2023

  35. arXiv:2307.03166  [pdf, other]

    cs.CV

    VideoGLUE: Video General Understanding Evaluation of Foundation Models

    Authors: Liangzhe Yuan, Nitesh Bharadwaj Gundavarapu, Long Zhao, Hao Zhou, Yin Cui, Lu Jiang, Xuan Yang, Menglin Jia, Tobias Weyand, Luke Friedman, Mikhail Sirotenko, Huisheng Wang, Florian Schroff, Hartwig Adam, Ming-Hsuan Yang, Ting Liu, Boqing Gong

    Abstract: We evaluate existing foundation models' video understanding capabilities using a carefully designed experiment protocol consisting of three hallmark tasks (action recognition, temporal localization, and spatiotemporal localization), eight datasets well received by the community, and four adaptation methods tailoring a foundation model (FM) for a downstream task. Moreover, we propose a scalar VideoG…

    Submitted 1 December, 2023; v1 submitted 6 July, 2023; originally announced July 2023.

    Comments: Fixes some typos and includes project open-source page: https://github.com/tensorflow/models/tree/master/official/projects/videoglue

  36. arXiv:2306.16650  [pdf, other]

    cs.CL cs.AI

    Multi-source Semantic Graph-based Multimodal Sarcasm Explanation Generation

    Authors: Liqiang Jing, Xuemeng Song, Kun Ouyang, Mengzhao Jia, Liqiang Nie

    Abstract: Multimodal Sarcasm Explanation (MuSE) is a new yet challenging task, which aims to generate a natural language sentence for a multimodal social post (an image as well as its caption) to explain why it contains sarcasm. Although the existing pioneer study has achieved great success with the BART backbone, it overlooks the gap between the visual feature space and the decoder semantic space, the obje…

    Submitted 28 June, 2023; originally announced June 2023.

    Comments: Accepted by ACL 2023 main conference

    Journal ref: ACL 2023

  37. arXiv:2306.05236  [pdf, other]

    cs.CV

    Population-Based Evolutionary Gaming for Unsupervised Person Re-identification

    Authors: Yunpeng Zhai, Peixi Peng, Mengxi Jia, Shiyong Li, Weiqiang Chen, Xuesong Gao, Yonghong Tian

    Abstract: Unsupervised person re-identification has achieved great success through the self-improvement of individual neural networks. However, limited by the lack of diversity of discriminant information, a single network has difficulty learning sufficient discrimination ability by itself under unsupervised conditions. To address this limit, we develop a population-based evolutionary gaming (PEG) framework…

    Submitted 8 June, 2023; originally announced June 2023.

    Comments: Accepted in IJCV

  38. arXiv:2306.03881  [pdf, other]

    cs.CV

    Emergent Correspondence from Image Diffusion

    Authors: Luming Tang, Menglin Jia, Qianqian Wang, Cheng Perng Phoo, Bharath Hariharan

    Abstract: Finding correspondences between images is a fundamental problem in computer vision. In this paper, we show that correspondence emerges in image diffusion models without any explicit supervision. We propose a simple strategy to extract this implicit knowledge out of diffusion networks as image features, namely DIffusion FeaTures (DIFT), and use them to establish correspondences between real images.…

    Submitted 6 December, 2023; v1 submitted 6 June, 2023; originally announced June 2023.

    Comments: NeurIPS 2023. Project page: https://diffusionfeatures.github.io

  39. arXiv:2303.07223  [pdf, other]

    cs.CV

    PromptFusion: Decoupling Stability and Plasticity for Continual Learning

    Authors: Haoran Chen, Zuxuan Wu, Xintong Han, Menglin Jia, Yu-Gang Jiang

    Abstract: Current research on continual learning mainly focuses on relieving catastrophic forgetting, and most of their success is at the cost of limiting the performance of newly incoming tasks. Such a trade-off is referred to as the stability-plasticity dilemma and is a more general and challenging problem for continual learning. However, the inherent conflict between these two concepts makes it seemingly…

    Submitted 10 July, 2024; v1 submitted 13 March, 2023; originally announced March 2023.

    Comments: ECCV 2024 camera-ready version

  40. arXiv:2303.04745  [pdf, other]

    cs.LG stat.ML

    A General Theory of Correct, Incorrect, and Extrinsic Equivariance

    Authors: Dian Wang, Xupeng Zhu, Jung Yeon Park, Mingxi Jia, Guanang Su, Robert Platt, Robin Walters

    Abstract: Although equivariant machine learning has proven effective at many tasks, success depends heavily on the assumption that the ground truth function is symmetric over the entire domain matching the symmetry in an equivariant neural network. A missing piece in the equivariant learning literature is the analysis of equivariant networks when symmetry exists only partially in the domain. In this work, w…

    Submitted 28 October, 2023; v1 submitted 8 March, 2023; originally announced March 2023.

    Comments: Published at NeurIPS 2023

  41. A Network Science perspective of Graph Convolutional Networks: A survey

    Authors: Mingshan Jia, Bogdan Gabrys, Katarzyna Musial

    Abstract: The mining and exploitation of graph structural information have been the focal points in the study of complex networks. Traditional structural measures in Network Science focus on the analysis and modelling of complex networks from the perspective of network structure, such as the centrality measures, the clustering coefficient, and motifs and graphlets, and they have become basic tools for study…

    Submitted 12 January, 2023; originally announced January 2023.

  42. arXiv:2301.02403  [pdf, other]

    cs.CV

    CyberLoc: Towards Accurate Long-term Visual Localization

    Authors: Liu Liu, Yukai Lin, Xiao Liang, Qichao Xu, Miao Jia, Yangdong Liu, Yuxiang Wen, Wei Luo, Jiangwei Li

    Abstract: This technical report introduces CyberLoc, an image-based visual localization pipeline for robust and accurate long-term pose estimation under challenging conditions. The proposed method comprises four modules connected in a sequence. First, a mapping module is applied to build accurate 3D maps of the scene, one map for each reference sequence if there exist multiple reference sequences under diff…

    Submitted 6 January, 2023; originally announced January 2023.

    Comments: MLAD-ECCV 2022

  43. arXiv:2212.08985

    cs.CV

    Efficient Image Captioning for Edge Devices

    Authors: Ning Wang, Jiangrong Xie, Hang Luo, Qinglin Cheng, Jihao Wu, Mingbo Jia, Linlin Li

    Abstract: Recent years have witnessed the rapid progress of image captioning. However, the demands for large memory storage and heavy computational burden prevent these captioning models from being deployed on mobile devices. The main obstacles lie in the heavyweight visual feature extractors (i.e., object detectors) and complicated cross-modal fusion networks. To this end, we propose LightCap, a lightweigh…

    Submitted 17 December, 2022; originally announced December 2022.

    Comments: To appear in AAAI 2023

  44. arXiv:2212.01803

    cs.CV

    Controllable Image Captioning via Prompting

    Authors: Ning Wang, Jiahao Xie, Jihao Wu, Mingbo Jia, Linlin Li

    Abstract: Despite the remarkable progress of image captioning, existing captioners typically lack the controllable capability to generate desired image captions, e.g., describing the image in a rough or detailed manner, in a factual or emotional view, etc. In this paper, we show that a unified model is qualified to perform well in diverse domains and freely switch among multiple styles. Such a controllable…

    Submitted 4 December, 2022; originally announced December 2022.

    Comments: To appear in AAAI 2023

  45. arXiv:2211.14739

    cs.CV cs.AI cs.CL

    MNER-QG: An End-to-End MRC framework for Multimodal Named Entity Recognition with Query Grounding

    Authors: Meihuizi Jia, Lei Shen, Xin Shen, Lejian Liao, Meng Chen, Xiaodong He, Zhendong Chen, Jiaqi Li

    Abstract: Multimodal named entity recognition (MNER) is a critical step in information extraction, which aims to detect entity spans and classify them to corresponding entity types given a sentence-image pair. Existing methods either (1) obtain named entities with coarse-grained visual clues from attention mechanisms, or (2) first detect fine-grained visual regions with toolkits and then recognize named ent…

    Submitted 27 November, 2022; originally announced November 2022.

    Comments: 13 pages, 6 figures, published at AAAI

  46. arXiv:2211.00194

    cs.RO

    SEIL: Simulation-augmented Equivariant Imitation Learning

    Authors: Mingxi Jia, Dian Wang, Guanang Su, David Klee, Xupeng Zhu, Robin Walters, Robert Platt

    Abstract: In robotic manipulation, acquiring samples is extremely expensive because it often requires interacting with the real world. Traditional image-level data augmentation has shown the potential to improve sample efficiency in various machine learning tasks. However, image-level data augmentation is insufficient for an imitation learning agent to learn good manipulation policies in a reasonable amount…

    Submitted 31 October, 2022; originally announced November 2022.

  47. arXiv:2209.08826

    cs.OS cs.SE

    Rapid Recovery of Program Execution Under Power Failures for Embedded Systems with NVM

    Authors: Min Jia, Edwin Hsing. -M. Sha, Qingfeng Zhuge, Rui Xu, Shouzhen Gu

    Abstract: After power is switched on, recovering the interrupted program from the initial state can have a negative impact. Some programs are even unrecoverable. To enable rapid recovery of program execution under power failures, the execution states of checkpoints are backed up by NVM under power failures for embedded systems with NVM. However, frequent checkpoints will shorten the lifetime of the NVM and incur si…

    Submitted 19 September, 2022; originally announced September 2022.

    Comments: This paper was accepted for publication in Microprocessors and Microsystems on March 15, 2021

  48. arXiv:2209.00495

    cs.CL cs.LG cs.SI

    Searching for Structure in Unfalsifiable Claims

    Authors: Peter Ebert Christensen, Frederik Warburg, Menglin Jia, Serge Belongie

    Abstract: Social media platforms give rise to an abundance of posts and comments on every topic imaginable. Many of these posts express opinions on various aspects of society, but their unfalsifiable nature makes them ill-suited to fact-checking pipelines. In this work, we aim to distill such posts into a small set of narratives that capture the essential claims related to a given topic. Understanding and v…

    Submitted 19 August, 2022; originally announced September 2022.

    Comments: 30 pages, 9 main figures, 5 main tables. Website: https://captaine.github.io/Searching-for-Structure-in-Unfalsifiable-Claims/ GitHub repo: https://github.com/captainE/Searching-for-Structure-in-Unfalsifiable-Claims

  49. arXiv:2207.07734

    q-bio.GN cs.AI cs.GL

    COEM: Cross-Modal Embedding for MetaCell Identification

    Authors: Haiyi Mao, Minxue Jia, Jason Xiaotian Dou, Haotian Zhang, Panayiotis V. Benos

    Abstract: Metacells are disjoint and homogeneous groups of single-cell profiles, representing discrete and highly granular cell states. Existing metacell algorithms tend to use only one modality to infer metacells, even though single-cell multi-omics datasets profile multiple molecular modalities within the same cell. Here, we present \textbf{C}ross-M\textbf{O}dal \textbf{E}mbedding for \textbf{M}etaCell Id…

    Submitted 24 July, 2022; v1 submitted 15 July, 2022; originally announced July 2022.

    Comments: 5 pages, 2 figures, ICML workshop on computational biology

  50. arXiv:2205.14292

    cs.RO

    BulletArm: An Open-Source Robotic Manipulation Benchmark and Learning Framework

    Authors: Dian Wang, Colin Kohler, Xupeng Zhu, Mingxi Jia, Robert Platt

    Abstract: We present BulletArm, a novel benchmark and learning-environment for robotic manipulation. BulletArm is designed around two key principles: reproducibility and extensibility. We aim to encourage more direct comparisons between robotic learning methods by providing a set of standardized benchmark tasks in simulation alongside a collection of baseline algorithms. The framework consists of 31 differe…

    Submitted 17 October, 2022; v1 submitted 27 May, 2022; originally announced May 2022.

    Comments: Published at ISRR 2022