Showing 1–50 of 682 results for author: Jin, L

  1. arXiv:2407.09796  [pdf, other]

    math.CO cs.DM

    Information dissemination and confusion in signed networks

    Authors: Ligang Jin, Eckhard Steffen

    Abstract: We introduce a model of information dissemination in signed networks. It is a discrete-time process in which uninformed actors incrementally receive information from their informed neighbors or from the outside. Our goal is to minimize the number of confused actors - that is, the number of actors who receive contradictory information. We prove upper bounds for the number of confused actors in sign…

    Submitted 13 July, 2024; originally announced July 2024.

    MSC Class: 05C22; 05C57; 91D30 ACM Class: J.4

  2. arXiv:2407.08978  [pdf, other]

    cs.CL cs.LG

    Towards Chapter-to-Chapter Context-Aware Literary Translation via Large Language Models

    Authors: Linghao Jin, Li An, Xuezhe Ma

    Abstract: Discourse phenomena in existing document-level translation datasets are sparse, which has been a fundamental obstacle in the development of context-aware machine translation models. Moreover, most existing document-level corpora and context-aware machine translation methods rely on an unrealistic assumption on sentence-level alignments. To mitigate these issues, we first curate a novel dataset of…

    Submitted 12 July, 2024; originally announced July 2024.

    Comments: Preprint

  3. arXiv:2407.06115  [pdf, other]

    cs.CV cs.AI cs.CL

    Infer Induced Sentiment of Comment Response to Video: A New Task, Dataset and Baseline

    Authors: Qi Jia, Baoyu Fan, Cong Xu, Lu Liu, Liang Jin, Guoguang Du, Zhenhua Guo, Yaqian Zhao, Xuanjing Huang, Rengang Li

    Abstract: Existing video multi-modal sentiment analysis mainly focuses on the sentiment expression of people within the video, yet often neglects the induced sentiment of viewers while watching the videos. Induced sentiment of viewers is essential for inferring the public response to videos and has broad application in analyzing public societal sentiment, the effectiveness of advertising, and other areas. The micro…

    Submitted 15 May, 2024; originally announced July 2024.

  4. arXiv:2407.04213  [pdf]

    cs.CR cs.NI

    Pathfinder: Exploring Path Diversity for Assessing Internet Censorship Inconsistency

    Authors: Xiaoqin Liang, Guannan Liu, Lin Jin, Shuai Hao, Haining Wang

    Abstract: Internet censorship is typically enforced by authorities to achieve information control for a certain group of Internet users. So far existing censorship studies have primarily focused on country-level characterization because (1) in many cases, censorship is enabled by governments with nationwide policies and (2) it is usually hard to control how the probing packets are routed to trigger censorsh…

    Submitted 4 July, 2024; originally announced July 2024.

  5. arXiv:2407.03937  [pdf, other]

    cs.CL

    TongGu: Mastering Classical Chinese Understanding with Knowledge-Grounded Large Language Models

    Authors: Jiahuan Cao, Dezhi Peng, Peirong Zhang, Yongxin Shi, Yang Liu, Kai Ding, Lianwen Jin

    Abstract: Classical Chinese is a gateway to the rich heritage and wisdom of ancient China, yet its complexities pose formidable comprehension barriers for most modern people without specialized knowledge. While Large Language Models (LLMs) have shown remarkable capabilities in Natural Language Processing (NLP), they struggle with Classical Chinese Understanding (CCU), especially in data-demanding and knowle…

    Submitted 4 July, 2024; originally announced July 2024.

  6. arXiv:2407.03632  [pdf, other]

    cs.CV

    CLASH: Complementary Learning with Neural Architecture Search for Gait Recognition

    Authors: Huanzhang Dou, Pengyi Zhang, Yuhan Zhao, Lu Jin, Xi Li

    Abstract: Gait recognition, which aims at identifying individuals by their walking patterns, has achieved great success based on silhouette. The binary silhouette sequence encodes the walking pattern within the sparse boundary representation. Therefore, most pixels in the silhouette are under-sensitive to the walking pattern since the sparse boundary lacks dense spatial-temporal information, which is suitab…

    Submitted 4 July, 2024; originally announced July 2024.

  7. arXiv:2407.02716  [pdf, other]

    cs.CV cs.LG

    Light-weight Fine-tuning Method for Defending Adversarial Noise in Pre-trained Medical Vision-Language Models

    Authors: Xu Han, Linghao Jin, Xuezhe Ma, Xiaofeng Liu

    Abstract: Fine-tuning pre-trained Vision-Language Models (VLMs) has shown remarkable capabilities in medical image and textual depiction synergy. Nevertheless, many pre-training datasets are restricted by patient privacy concerns, potentially containing noise that can adversely affect downstream performance. Moreover, the growing reliance on multi-modal generation exacerbates this issue because of its susce…

    Submitted 2 July, 2024; originally announced July 2024.

  8. arXiv:2406.19101  [pdf, other]

    cs.CV

    DocKylin: A Large Multimodal Model for Visual Document Understanding with Efficient Visual Slimming

    Authors: Jiaxin Zhang, Wentao Yang, Songxuan Lai, Zecheng Xie, Lianwen Jin

    Abstract: Current multimodal large language models (MLLMs) face significant challenges in visual document understanding (VDU) tasks due to the high resolution, dense text, and complex layouts typical of document images. These characteristics demand a high level of detail perception ability from MLLMs. While increasing input resolution improves detail perception, it also leads to longer sequences of visual t…

    Submitted 27 June, 2024; originally announced June 2024.

  9. arXiv:2406.18916  [pdf, other]

    cs.CL cs.AI

    TrustUQA: A Trustful Framework for Unified Structured Data Question Answering

    Authors: Wen Zhang, Long Jin, Yushan Zhu, Jiaoyan Chen, Zhiwei Huang, Junjie Wang, Yin Hua, Lei Liang, Huajun Chen

    Abstract: Natural language question answering (QA) over structured data sources such as tables and knowledge graphs (KGs) has been widely investigated, for example with Large Language Models (LLMs). The main solutions include question to formal query parsing and retrieval-based answer generation. However, current methods of the former often suffer from weak generalization, failing to deal with multiple…

    Submitted 27 June, 2024; originally announced June 2024.

  10. arXiv:2406.18045  [pdf, other]

    cs.CL cs.AI

    PharmaGPT: Domain-Specific Large Language Models for Bio-Pharmaceutical and Chemistry

    Authors: Linqing Chen, Weilei Wang, Zilong Bai, Peng Xu, Yan Fang, Jie Fang, Wentao Wu, Lizhi Zhou, Ruiji Zhang, Yubin Xia, Chaobo Xu, Ran Hu, Licong Xu, Qijun Cai, Haoran Hua, Jing Sun, Jin Liu, Tian Qiu, Haowen Liu, Meng Hu, Xiuwen Li, Fei Gao, Yufu Wang, Lin Tie, Chaochao Wang, et al. (11 additional authors not shown)

    Abstract: Large language models (LLMs) have revolutionized Natural Language Processing (NLP) by minimizing the need for complex feature engineering. However, the application of LLMs in specialized domains like biopharmaceuticals and chemistry remains largely unexplored. These fields are characterized by intricate terminologies, specialized knowledge, and a high demand for precision, areas where general purpo…

    Submitted 9 July, 2024; v1 submitted 25 June, 2024; originally announced June 2024.

  11. arXiv:2406.12304  [pdf, other]

    cs.CL

    COT: A Generative Approach for Hate Speech Counter-Narratives via Contrastive Optimal Transport

    Authors: Linhao Zhang, Li Jin, Guangluan Xu, Xiaoyu Li, Xian Sun

    Abstract: Counter-narratives, which are direct responses consisting of non-aggressive fact-based arguments, have emerged as a highly effective approach to combat the proliferation of hate speech. Previous methodologies have primarily focused on fine-tuning and post-editing techniques to ensure the fluency of generated contents, while overlooking the critical aspects of individualization and relevance concer…

    Submitted 18 June, 2024; originally announced June 2024.

    Comments: IEEE journals

    MSC Class: 68U15 ACM Class: I.2.7

  12. arXiv:2406.05131  [pdf, other]

    cs.CV eess.IV

    DVOS: Self-Supervised Dense-Pattern Video Object Segmentation

    Authors: Keyhan Najafian, Farhad Maleki, Ian Stavness, Lingling Jin

    Abstract: Video object segmentation approaches primarily rely on large-scale pixel-accurate human-annotated datasets for model development. In Dense Video Object Segmentation (DVOS) scenarios, each video frame encompasses hundreds of small, dense, and partially occluded objects. Accordingly, the labor-intensive manual annotation of even a single frame often takes hours, which hinders the development of DVOS…

    Submitted 7 June, 2024; originally announced June 2024.

  13. arXiv:2406.03019  [pdf, other]

    cs.CV

    Puzzle Pieces Picker: Deciphering Ancient Chinese Characters with Radical Reconstruction

    Authors: Pengjie Wang, Kaile Zhang, Xinyu Wang, Shengwei Han, Yongge Liu, Lianwen Jin, Xiang Bai, Yuliang Liu

    Abstract: Oracle Bone Inscriptions is one of the oldest existing forms of writing in the world. However, due to the great antiquity of the era, a large number of Oracle Bone Inscriptions (OBI) remain undeciphered, making it one of the global challenges in the field of paleography today. This paper introduces a novel approach, namely Puzzle Pieces Picker (P$^3$), to decipher these enigmatic characters throug…

    Submitted 5 June, 2024; originally announced June 2024.

    Comments: ICDAR 2024

  14. arXiv:2406.00684  [pdf, other]

    cs.CV cs.CL

    Deciphering Oracle Bone Language with Diffusion Models

    Authors: Haisu Guan, Huanxin Yang, Xinyu Wang, Shengwei Han, Yongge Liu, Lianwen Jin, Xiang Bai, Yuliang Liu

    Abstract: Originating from China's Shang Dynasty approximately 3,000 years ago, the Oracle Bone Script (OBS) is a cornerstone in the annals of linguistic history, predating many established writing systems. Despite the discovery of thousands of inscriptions, a vast expanse of OBS remains undeciphered, casting a veil of mystery over this ancient language. The emergence of modern AI technologies presents a no…

    Submitted 2 June, 2024; originally announced June 2024.

    Comments: ACL2024 main conference long paper

  15. arXiv:2405.20136  [pdf, other]

    cs.CV

    A Multimodal Dangerous State Recognition and Early Warning System for Elderly with Intermittent Dementia

    Authors: Liyun Deng, Lei Jin, Guangcheng Wang, Quan Shi, Han Wang

    Abstract: In response to the social issue of the increasing number of elderly vulnerable groups going missing due to the aggravating aging population in China, our team has developed a wearable anti-loss device and intelligent early warning system for elderly individuals with intermittent dementia using artificial intelligence and IoT technology. This system comprises an anti-loss smart helmet, a cloud comp…

    Submitted 30 May, 2024; originally announced May 2024.

    Comments: 13 pages, 9 figures

  16. arXiv:2405.17732  [pdf, other]

    cs.CL

    C$^{3}$Bench: A Comprehensive Classical Chinese Understanding Benchmark for Large Language Models

    Authors: Jiahuan Cao, Yongxin Shi, Dezhi Peng, Yang Liu, Lianwen Jin

    Abstract: Classical Chinese Understanding (CCU) holds significant value in the preservation and exploration of the outstanding traditional Chinese culture. Recently, researchers have attempted to leverage the potential of Large Language Models (LLMs) for CCU by capitalizing on their remarkable comprehension and semantic capabilities. However, no comprehensive benchmark is available to assess the CCU capabilities…

    Submitted 30 May, 2024; v1 submitted 27 May, 2024; originally announced May 2024.

  17. arXiv:2405.17188  [pdf, other]

    cs.CV

    The SkatingVerse Workshop & Challenge: Methods and Results

    Authors: Jian Zhao, Lei Jin, Jianshu Li, Zheng Zhu, Yinglei Teng, Jiaojiao Zhao, Sadaf Gulshad, Zheng Wang, Bo Zhao, Xiangbo Shu, Yunchao Wei, Xuecheng Nie, Xiaojie Jin, Xiaodan Liang, Shin'ichi Satoh, Yandong Guo, Cewu Lu, Junliang Xing, Jane Shen Shengmei

    Abstract: The SkatingVerse Workshop & Challenge aims to encourage research in developing novel and accurate methods for human action understanding. The SkatingVerse dataset used for the SkatingVerse Challenge has been publicly released. There are two subsets in the dataset, i.e., the training subset and testing subset. The training subset consists of 19,993 RGB video sequences, and the testing subset cons…

    Submitted 27 May, 2024; originally announced May 2024.

  18. arXiv:2405.11502  [pdf, other]

    cond-mat.mtrl-sci physics.comp-ph

    CTGNN: Crystal Transformer Graph Neural Network for Crystal Material Property Prediction

    Authors: Zijian Du, Luozhijie Jin, Le Shu, Yan Cen, Yuanfeng Xu, Yongfeng Mei, Hao Zhang

    Abstract: The combination of deep learning algorithms and materials science has made significant progress in predicting novel materials and understanding various behaviours of materials. Here, we introduced a new model called the Crystal Transformer Graph Neural Network (CTGNN), which combines the advantages of Transformer model and graph neural networks to address the complexity of structure-properties r…

    Submitted 19 May, 2024; originally announced May 2024.

    Comments: 17 pages

  19. arXiv:2405.09001  [pdf, other]

    cs.RO

    BEVRender: Vision-based Cross-view Vehicle Registration in Off-road GNSS-denied Environment

    Authors: Lihong Jin, Wei Dong, Michael Kaess

    Abstract: We introduce BEVRender, a novel learning-based approach for the localization of ground vehicles in Global Navigation Satellite System (GNSS)-denied off-road scenarios. These environments are typically challenging for conventional vision-based state estimation due to the lack of distinct visual landmarks and the instability of vehicle poses. To address this, BEVRender generates high-quality local b…

    Submitted 14 May, 2024; originally announced May 2024.

    Comments: 8 pages, 6 figures

    ACM Class: I.2.9

  20. arXiv:2405.04434  [pdf, other]

    cs.CL cs.AI

    DeepSeek-V2: A Strong, Economical, and Efficient Mixture-of-Experts Language Model

    Authors: DeepSeek-AI, Aixin Liu, Bei Feng, Bin Wang, Bingxuan Wang, Bo Liu, Chenggang Zhao, Chengqi Dengr, Chong Ruan, Damai Dai, Daya Guo, Dejian Yang, Deli Chen, Dongjie Ji, Erhang Li, Fangyun Lin, Fuli Luo, Guangbo Hao, Guanting Chen, Guowei Li, H. Zhang, Hanwei Xu, Hao Yang, Haowei Zhang, Honghui Ding, et al. (132 additional authors not shown)

    Abstract: We present DeepSeek-V2, a strong Mixture-of-Experts (MoE) language model characterized by economical training and efficient inference. It comprises 236B total parameters, of which 21B are activated for each token, and supports a context length of 128K tokens. DeepSeek-V2 adopts innovative architectures including Multi-head Latent Attention (MLA) and DeepSeekMoE. MLA guarantees efficient inference…

    Submitted 19 June, 2024; v1 submitted 7 May, 2024; originally announced May 2024.

  21. arXiv:2405.04408  [pdf, other]

    cs.CV

    DocRes: A Generalist Model Toward Unifying Document Image Restoration Tasks

    Authors: Jiaxin Zhang, Dezhi Peng, Chongyu Liu, Peirong Zhang, Lianwen Jin

    Abstract: Document image restoration is a crucial aspect of Document AI systems, as the quality of document images significantly influences the overall performance. Prevailing methods address distinct restoration tasks independently, leading to intricate systems and the incapability to harness the potential synergies of multi-task learning. To overcome this challenge, we propose DocRes, a generalist model t…

    Submitted 7 May, 2024; originally announced May 2024.

    Comments: Accepted by CVPR 2024

  22. arXiv:2405.04390  [pdf, other]

    cs.CV

    DriveWorld: 4D Pre-trained Scene Understanding via World Models for Autonomous Driving

    Authors: Chen Min, Dawei Zhao, Liang Xiao, Jian Zhao, Xinli Xu, Zheng Zhu, Lei Jin, Jianshu Li, Yulan Guo, Junliang Xing, Liping Jing, Yiming Nie, Bin Dai

    Abstract: Vision-centric autonomous driving has recently raised wide attention due to its lower cost. Pre-training is essential for extracting a universal representation. However, current vision-centric pre-training typically relies on either 2D or 3D pre-text tasks, overlooking the temporal characteristics of autonomous driving as a 4D scene understanding task. In this paper, we address this challenge by i…

    Submitted 7 May, 2024; originally announced May 2024.

    Comments: Accepted by CVPR2024

  23. arXiv:2405.02062  [pdf, other]

    cs.LG

    Dyna-Style Learning with A Macroscopic Model for Vehicle Platooning in Mixed-Autonomy Traffic

    Authors: Yichuan Zou, Li Jin, Xi Xiong

    Abstract: Platooning of connected and autonomous vehicles (CAVs) plays a vital role in modernizing highways, ushering in enhanced efficiency and safety. This paper explores the significance of platooning in smart highways, employing a coupled partial differential equation (PDE) and ordinary differential equation (ODE) model to elucidate the complex interaction between bulk traffic flow and CAV platoons. Our…

    Submitted 3 May, 2024; originally announced May 2024.

  24. arXiv:2404.19652  [pdf, other]

    cs.CV cs.AI

    VimTS: A Unified Video and Image Text Spotter for Enhancing the Cross-domain Generalization

    Authors: Yuliang Liu, Mingxin Huang, Hao Yan, Linger Deng, Weijia Wu, Hao Lu, Chunhua Shen, Lianwen Jin, Xiang Bai

    Abstract: Text spotting, a task involving the extraction of textual information from image or video sequences, faces challenges in cross-domain adaption, such as image-to-image and image-to-video generalization. In this paper, we introduce a new method, termed VimTS, which enhances the generalization ability of the model by achieving better synergy among different tasks. Typically, we propose a Prompt Queri…

    Submitted 14 May, 2024; v1 submitted 30 April, 2024; originally announced April 2024.

  25. arXiv:2404.15237  [pdf]

    cond-mat.mtrl-sci

    Insights into the defect-driven heterogeneous structural evolution of Ni-rich layered cathode in lithium-ion batteries

    Authors: Zhongyuan Huang, Ziwei Chen, Maolin Yang, Mihai Chu, Zenan Li, Sihao Deng, Lunhua He, Lei Jin, Rafal E. Dunin-Borkowski, Rui Wang, Jun Wang, Tingting Yang, Yinguo Xiao

    Abstract: Recently, considerable efforts have been made on research and improvement for Ni-rich lithium-ion batteries to meet the demand from vehicles and grid-level large-scale energy storage. Development of next-generation high-performance lithium-ion batteries requires a comprehensive understanding of the underlying electrochemical mechanisms associated with its structural evolution. In this work, advanc…

    Submitted 23 April, 2024; originally announced April 2024.

    Comments: 29 pages and 5 figures for manuscript; 30 pages, 14 figures and 4 tables for supplementary information

  26. arXiv:2404.12253  [pdf, other]

    cs.CL cs.LG

    Toward Self-Improvement of LLMs via Imagination, Searching, and Criticizing

    Authors: Ye Tian, Baolin Peng, Linfeng Song, Lifeng Jin, Dian Yu, Haitao Mi, Dong Yu

    Abstract: Despite the impressive capabilities of Large Language Models (LLMs) on various tasks, they still struggle with scenarios that involve complex reasoning and planning. Recent work proposed advanced prompting techniques and the necessity of fine-tuning with high-quality data to augment LLMs' reasoning abilities. However, these approaches are inherently constrained by data availability and quality. I…

    Submitted 18 April, 2024; originally announced April 2024.

  27. arXiv:2404.09856  [pdf, other]

    hep-ph hep-ex

    Matching Hadronization and Perturbative Evolution: The Cluster Model in Light of Infrared Shower Cutoff Dependence

    Authors: André H. Hoang, Oliver L. Jin, Simon Plätzer, Daniel Samitz

    Abstract: In the context of Monte Carlo (MC) generators with parton showers that have next-to-leading-logarithmic (NLL) precision, the cutoff $Q_0$ terminating the shower evolution should be viewed as an infrared factorization scale so that parameters or non-perturbative effects of the MC generator may have a field theoretic interpretation with a controllable scheme dependence. This implies that the generat…

    Submitted 15 April, 2024; originally announced April 2024.

    Comments: 54 pages, 17 figures

    Report number: UWThPh-2023-23, MCnet-24-05

  28. arXiv:2404.09338  [pdf, other]

    cs.CL

    Entropy Guided Extrapolative Decoding to Improve Factuality in Large Language Models

    Authors: Souvik Das, Lifeng Jin, Linfeng Song, Haitao Mi, Baolin Peng, Dong Yu

    Abstract: Large language models (LLMs) exhibit impressive natural language capabilities but suffer from hallucination -- generating content ungrounded in the realities of training data. Recent work has focused on decoding techniques to improve factuality during inference by leveraging LLMs' hierarchical representation of factual knowledge, manipulating the predicted distributions at inference time. Current…

    Submitted 14 April, 2024; originally announced April 2024.

    Comments: Work in Progress

  29. arXiv:2404.09188  [pdf, other]

    eess.SY

    On Joint Convergence of Traffic State and Weight Vector in Learning-Based Dynamic Routing with Value Function Approximation

    Authors: Yidan Wu, Jianan Zhang, Li Jin

    Abstract: Learning-based approaches are increasingly popular for traffic control problems. However, these approaches are applied typically as black boxes with limited theoretical guarantees and interpretability. In this paper, we consider the theory of dynamic routing over parallel servers, a representative traffic control task, using a semi-gradient on-policy control algorithm, a representative reinforcement…

    Submitted 14 April, 2024; originally announced April 2024.

  30. arXiv:2404.07985  [pdf, other]

    cs.CV eess.IV

    WaveMo: Learning Wavefront Modulations to See Through Scattering

    Authors: Mingyang Xie, Haiyun Guo, Brandon Y. Feng, Lingbo Jin, Ashok Veeraraghavan, Christopher A. Metzler

    Abstract: Imaging through scattering media is a fundamental and pervasive challenge in fields ranging from medical diagnostics to astronomy. A promising strategy to overcome this challenge is wavefront modulation, which induces measurement diversity during image acquisition. Despite its importance, designing optimal wavefront modulations to image through scattering remains under-explored. This paper introdu…

    Submitted 11 April, 2024; originally announced April 2024.

  31. arXiv:2404.05008  [pdf, other]

    eess.SY

    Minimax Least-Square Policy Iteration for Cost-Aware Defense of Traffic Routing against Unknown Threats

    Authors: Yuzhen Zhan, Li Jin

    Abstract: Dynamic routing is one of the representative control schemes in transportation, production lines, and data transmission. In the modern context of connectivity and autonomy, routing decisions are potentially vulnerable to malicious attacks. In this paper, we consider the dynamic routing problem over parallel traffic links in the face of such threats. An attacker is capable of increasing or destabili…

    Submitted 7 April, 2024; originally announced April 2024.

    Comments: 8 pages, 3 figures

  32. arXiv:2404.04624  [pdf, other]

    cs.CV

    Bridging the Gap Between End-to-End and Two-Step Text Spotting

    Authors: Mingxin Huang, Hongliang Li, Yuliang Liu, Xiang Bai, Lianwen Jin

    Abstract: Modularity plays a crucial role in the development and maintenance of complex systems. While end-to-end text spotting efficiently mitigates the issues of error accumulation and sub-optimal performance seen in traditional two-step methodologies, the two-step methods continue to be favored in many competitions and practical settings due to their superior modularity. In this paper, we introduce Bridg…

    Submitted 6 April, 2024; originally announced April 2024.

    Comments: Accepted by CVPR2024

  33. arXiv:2404.04141  [pdf, ps, other]

    physics.optics quant-ph

    Robust incoherent perfect absorption

    Authors: H. S. Xu, L. Jin

    Abstract: A coherent perfect absorber is capable of completely absorbing input waves. However, the coherent perfect absorption severely depends on the superposition of the input waves, and the perfect absorption is sensitive to the disorder of the absorber. Thus, a robust incoherent perfect absorption, being insensitive to the superposition of input waves and the system disorder, is desirable for practical…

    Submitted 5 April, 2024; originally announced April 2024.

    Comments: 8 pages, 5 figures

    Journal ref: Phys. Rev. Research. 6, L022006 (2024)

  34. arXiv:2403.16803  [pdf, other]

    cs.RO cs.CV

    Exploiting Priors from 3D Diffusion Models for RGB-Based One-Shot View Planning

    Authors: Sicong Pan, Liren Jin, Xuying Huang, Cyrill Stachniss, Marija Popović, Maren Bennewitz

    Abstract: Object reconstruction is relevant for many autonomous robotic tasks that require interaction with the environment. A key challenge in such scenarios is planning view configurations to collect informative measurements for reconstructing an initially unknown object. One-shot view planning enables efficient data collection by predicting view configurations and planning the globally shortest path conn…

    Submitted 25 March, 2024; originally announced March 2024.

    Comments: Sicong Pan and Liren Jin have equal contribution. Submitted to IROS 2024

  35. arXiv:2403.16034  [pdf, other]

    cs.CV

    V2X-Real: a Large-Scale Dataset for Vehicle-to-Everything Cooperative Perception

    Authors: Hao Xiang, Zhaoliang Zheng, Xin Xia, Runsheng Xu, Letian Gao, Zewei Zhou, Xu Han, Xinkai Ji, Mingxi Li, Zonglin Meng, Li Jin, Mingyue Lei, Zhaoyang Ma, Zihang He, Haoxuan Ma, Yunshuang Yuan, Yingqian Zhao, Jiaqi Ma

    Abstract: Recent advancements in Vehicle-to-Everything (V2X) technologies have enabled autonomous vehicles to share sensing information to see through occlusions, greatly boosting the perception capability. However, there are no real-world datasets to facilitate the real V2X cooperative perception research -- existing datasets either only support Vehicle-to-Infrastructure cooperation or Vehicle-to-Vehicle c…

    Submitted 24 March, 2024; originally announced March 2024.

  36. arXiv:2403.13761  [pdf, other]

    cs.CV

    HierCode: A Lightweight Hierarchical Codebook for Zero-shot Chinese Text Recognition

    Authors: Yuyi Zhang, Yuanzhi Zhu, Dezhi Peng, Peirong Zhang, Zhenhua Yang, Zhibo Yang, Cong Yao, Lianwen Jin

    Abstract: Text recognition, especially for complex scripts like Chinese, faces unique challenges due to its intricate character structures and vast vocabulary. Traditional one-hot encoding methods struggle with the representation of hierarchical radicals, recognition of Out-Of-Vocabulary (OOV) characters, and on-device deployment due to their computational intensity. To address these challenges, we propose…

    Submitted 20 March, 2024; originally announced March 2024.

  37. arXiv:2403.11233  [pdf, other]

    cs.RO cs.CV

    STAIR: Semantic-Targeted Active Implicit Reconstruction

    Authors: Liren Jin, Haofei Kuang, Yue Pan, Cyrill Stachniss, Marija Popović

    Abstract: Many autonomous robotic applications require object-level understanding when deployed. Actively reconstructing objects of interest, i.e. objects with specific semantic meanings, is therefore relevant for a robot to perform downstream tasks in an initially unknown environment. In this work, we propose a novel framework for semantic-targeted active reconstruction using posed RGB-D measurements and 2…

    Submitted 17 March, 2024; originally announced March 2024.

  38. arXiv:2403.10656  [pdf, other]

    cs.IT

    Properties of the Strong Data Processing Constant for Rényi Divergence

    Authors: Lifu Jin, Amedeo Roberto Esposito, Michael Gastpar

    Abstract: Strong data processing inequalities (SDPI) are an important object of study in Information Theory and have been well studied for $f$-divergences. Universal upper and lower bounds have been provided along with several applications, connecting them to impossibility (converse) results, concentration of measure, hypercontractivity, and so on. In this paper, we study Rényi divergence and the correspond…

    Submitted 14 May, 2024; v1 submitted 15 March, 2024; originally announced March 2024.

    Comments: 6 pages, 1 figure

  39. arXiv:2403.09849  [pdf, other]

    cs.CL cs.AI

    Self-Consistency Boosts Calibration for Math Reasoning

    Authors: Ante Wang, Linfeng Song, Ye Tian, Baolin Peng, Lifeng Jin, Haitao Mi, Jinsong Su, Dong Yu

    Abstract: Calibration, which establishes the correlation between accuracy and model confidence, is important for LLM development. We design three off-the-shelf calibration methods based on self-consistency (Wang et al., 2022) for math reasoning tasks. In evaluations on two popular benchmarks (GSM8K and MathQA) using strong open-source LLMs (Mistral and LLaMA2), our methods better bridge model confidence and acc…

    Submitted 14 March, 2024; originally announced March 2024.

  40. arXiv:2403.08768  [pdf, other]

    cs.CV

    3DFIRES: Few Image 3D REconstruction for Scenes with Hidden Surface

    Authors: Linyi Jin, Nilesh Kulkarni, David Fouhey

    Abstract: This paper introduces 3DFIRES, a novel system for scene-level 3D reconstruction from posed images. Designed to work with as few as one view, 3DFIRES reconstructs the complete geometry of unseen scenes, including hidden surfaces. With multiple view inputs, our method produces full reconstruction within all camera frustums. A key feature of our approach is the fusion of multi-view information at the…

    Submitted 13 March, 2024; originally announced March 2024.

    Comments: Accepted to CVPR 2024. Project Page https://jinlinyi.github.io/3DFIRES/

  41. arXiv:2403.04997  [pdf, other]

    cs.CL cs.CV

    DiffChat: Learning to Chat with Text-to-Image Synthesis Models for Interactive Image Creation

    Authors: Jiapeng Wang, Chengyu Wang, Tingfeng Cao, Jun Huang, Lianwen Jin

    Abstract: We present DiffChat, a novel method to align Large Language Models (LLMs) to "chat" with prompt-as-input Text-to-Image Synthesis (TIS) models (e.g., Stable Diffusion) for interactive image creation. Given a raw prompt/image and a user-specified instruction, DiffChat can effectively make appropriate modifications and generate the target prompt, which can be leveraged to create the target image of h…

    Submitted 7 March, 2024; originally announced March 2024.

  42. arXiv:2403.04145  [pdf, other]

    eess.SY

    A Crosstalk-Aware Timing Prediction Method in Routing

    Authors: Leilei Jin, Jiajie Xu, Wenjie Fu, Hao Yan, Longxing Shi

    Abstract: With shrinking interconnect spacing in advanced technology nodes, existing timing predictions become less precise due to the challenging quantification of crosstalk-induced delay. During the routing, the crosstalk effect is typically modeled by predicting coupling capacitance with congestion information. However, the timing estimation tends to be overly pessimistic, as the crosstalk-induced delay…

    Submitted 6 March, 2024; originally announced March 2024.

    Comments: 6 pages, 8 figures

    ACM Class: I.6.4; B.7.3

  43. arXiv:2403.03496  [pdf, ps, other]

    cs.CL

    A Knowledge Plug-and-Play Test Bed for Open-domain Dialogue Generation

    Authors: Xiangci Li, Linfeng Song, Lifeng Jin, Haitao Mi, Jessica Ouyang, Dong Yu

    Abstract: Knowledge-based, open-domain dialogue generation aims to build chit-chat systems that talk to humans using mined support knowledge. Many types and sources of knowledge have previously been shown to be useful as support knowledge. Even in the era of large language models, response generation grounded in knowledge retrieved from additional up-to-date sources remains a practically important approach.…

    Submitted 6 March, 2024; originally announced March 2024.

    Comments: Accepted by LREC-COLING 2024

  44. arXiv:2403.03221  [pdf, other]

    cs.CV

    FAR: Flexible, Accurate and Robust 6DoF Relative Camera Pose Estimation

    Authors: Chris Rockwell, Nilesh Kulkarni, Linyi Jin, Jeong Joon Park, Justin Johnson, David F. Fouhey

    Abstract: Estimating relative camera poses between images has been a central problem in computer vision. Methods that find correspondences and solve for the fundamental matrix offer high precision in most cases. Conversely, methods predicting pose directly using neural networks are more robust to limited overlap and can infer absolute translation scale, but at the expense of reduced precision. We show how t…

    Submitted 5 March, 2024; originally announced March 2024.

    Comments: Accepted to CVPR 2024. Project Page: https://crockwell.github.io/far/

  45. arXiv:2402.18454  [pdf, other]

    hep-ph astro-ph.HE

    Prospects for measuring time variation of astrophysical neutrino sources at dark matter detectors

    Authors: Yi Zhuang, Louis E. Strigari, Lei Jin, Samiran Sinha

    Abstract: We study the prospects for measuring the time variation of solar and atmospheric neutrino fluxes at future large-scale Xenon and Argon dark matter detectors. For solar neutrinos, a yearly time variation arises from the eccentricity of the Earth's orbit, and, for charged current interactions, from a smaller energy-dependent day-night variation due to flavor regeneration as neutrinos travel through…

    Submitted 28 February, 2024; originally announced February 2024.

    Comments: 23 pages, 15 figures

  46. arXiv:2402.18041  [pdf, other]

    cs.CL cs.AI

    Datasets for Large Language Models: A Comprehensive Survey

    Authors: Yang Liu, Jiahuan Cao, Chongyu Liu, Kai Ding, Lianwen Jin

    Abstract: This paper embarks on an exploration into the Large Language Model (LLM) datasets, which play a crucial role in the remarkable advancements of LLMs. The datasets serve as the foundational infrastructure analogous to a root system that sustains and nurtures the development of LLMs. Consequently, examination of these datasets emerges as a critical topic in research. In order to address the current l…

    Submitted 27 February, 2024; originally announced February 2024.

    Comments: 181 pages, 21 figures

  47. arXiv:2402.17982  [pdf, other]

    cs.CL

    Collaborative decoding of critical tokens for boosting factuality of large language models

    Authors: Lifeng Jin, Baolin Peng, Linfeng Song, Haitao Mi, Ye Tian, Dong Yu

    Abstract: The most common training pipeline for large language models includes pretraining, finetuning and aligning phases, with their respective resulting models, such as the pretrained model and the finetuned model. Finetuned and aligned models show improved abilities of instruction following and safe generation; however, their abilities to stay factual about the world are impacted by the finetuning proces…

    Submitted 27 February, 2024; originally announced February 2024.

    Comments: work in progress

  48. arXiv:2402.17555  [pdf, other]

    cs.CV

    Scribble Hides Class: Promoting Scribble-Based Weakly-Supervised Semantic Segmentation with Its Class Label

    Authors: Xinliang Zhang, Lei Zhu, Hangzhou He, Lujia Jin, Yanye Lu

    Abstract: Scribble-based weakly-supervised semantic segmentation using sparse scribble supervision is gaining traction as it reduces annotation costs when compared to fully annotated alternatives. Existing methods primarily generate pseudo-labels by diffusing labeled pixels to unlabeled ones with local cues for supervision. However, this diffusion process fails to exploit global semantics and class-specific…

    Submitted 27 February, 2024; originally announced February 2024.

  49. arXiv:2402.16571  [pdf, other]

    math.DG

    On the causal discontinuity of Morse spacetimes

    Authors: Lucas Dahinden, Liang Jin

    Abstract: Morse spacetime is a model of singular Lorentzian manifold, built upon a Morse function which serves as a global time function outside its critical points. The Borde-Sorkin conjecture states that a Morse spacetime is causally continuous if and only if the index and coindex of critical points of the corresponding Morse function are both different from 1. The conjecture has recently been confirmed b…

    Submitted 26 February, 2024; originally announced February 2024.

    MSC Class: Primary 53C50; Secondary 83C75; 53Z05

  50. arXiv:2402.15631  [pdf, other]

    cs.CL cs.AI

    Fine-Grained Self-Endorsement Improves Factuality and Reasoning

    Authors: Ante Wang, Linfeng Song, Baolin Peng, Ye Tian, Lifeng Jin, Haitao Mi, Jinsong Su, Dong Yu

    Abstract: This work studies improving large language model (LLM) generations at inference time by mitigating fact-conflicting hallucinations. Particularly, we propose a self-endorsement framework that leverages the fine-grained fact-level comparisons across multiple sampled responses. Compared with prior ensemble methods (Wang et al., 2022; Chen et al., 2023) that perform response-level selection, our appro…

    Submitted 23 February, 2024; originally announced February 2024.