
Showing 1–50 of 802 results for author: Cai, J

  1. arXiv:2407.09977  [pdf]

    physics.geo-ph cs.AI

    Mitigating Interpretation Bias in Rock Records with Large Language Models: Insights from Paleoenvironmental Analysis

    Authors: Luoqi Wang, Haipeng Li, Linshu Hu, Jiarui Cai, Zhenhong Du

    Abstract: The reconstruction of Earth's history faces significant challenges due to the nonunique interpretations often derived from rock records. The problem has long been recognized but there are no systematic solutions in practice. This study introduces an innovative approach that leverages Large Language Models (LLMs) along with retrieval augmented generation and real-time search capabilities to counter…

    Submitted 17 May, 2024; originally announced July 2024.

  2. arXiv:2407.08366  [pdf, other]

    cs.RO cs.CV

    An Economic Framework for 6-DoF Grasp Detection

    Authors: Xiao-Ming Wu, Jia-Feng Cai, Jian-Jian Jiang, Dian Zheng, Yi-Lin Wei, Wei-Shi Zheng

    Abstract: Robotic grasping in clutters is a fundamental task in robotic manipulation. In this work, we propose an economic framework for 6-DoF grasp detection, aiming to economize the resource cost in training and meanwhile maintain effective grasp performance. To begin with, we discover that the dense supervision is the bottleneck of current SOTA methods that severely encumbers the entire training overload…

    Submitted 11 July, 2024; originally announced July 2024.

    Comments: 19 pages, 7 figures. Accepted in ECCV 2024!

  3. arXiv:2407.06662  [pdf, other]

    eess.SP

    Experimental Demonstration of 16D Voronoi Constellation with Two-Level Coding over 50km Four-Core Fiber

    Authors: Can Zhao, Bin Chen, Jiaqi Cai, Zhiwei Liang, Yi Lei, Junjie Xiong, Lin Ma, Daohui Hu, Lin Sun, Gangxiang Shen

    Abstract: A 16-dimensional Voronoi constellation concatenated with multilevel coding is experimentally demonstrated over a 50km four-core fiber transmission system. The proposed scheme reduces the required launch power by 6dB and provides a 17dB larger operating range than 16QAM with BICM at the outer HD-FEC BER threshold.

    Submitted 9 July, 2024; originally announced July 2024.

    Comments: 4 pages, 4 figures, accepted by 2024 European Conference on Optical Communication (ECOC)

  4. arXiv:2407.06612  [pdf]

    eess.IV cs.CV cs.LG

    AI-based Automatic Segmentation of Prostate on Multi-modality Images: A Review

    Authors: Rui Jin, Derun Li, Dehui Xiang, Lei Zhang, Hailing Zhou, Fei Shi, Weifang Zhu, Jing Cai, Tao Peng, Xinjian Chen

    Abstract: Prostate cancer represents a major threat to health. Early detection is vital in reducing the mortality rate among prostate cancer patients. One approach involves using multi-modality (CT, MRI, US, etc.) computer-aided diagnosis (CAD) systems for the prostate region. However, prostate segmentation is challenging due to imperfections in the images and the prostate's complex tissue structure. The ad…

    Submitted 9 July, 2024; originally announced July 2024.

  5. arXiv:2407.04938  [pdf, other]

    cs.CV

    SAM-Med3D-MoE: Towards a Non-Forgetting Segment Anything Model via Mixture of Experts for 3D Medical Image Segmentation

    Authors: Guoan Wang, Jin Ye, Junlong Cheng, Tianbin Li, Zhaolin Chen, Jianfei Cai, Junjun He, Bohan Zhuang

    Abstract: Volumetric medical image segmentation is pivotal in enhancing disease diagnosis, treatment planning, and advancing medical research. While existing volumetric foundation models for medical image segmentation, such as SAM-Med3D and SegVol, have shown remarkable performance on general organs and tumors, their ability to segment certain categories in clinical downstream tasks remains limited. Supervi…

    Submitted 5 July, 2024; originally announced July 2024.

    Journal ref: MICCAI 2024

  6. arXiv:2407.02203  [pdf, other]

    cs.CL cs.AI

    Automatic Adaptation Rule Optimization via Large Language Models

    Authors: Yusei Ishimizu, Jialong Li, Jinglue Xu, Jinyu Cai, Hitoshi Iba, Kenji Tei

    Abstract: Rule-based adaptation is a foundational approach to self-adaptation, characterized by its human readability and rapid response. However, building high-performance and robust adaptation rules is often a challenge because it essentially involves searching the optimal design in a complex (variables) space. In response, this paper attempts to employ large language models (LLMs) as an optimizer to constr…

    Submitted 2 July, 2024; originally announced July 2024.

  7. arXiv:2407.01469  [pdf, other]

    eess.IV

    Unrolling Plug-and-Play Gradient Graph Laplacian Regularizer for Image Restoration

    Authors: Jianghe Cai, Gene Cheung, Fei Chen

    Abstract: Generic deep learning (DL) networks for image restoration like denoising and interpolation lack mathematical interpretability, require voluminous training data to tune a large parameter set, and are fragile during covariance shift. To address these shortcomings, for a general linear image formation model, we first formulate a convex optimization problem with a new graph smoothness prior called gra…

    Submitted 1 July, 2024; originally announced July 2024.

  8. arXiv:2407.00908  [pdf, other]

    cs.CL cs.AI

    Fine-grained, Multi-dimensional Summarization Evaluation with LLMs

    Authors: Hwanjun Song, Hang Su, Igor Shalyminov, Jason Cai, Saab Mansour

    Abstract: Automated evaluation is crucial for streamlining text summarization benchmarking and model development, given the costly and time-consuming nature of human evaluation. Traditional methods like ROUGE do not correlate well with human judgment, while recently proposed LLM-based metrics provide only summary-level assessment using Likert-scale scores. This limits deeper model analysis, e.g., we can onl…

    Submitted 9 July, 2024; v1 submitted 30 June, 2024; originally announced July 2024.

    Comments: Accepted at ACL 2024 (main, long)

  9. arXiv:2406.19435  [pdf, other]

    cs.CV

    A Sanity Check for AI-generated Image Detection

    Authors: Shilin Yan, Ouxiang Li, Jiayin Cai, Yanbin Hao, Xiaolong Jiang, Yao Hu, Weidi Xie

    Abstract: With the rapid development of generative models, discerning AI-generated content has evoked increasing attention from both industry and academia. In this paper, we conduct a sanity check on "whether the task of AI-generated image detection has been solved". To start with, we present the Chameleon dataset, consisting of AI-generated images that are genuinely challenging for human perception. To quantify th…

    Submitted 27 June, 2024; originally announced June 2024.

    Comments: Project page: https://shilinyan99.github.io/AIDE Code: https://github.com/shilinyan99/AIDE

  10. arXiv:2406.14927  [pdf, other]

    cs.CV cs.RO

    Gaussian-Informed Continuum for Physical Property Identification and Simulation

    Authors: Junhao Cai, Yuji Yang, Weihao Yuan, Yisheng He, Zilong Dong, Liefeng Bo, Hui Cheng, Qifeng Chen

    Abstract: This paper studies the problem of estimating physical properties (system identification) through visual observations. To facilitate geometry-aware guidance in physical property estimation, we introduce a novel hybrid framework that leverages 3D Gaussian representation to not only capture explicit shapes but also enable the simulated continuum to deduce implicit shapes during training. We propose a…

    Submitted 21 June, 2024; originally announced June 2024.

    Comments: 19 pages, 8 figures

  11. arXiv:2406.12846  [pdf, other]

    cs.CV

    DrVideo: Document Retrieval Based Long Video Understanding

    Authors: Ziyu Ma, Chenhui Gou, Hengcan Shi, Bin Sun, Shutao Li, Hamid Rezatofighi, Jianfei Cai

    Abstract: Existing methods for long video understanding primarily focus on videos only lasting tens of seconds, with limited exploration of techniques for handling longer videos. The increased number of frames in longer videos presents two main challenges: difficulty in locating key information and performing long-range reasoning. Thus, we propose DrVideo, a document-retrieval-based system designed for long…

    Submitted 18 June, 2024; originally announced June 2024.

    Comments: 11 pages

  12. arXiv:2406.09680  [pdf, other]

    cs.LG cs.DC

    Heterogeneous Federated Learning with Convolutional and Spiking Neural Networks

    Authors: Yingchao Yu, Yuping Yan, Jisong Cai, Yaochu Jin

    Abstract: Federated learning (FL) has emerged as a promising paradigm for training models on decentralized data while safeguarding data privacy. Most existing FL systems, however, assume that all machine learning models are of the same type, although it becomes more likely that different edge devices adopt different types of AI models, including both conventional analogue artificial neural networks (ANNs) a…

    Submitted 13 June, 2024; originally announced June 2024.

    Comments: 8 pages, 5 figures, FL@FM-IJCAI'24

  13. arXiv:2406.09591  [pdf]

    cond-mat.mes-hall cond-mat.str-el

    Ferromagnetism and Topology of the Higher Flat Band in a Fractional Chern Insulator

    Authors: Heonjoon Park, Jiaqi Cai, Eric Anderson, Xiao-Wei Zhang, Xiaoyu Liu, William Holtzmann, Weijie Li, Chong Wang, Chaowei Hu, Yuzhou Zhao, Takashi Taniguchi, Kenji Watanabe, Jihui Yang, David Cobden, Jiun-Haw Chu, Nicolas Regnault, B. Andrei Bernevig, Liang Fu, Ting Cao, Di Xiao, Xiaodong Xu

    Abstract: The recent observation of the fractional quantum anomalous Hall effect in moiré fractional Chern insulators (FCI) provides opportunities for investigating zero magnetic field anyons. So far, both experimental and theoretical results suggest that filling > 1/3 FCI states in the first Chern band share features with those of the lowest Landau level (LL). To create the possibility of realizing non-Abe…

    Submitted 13 June, 2024; originally announced June 2024.

    Comments: 24 pages, 4 figures

  14. arXiv:2406.09041  [pdf, other]

    cs.CL cs.AI cs.LG

    ME-Switch: A Memory-Efficient Expert Switching Framework for Large Language Models

    Authors: Jing Liu, Ruihao Gong, Mingyang Zhang, Yefei He, Jianfei Cai, Bohan Zhuang

    Abstract: The typical process for developing LLMs involves pre-training a general foundation model on massive data, followed by fine-tuning on task-specific data to create specialized experts. Serving these experts poses challenges, as loading all experts onto devices is impractical, and frequent switching between experts in response to user requests incurs substantial I/O costs, increasing latency and expe…

    Submitted 13 June, 2024; originally announced June 2024.

    Comments: Tech report

  15. arXiv:2406.08698  [pdf, other]

    astro-ph.HE hep-ph

    Constraints on Ultra Heavy Dark Matter Properties from Dwarf Spheroidal Galaxies with LHAASO Observations

    Authors: Zhen Cao, F. Aharonian, Q. An, Axikegu, Y. X. Bai, Y. W. Bao, D. Bastieri, X. J. Bi, Y. J. Bi, J. T. Cai, Q. Cao, W. Y. Cao, Zhe Cao, J. Chang, J. F. Chang, A. M. Chen, E. S. Chen, Liang Chen, Lin Chen, Long Chen, M. J. Chen, M. L. Chen, Q. H. Chen, S. H. Chen, S. Z. Chen, et al. (255 additional authors not shown)

    Abstract: In this work we try to search for signals generated by ultra-heavy dark matter at the Large High Altitude Air Shower Observatory (LHAASO) data. We look for possible gamma-ray by dark matter annihilation or decay from 16 dwarf spheroidal galaxies in the field of view of LHAASO. Dwarf spheroidal galaxies are among the most promising targets for indirect detection of dark matter which have low fluxes…

    Submitted 12 June, 2024; originally announced June 2024.

    Comments: 17 pages, 12 figures, accepted by PRL

  16. arXiv:2406.05641  [pdf, other]

    cs.CV

    PaRa: Personalizing Text-to-Image Diffusion via Parameter Rank Reduction

    Authors: Shangyu Chen, Zizheng Pan, Jianfei Cai, Dinh Phung

    Abstract: Personalizing a large-scale pretrained Text-to-Image (T2I) diffusion model is challenging as it typically struggles to make an appropriate trade-off between its training data distribution and the target distribution, i.e., learning a novel concept with only a few target images to achieve personalization (aligning with the personalized target) while preserving text editability (aligning with divers…

    Submitted 9 June, 2024; originally announced June 2024.

  17. arXiv:2406.05588  [pdf, other]

    cs.CL cs.AI cs.LG

    CERET: Cost-Effective Extrinsic Refinement for Text Generation

    Authors: Jason Cai, Hang Su, Monica Sunkara, Igor Shalyminov, Saab Mansour

    Abstract: Large Language Models (LLMs) are powerful models for generation tasks, but they may not generate good quality outputs in their first attempt. Apart from model fine-tuning, existing approaches to improve prediction accuracy and quality typically involve LLM self-improvement / self-reflection that incorporate feedback from models themselves. Despite their effectiveness, these methods are hindered by…

    Submitted 8 June, 2024; originally announced June 2024.

    Comments: The source code and data samples are released at https://github.com/amazon-science/CERET-LLM-refine

  18. arXiv:2406.04101  [pdf, other]

    cs.CV

    How Far Can We Compress Instant-NGP-Based NeRF?

    Authors: Yihang Chen, Qianyi Wu, Mehrtash Harandi, Jianfei Cai

    Abstract: In recent years, Neural Radiance Field (NeRF) has demonstrated remarkable capabilities in representing 3D scenes. To expedite the rendering process, learnable explicit representations have been introduced for combination with implicit NeRF representation, which however results in a large storage space requirement. In this paper, we introduce the Context-based NeRF Compression (CNC) framework, whic…

    Submitted 6 June, 2024; originally announced June 2024.

    Comments: Project Page: https://yihangchen-ee.github.io/project_cnc/ Code: https://github.com/yihangchen-ee/cnc/. We further propose a 3DGS compression method HAC, which is based on CNC: https://yihangchen-ee.github.io/project_hac/

    Journal ref: CVPR 2024

  19. arXiv:2406.00985  [pdf, other]

    cs.CV

    MultiEdits: Simultaneous Multi-Aspect Editing with Text-to-Image Diffusion Models

    Authors: Mingzhen Huang, Jialing Cai, Shan Jia, Vishnu Suresh Lokhande, Siwei Lyu

    Abstract: Text-driven image synthesis has made significant advancements with the development of diffusion models, transforming how visual content is generated from text prompts. Despite these advances, text-driven image editing, a key area in computer graphics, faces unique challenges. A major challenge is making simultaneous edits across multiple objects or attributes. Applying these methods sequentially f…

    Submitted 3 June, 2024; originally announced June 2024.

  20. arXiv:2405.19308  [pdf, other]

    cond-mat.mes-hall

    Visualizing the microscopic origins of topology in twisted molybdenum ditelluride

    Authors: Ellis Thompson, Keng Tou Chu, Florie Mesple, Xiao-Wei Zhang, Chaowei Hu, Yuzhou Zhao, Heonjoon Park, Jiaqi Cai, Eric Anderson, Kenji Watanabe, Takashi Taniguchi, Jihui Yang, Jiun-Haw Chu, Xiaodong Xu, Ting Cao, Di Xiao, Matthew Yankowitz

    Abstract: In moiré materials with flat electronic bands and suitable quantum geometry, strong correlations can give rise to novel topological states of matter. The nontrivial band topology of twisted molybdenum ditelluride (tMoTe$_2$) -- responsible for its fractional quantum anomalous Hall (FQAH) states -- is predicted to arise from a layer-pseudospin skyrmion lattice. Tracing the layer polarization of wav…

    Submitted 29 May, 2024; originally announced May 2024.

    Comments: 7 pages, 4 figures, Extended Data, 9 figures, Supplementary Information, 8 pages, 5 figures

  21. arXiv:2405.10269  [pdf, other]

    cond-mat.mes-hall

    Direct magnetic imaging of fractional Chern insulators in twisted MoTe$_2$ with a superconducting sensor

    Authors: Evgeny Redekop, Canxun Zhang, Heonjoon Park, Jiaqi Cai, Eric Anderson, Owen Sheekey, Trevor Arp, Grigory Babikyan, Samuel Salters, Kenji Watanabe, Takashi Taniguchi, Xiaodong Xu, Andrea F. Young

    Abstract: In the absence of time reversal symmetry, orbital magnetization provides a sensitive probe of topology and interactions, with particularly rich phenomenology in Chern insulators where topological edge states carry large equilibrium currents. Here, we use a nanoscale superconducting sensor to map the magnetic fringe fields in twisted bilayers of MoTe$_2$, where transport and optical sensing experim…

    Submitted 16 May, 2024; originally announced May 2024.

  22. arXiv:2405.09463  [pdf, other]

    cs.CV

    Gaze-DETR: Using Expert Gaze to Reduce False Positives in Vulvovaginal Candidiasis Screening

    Authors: Yan Kong, Sheng Wang, Jiangdong Cai, Zihao Zhao, Zhenrong Shen, Yonghao Li, Manman Fei, Qian Wang

    Abstract: Accurate detection of vulvovaginal candidiasis is critical for women's health, yet its sparse distribution and visually ambiguous characteristics pose significant challenges for accurate identification by pathologists and neural networks alike. Our eye-tracking data reveals that areas garnering sustained attention - yet not marked by experts after deliberation - are often aligned with false positi…

    Submitted 15 May, 2024; originally announced May 2024.

    Comments: MICCAI-2024 early accept. Our code is available at https://github.com/YanKong0408/Gaze-DETR

  23. arXiv:2405.09153  [pdf, other]

    cs.CL cs.LG

    Adapting Abstract Meaning Representation Parsing to the Clinical Narrative -- the SPRING THYME parser

    Authors: Jon Z. Cai, Kristin Wright-Bettner, Martha Palmer, Guergana K. Savova, James H. Martin

    Abstract: This paper is dedicated to the design and evaluation of the first AMR parser tailored for clinical notes. Our objective was to facilitate the precise transformation of the clinical notes into structured AMR expressions, thereby enhancing the interpretability and usability of clinical text data at scale. Leveraging the colon cancer dataset from the Temporal Histories of Your Medical Events (THYME)…

    Submitted 15 May, 2024; originally announced May 2024.

    Comments: Accepted to the 6th Clinical NLP Workshop at NAACL, 2024

  24. arXiv:2405.07691  [pdf, other]

    astro-ph.HE

    Discovery of Very-high-energy Gamma-ray Emissions from the Low Luminosity AGN NGC 4278 by LHAASO

    Authors: Zhen Cao, F. Aharonian, Q. An, Axikegu, Y. X. Bai, Y. W. Bao, D. Bastieri, X. J. Bi, Y. J. Bi, J. T. Cai, Q. Cao, W. Y. Cao, Zhe Cao, J. Chang, J. F. Chang, A. M. Chen, E. S. Chen, Liang Chen, Lin Chen, Long Chen, M. J. Chen, M. L. Chen, Q. H. Chen, S. H. Chen, S. Z. Chen, et al. (255 additional authors not shown)

    Abstract: The first source catalog of the Large High Altitude Air Shower Observatory reported the detection of a very-high-energy gamma-ray source, 1LHAASO J1219+2915. In this paper, a further detailed study of the spectral and temporal behavior of this point-like source has been carried out. The best-fit position of the TeV source ($\rm{RA}=185.05^{\circ}\pm0.04^{\circ}$, $\rm{Dec}=29.25^{\circ}\pm0.03^{\circ}$) i…

    Submitted 13 May, 2024; originally announced May 2024.

    Comments: 11 pages, 5 figures

  25. arXiv:2405.04434  [pdf, other]

    cs.CL cs.AI

    DeepSeek-V2: A Strong, Economical, and Efficient Mixture-of-Experts Language Model

    Authors: DeepSeek-AI, Aixin Liu, Bei Feng, Bin Wang, Bingxuan Wang, Bo Liu, Chenggang Zhao, Chengqi Dengr, Chong Ruan, Damai Dai, Daya Guo, Dejian Yang, Deli Chen, Dongjie Ji, Erhang Li, Fangyun Lin, Fuli Luo, Guangbo Hao, Guanting Chen, Guowei Li, H. Zhang, Hanwei Xu, Hao Yang, Haowei Zhang, Honghui Ding, et al. (132 additional authors not shown)

    Abstract: We present DeepSeek-V2, a strong Mixture-of-Experts (MoE) language model characterized by economical training and efficient inference. It comprises 236B total parameters, of which 21B are activated for each token, and supports a context length of 128K tokens. DeepSeek-V2 adopts innovative architectures including Multi-head Latent Attention (MLA) and DeepSeekMoE. MLA guarantees efficient inference…

    Submitted 19 June, 2024; v1 submitted 7 May, 2024; originally announced May 2024.

  26. arXiv:2405.03806  [pdf, other]

    cs.HC

    In Situ AI Prototyping: Infusing Multimodal Prompts into Mobile Settings with MobileMaker

    Authors: Savvas Petridis, Michael Xieyang Liu, Alexander J. Fiannaca, Vivian Tsai, Michael Terry, Carrie J. Cai

    Abstract: Recent advances in multimodal large language models (LLMs) have lowered the barriers to rapidly prototyping AI-powered features via prompting, especially for mobile-intended use cases. Despite the value of situated user feedback, the process of soliciting early, mobile-situated user feedback on AI prototypes remains challenging. The broad scope and flexibility of LLMs means that, for a given use-c…

    Submitted 6 May, 2024; originally announced May 2024.

  27. arXiv:2405.03229  [pdf, ps, other]

    math.CO

    Spectral conditions for the existence of (doubly) chorded cycles in graphs with fixed size

    Authors: Jin Cai, Leyou Xu, Bo Zhou

    Abstract: A chorded cycle is a cycle with at least one chord, and a doubly chorded cycle is a cycle with at least two chords. Gould asked in [Graphs Comb. 38 (2022) 189] the question: What spectral conditions imply a graph contains a chorded cycle? For a graph with fixed size, extremal spectral conditions are given to ensure that a graph contains a chorded cycle and a doubly chorded cycle, respectively, via…

    Submitted 6 May, 2024; originally announced May 2024.

  28. arXiv:2405.02876  [pdf, ps, other]

    cs.NE cs.LG

    Exploring the Improvement of Evolutionary Computation via Large Language Models

    Authors: Jinyu Cai, Jinglue Xu, Jialong Li, Takuto Ymauchi, Hitoshi Iba, Kenji Tei

    Abstract: Evolutionary computation (EC), as a powerful optimization algorithm, has been applied across various domains. However, as the complexity of problems increases, the limitations of EC have become more apparent. The advent of large language models (LLMs) has not only transformed natural language processing but also extended their capabilities to diverse fields. By harnessing LLMs' vast knowledge and…

    Submitted 23 May, 2024; v1 submitted 5 May, 2024; originally announced May 2024.

    Comments: accepted by GECCO 2024

  29. arXiv:2405.02858  [pdf, ps, other]

    cs.SI cs.CL

    Language Evolution for Evading Social Media Regulation via LLM-based Multi-agent Simulation

    Authors: Jinyu Cai, Jialong Li, Mingyue Zhang, Munan Li, Chen-Shu Wang, Kenji Tei

    Abstract: Social media platforms such as Twitter, Reddit, and Sina Weibo play a crucial role in global communication but often encounter strict regulations in geopolitically sensitive regions. This situation has prompted users to ingeniously modify their way of communicating, frequently resorting to coded language in these regulated social media environments. This shift in communication is not merely a stra…

    Submitted 5 May, 2024; originally announced May 2024.

    Comments: Accepted by IEEE WCCI 2024

  30. arXiv:2405.01047  [pdf, ps, other]

    math.OC cs.GT

    Optimal Pricing for Linear-Quadratic Games with Nonlinear Interaction Between Agents

    Authors: Jiamin Cai, Chenyue Zhang, Hoi-To Wai

    Abstract: This paper studies a class of network games with linear-quadratic payoffs and externalities exerted through a strictly concave interaction function. This class of game is motivated by the diminishing marginal effects with peer influences. We analyze the optimal pricing strategy for this class of network game. First, we prove the existence of a unique Nash Equilibrium (NE). Second, we study the opt…

    Submitted 3 June, 2024; v1 submitted 2 May, 2024; originally announced May 2024.

    Comments: 7 pages, 2 figures, accepted by IEEE Control Systems Letters

  31. arXiv:2404.18121  [pdf, ps, other]

    math.OC

    Research on the Evaluation Index System of Enterprise Production Efficiency

    Authors: W. Li, J. Cai, C. Wang, Y. Chen, J. Xu, J. Zhao, Y. Chen

    Abstract: This paper focuses on studying the evaluation index system for the production efficiency of tobacco enterprises. Considering the limitations of existing evaluation methods in accurately assessing the production quality of cigarette enterprises, a mathematical model based on the Analytic Hierarchy Process (AHP) is established. This model constructs an evaluation framework for the production efficie…

    Submitted 28 April, 2024; originally announced April 2024.

  32. arXiv:2404.18033  [pdf, other]

    cs.CV

    Exposing Text-Image Inconsistency Using Diffusion Models

    Authors: Mingzhen Huang, Shan Jia, Zhou Zhou, Yan Ju, Jialing Cai, Siwei Lyu

    Abstract: In the battle against widespread online misinformation, a growing problem is text-image inconsistency, where images are misleadingly paired with texts with different intent or meaning. Existing classification-based methods for text-image inconsistency can identify contextual inconsistencies but fail to provide explainable justifications for their decisions that humans can understand. Although more…

    Submitted 27 April, 2024; originally announced April 2024.

  33. arXiv:2404.15007  [pdf, ps, other]

    cond-mat.mtrl-sci

    Single-Spin Waved-Brim Flat-Top Hat in the Band Edge of GdIH Monolayer

    Authors: Ningning Jia, Zhao Yang, Jiangtao Cai, Zhiheng Lv, Yongting Shi, Tielei Song, Xin Cui, Zhifeng Liu

    Abstract: Exotic electronic bands, such as flat bands, linear crossing bands, spontaneously valley- or spin-polarized bands, in two-dimensional materials have been the hot topics in condensed matter physics. Herein, we first propose a general dispersion model for possible hat-like electronic bands, and then identify an intriguing single-spin \emph{waved-brim flat-top hat} in the valence band edge of a stabl…

    Submitted 23 April, 2024; originally announced April 2024.

  34. arXiv:2404.14305  [pdf, other]

    cs.HC

    "I Upload...All Types of Different Things to Say, the World of Blindness Is More Than What They Think It Is": A Study of Blind TikTokers' Identity Work from a Flourishing Perspective

    Authors: Yao Lyu, Jie Cai, Bryan Dosono, Davis Yadav, John M. Carroll

    Abstract: Identity work in Human-Computer Interaction (HCI) has focused on the marginalized group to explore designs to support their asset (what they have). However, little has been explored specifically on the identity work of people with disabilities, specifically, visual impairments. In this study, we interviewed 45 BlindTokers (blind users on TikTok) from various backgrounds to understand their identit…

    Submitted 22 April, 2024; originally announced April 2024.

    Comments: ACM CSCW

  35. arXiv:2404.12759  [pdf, other]

    cs.LG

    decoupleQ: Towards 2-bit Post-Training Uniform Quantization via decoupling Parameters into Integer and Floating Points

    Authors: Yi Guo, Fanliu Kong, Xiaoyang Li, Hui Li, Wei Chen, Xiaogang Tian, Jinping Cai, Yang Zhang, Shouda Liu

    Abstract: Quantization emerges as one of the most promising compression technologies for deploying efficient large models for various real time application in recent years. Considering that the storage and IO of weights take up the vast majority of the overhead inside a large model, weight only quantization can lead to large gains. However, existing quantization schemes suffer from significant accuracy degr…

    Submitted 19 April, 2024; originally announced April 2024.

    Comments: quantization for deep models

  36. arXiv:2404.10973  [pdf, other]

    quant-ph

    Quantum delocalization on correlation landscape: The key to exponentially fast multipartite entanglement generation

    Authors: Yaoming Chu, Xiangbei Li, Jianming Cai

    Abstract: Entanglement, a hallmark of quantum mechanics, is a vital resource for quantum technologies. Generating highly entangled multipartite states is a key goal in current quantum experiments. We unveil a novel framework for understanding entanglement generation dynamics in Hamiltonian systems by quantum delocalization of an effective operator wavefunction on a correlation landscape. Our framework estab…

    Submitted 16 April, 2024; originally announced April 2024.

  37. arXiv:2404.09000  [pdf, other]

    eess.IV cs.CV cs.LG

    MaSkel: A Model for Human Whole-body X-rays Generation from Human Masking Images

    Authors: Yingjie Xi, Boyuan Cheng, Jingyao Cai, Jian Jun Zhang, Xiaosong Yang

    Abstract: The human whole-body X-rays could offer a valuable reference for various applications, including medical diagnostics, digital animation modeling, and ergonomic design. The traditional method of obtaining X-ray information requires the use of CT (Computed Tomography) scan machines, which emit potentially harmful radiation. Thus it faces a significant limitation for realistic applications because it…

    Submitted 13 April, 2024; originally announced April 2024.

  38. arXiv:2404.08521  [pdf, other]

    cond-mat.mes-hall

    The magnetism measurements of the two-dimensional van der Waals antiferromagnet CrPS4 using dynamic cantilever magnetometry

    Authors: Qi Li, Weili Zhen, Ning Wang, Yang Yu, Senyang Pan, Lin Deng, Jiaqiang Cai, Kang Wang, Lvkuan Zou, Zhongming Zeng, Jinglei Zhang, Haifeng Du

    Abstract: The exploration of van der Waals (vdWs) magnetic materials has sparked great interest in spintronics. However, conventional methods often face challenges in characterizing the magnetic properties of small-sized vdWs materials, especially for antiferromagnets with extremely small magnetic moments. Here, we demonstrate the efficacy of dynamic cantilever magnetometry (DCM) in characterizing the magne…

    Submitted 12 April, 2024; originally announced April 2024.

  39. arXiv:2404.07949  [pdf, other]

    cs.CV

    Taming Stable Diffusion for Text to 360° Panorama Image Generation

    Authors: Cheng Zhang, Qianyi Wu, Camilo Cruz Gambardella, Xiaoshui Huang, Dinh Phung, Wanli Ouyang, Jianfei Cai

    Abstract: Generative models, e.g., Stable Diffusion, have enabled the creation of photorealistic images from text prompts. Yet, the generation of 360-degree panorama images from text remains a challenge, particularly due to the dearth of paired text-panorama data and the domain gap between panorama and perspective images. In this paper, we introduce a novel dual-branch diffusion model named PanFusion to gen…

    Submitted 11 April, 2024; originally announced April 2024.

    Comments: CVPR 2024. Project Page: https://chengzhag.github.io/publication/panfusion Code: https://github.com/chengzhag/PanFusion

  40. "We Need Structured Output": Towards User-centered Constraints on Large Language Model Output

    Authors: Michael Xieyang Liu, Frederick Liu, Alexander J. Fiannaca, Terry Koo, Lucas Dixon, Michael Terry, Carrie J. Cai

    Abstract: Large language models can produce creative and diverse responses. However, to integrate them into current developer workflows, it is essential to constrain their outputs to follow specific formats or standards. In this work, we surveyed 51 experienced industry professionals to understand the range of scenarios and motivations driving the need for output constraints from a user-centered perspective…

    Submitted 10 April, 2024; originally announced April 2024.

    Journal ref: "We Need Structured Output": Towards User-centered Constraints on LLM Output. In Extended Abstracts of the CHI Conference on Human Factors in Computing Systems (CHI EA '24), May 11-16, 2024, Honolulu, HI, USA

  41. arXiv:2404.06395  [pdf, other]

    cs.CL cs.LG

    MiniCPM: Unveiling the Potential of Small Language Models with Scalable Training Strategies

    Authors: Shengding Hu, Yuge Tu, Xu Han, Chaoqun He, Ganqu Cui, Xiang Long, Zhi Zheng, Yewei Fang, Yuxiang Huang, Weilin Zhao, Xinrong Zhang, Zheng Leng Thai, Kaihuo Zhang, Chongyi Wang, Yuan Yao, Chenyang Zhao, Jie Zhou, Jie Cai, Zhongwu Zhai, Ning Ding, Chao Jia, Guoyang Zeng, Dahai Li, Zhiyuan Liu, Maosong Sun

    Abstract: The burgeoning interest in developing Large Language Models (LLMs) with up to trillion parameters has been met with concerns regarding resource efficiency and practical expense, particularly given the immense cost of experimentation. This scenario underscores the importance of exploring the potential of Small Language Models (SLMs) as a resource-efficient alternative. In this context, we introduce…

    Submitted 3 June, 2024; v1 submitted 9 April, 2024; originally announced April 2024.

    Comments: revise according to peer review

  42. arXiv:2404.05016  [pdf, other]

    cs.CV

    Hyperbolic Learning with Synthetic Captions for Open-World Detection

    Authors: Fanjie Kong, Yanbei Chen, Jiarui Cai, Davide Modolo

    Abstract: Open-world detection poses significant challenges, as it requires the detection of any object using either object class labels or free-form texts. Existing related works often use large-scale manual annotated caption datasets for training, which are extremely expensive to collect. Instead, we propose to transfer knowledge from vision-language models (VLMs) to enrich the open-vocabulary description…

    Submitted 7 April, 2024; originally announced April 2024.

    Comments: CVPR 2024

  43. arXiv:2404.04836  [pdf, ps, other]

    math.AP

    Global strong solution to the inviscid liquid-gas two-phase flow model in $L^p$ framework

    Authors: Zhigang Wu, Mengqian Liu, Juanzi Cai

    Abstract: This paper is dedicated to the study of the inviscid liquid-gas two-phase flow model in $\mathbb{R}^d\ (d\geq1)$. We establish the global existence of strong solutions to this system with small initial data in hybrid Besov spaces based on general $L^p$-norms. Additionally, we obtain the decay estimates of solutions rely on the constructed Lyapunov functional.

    Submitted 7 April, 2024; originally announced April 2024.

    MSC Class: 35A09; 35B40; 35Q35

  44. arXiv:2404.04801  [pdf, ps, other]

    astro-ph.IM astro-ph.HE

    LHAASO-KM2A detector simulation using Geant4

    Authors: Zhen Cao, F. Aharonian, Q. An, Axikegu, Y. X. Bai, Y. W. Bao, D. Bastieri, X. J. Bi, Y. J. Bi, J. T. Cai, Q. Cao, W. Y. Cao, Zhe Cao, J. Chang, J. F. Chang, A. M. Chen, E. S. Chen, Liang Chen, Lin Chen, Long Chen, M. J. Chen, M. L. Chen, Q. H. Chen, S. H. Chen, S. Z. Chen, et al. (254 additional authors not shown)

    Abstract: KM2A is one of the main sub-arrays of LHAASO, working on gamma ray astronomy and cosmic ray physics at energies above 10 TeV. Detector simulation is the important foundation for estimating detector performance and data analysis. It is a big challenge to simulate the KM2A detector in the framework of Geant4 due to the need to track numerous photons from a large number of detector units (>6000) with…

    Submitted 7 April, 2024; originally announced April 2024.

  45. arXiv:2404.04629  [pdf, other]

    cs.CV

    DifFUSER: Diffusion Model for Robust Multi-Sensor Fusion in 3D Object Detection and BEV Segmentation

    Authors: Duy-Tho Le, Hengcan Shi, Jianfei Cai, Hamid Rezatofighi

    Abstract: Diffusion models have recently gained prominence as powerful deep generative models, demonstrating unmatched performance across various domains. However, their potential in multi-sensor fusion remains largely unexplored. In this work, we introduce DifFUSER, a novel approach that leverages diffusion models for multi-modal fusion in 3D object detection and BEV map segmentation. Benefiting from the i…

    Submitted 6 April, 2024; originally announced April 2024.

    Comments: 23 pages

  46. arXiv:2404.01686  [pdf, other]

    cs.CV

    JRDB-PanoTrack: An Open-world Panoptic Segmentation and Tracking Robotic Dataset in Crowded Human Environments

    Authors: Duy-Tho Le, Chenhui Gou, Stavya Datta, Hengcan Shi, Ian Reid, Jianfei Cai, Hamid Rezatofighi

    Abstract: Autonomous robot systems have attracted increasing research attention in recent years, where environment understanding is a crucial step for robot navigation, human-robot interaction, and decision. Real-world robot systems usually collect visual data from multiple sensors and are required to recognize numerous objects and their movements in complex human-crowded settings. Traditional benchmarks, w…

    Submitted 2 April, 2024; originally announced April 2024.

    Comments: CVPR 2024

  47. arXiv:2404.01078  [pdf, other]

    cs.LG

    Energy-based Model for Accurate Shapley Value Estimation in Interpretable Deep Learning Predictive Modeling

    Authors: Cheng Lu, Jiusun Zeng, Yu Xia, Jinhui Cai, Shihua Luo

    Abstract: As a favorable tool for explainable artificial intelligence (XAI), Shapley value has been widely used to interpret deep learning based predictive models. However, accurate and efficient estimation of Shapley value is difficult since the computation load grows exponentially with the increase of input features. Most existing accelerated estimation methods have to compromise on estimation accuracy wi…

    Submitted 5 May, 2024; v1 submitted 1 April, 2024; originally announced April 2024.

  48. arXiv:2404.00905  [pdf, other]

    physics.ins-det cond-mat.mes-hall cond-mat.other

    Continuously tunable uniaxial strain control of van der Waals heterostructure devices

    Authors: Zhaoyu Liu, Xuetao Ma, John Cenker, Jiaqi Cai, Zaiyao Fei, Paul Malinowski, Joshua Mutch, Yuzhou Zhao, Kyle Hwangbo, Zhong Lin, Arnab Manna, Jihui Yang, David Cobden, Xiaodong Xu, Matthew Yankowitz, Jiun-Haw Chu

    Abstract: Uniaxial strain has been widely used as a powerful tool for investigating and controlling the properties of quantum materials. However, existing strain techniques have so far mostly been limited to use with bulk crystals. Although recent progress has been made in extending the application of strain to two-dimensional van der Waals (vdW) heterostructures, these techniques have been limited to optic…

    Submitted 23 May, 2024; v1 submitted 1 April, 2024; originally announced April 2024.

    Comments: 9 pages, 6 figures, to appear in Journal of Applied Physics

    Journal ref: J. Appl. Phys. 135, 204306 (2024)

  49. arXiv:2404.00269  [pdf, other]

    cs.CV

    IPoD: Implicit Field Learning with Point Diffusion for Generalizable 3D Object Reconstruction from Single RGB-D Images

    Authors: Yushuang Wu, Luyue Shi, Junhao Cai, Weihao Yuan, Lingteng Qiu, Zilong Dong, Liefeng Bo, Shuguang Cui, Xiaoguang Han

    Abstract: Generalizable 3D object reconstruction from single-view RGB-D images remains a challenging task, particularly with real-world data. Current state-of-the-art methods develop Transformer-based implicit field learning, necessitating an intensive learning paradigm that requires dense query-supervision uniformly sampled throughout the entire space. We propose a novel approach, IPoD, which harmonizes im…

    Submitted 30 March, 2024; originally announced April 2024.

    Comments: CVPR 2024

  50. arXiv:2403.19902  [pdf, other]

    cs.CV

    Heterogeneous Network Based Contrastive Learning Method for PolSAR Land Cover Classification

    Authors: Jianfeng Cai, Yue Ma, Zhixi Feng, Shuyuan Yang

    Abstract: Polarimetric synthetic aperture radar (PolSAR) image interpretation is widely used in various fields. Recently, deep learning has made significant progress in PolSAR image classification. Supervised learning (SL) requires a large amount of labeled PolSAR data with high quality to achieve better performance, however, manually labeled data is insufficient. This causes the SL to fail into overfitting…

    Submitted 3 May, 2024; v1 submitted 28 March, 2024; originally announced March 2024.