Showing 1–50 of 2,248 results for author: Chen, K

  1. arXiv:2407.11963  [pdf, other]

    cs.CL

    NeedleBench: Can LLMs Do Retrieval and Reasoning in 1 Million Context Window?

    Authors: Mo Li, Songyang Zhang, Yunxin Liu, Kai Chen

    Abstract: In evaluating the long-context capabilities of large language models (LLMs), identifying content relevant to a user's query from original long documents is a crucial prerequisite for any LLM to answer questions based on long text. We present NeedleBench, a framework consisting of a series of progressively more challenging tasks for assessing bilingual long-context capabilities, spanning multiple l…

    Submitted 16 July, 2024; originally announced July 2024.

  2. arXiv:2407.11691  [pdf, other]

    cs.CV

    VLMEvalKit: An Open-Source Toolkit for Evaluating Large Multi-Modality Models

    Authors: Haodong Duan, Junming Yang, Yuxuan Qiao, Xinyu Fang, Lin Chen, Yuan Liu, Xiaoyi Dong, Yuhang Zang, Pan Zhang, Jiaqi Wang, Dahua Lin, Kai Chen

    Abstract: We present VLMEvalKit: an open-source toolkit for evaluating large multi-modality models based on PyTorch. The toolkit aims to provide a user-friendly and comprehensive framework for researchers and developers to evaluate existing multi-modality models and publish reproducible evaluation results. In VLMEvalKit, we implement over 70 different large multi-modality models, including both proprietary…

    Submitted 16 July, 2024; originally announced July 2024.

  3. arXiv:2407.10815  [pdf, other]

    astro-ph.SR physics.plasm-ph physics.space-ph

    Evidence for the helicity barrier from measurements of the turbulence transition range in the solar wind

    Authors: J. R. McIntyre, C. H. K. Chen, J. Squire, R. Meyrand, P. A. Simon

    Abstract: The means by which the turbulent cascade of energy is dissipated in the solar wind, and in other astrophysical systems, is a major open question. It has recently been proposed that a barrier to the transfer of energy can develop at small scales, which can enable heating through ion-cyclotron resonance, under conditions applicable to regions of the solar wind. Such a scenario fundamentally diverges…

    Submitted 15 July, 2024; originally announced July 2024.

  4. arXiv:2407.10760  [pdf, other]

    astro-ph.GA

    Little Red Dots: Rapidly Growing Black Holes Reddened by Extended Dusty Flows

    Authors: Zhengrong Li, Kohei Inayoshi, Kejian Chen, Kohei Ichikawa, Luis C. Ho

    Abstract: The James Webb Space Telescope (JWST) observations have revolutionized extragalactic research, particularly with the discovery of little red dots (LRD), which we propose are dust-reddened broad-line active galactic nuclei (AGNs). Their unique v-shape spectral feature observed through JWST/NIRCam challenges us to discern the relative contributions of the galaxy and AGN. We study a spectral energy d…

    Submitted 15 July, 2024; originally announced July 2024.

    Comments: 14 pages, 6 figures

  5. arXiv:2407.10671  [pdf, other]

    cs.CL cs.AI

    Qwen2 Technical Report

    Authors: An Yang, Baosong Yang, Binyuan Hui, Bo Zheng, Bowen Yu, Chang Zhou, Chengpeng Li, Chengyuan Li, Dayiheng Liu, Fei Huang, Guanting Dong, Haoran Wei, Huan Lin, Jialong Tang, Jialin Wang, Jian Yang, Jianhong Tu, Jianwei Zhang, Jianxin Ma, Jin Xu, Jingren Zhou, Jinze Bai, Jinzheng He, Junyang Lin, Kai Dang, et al. (34 additional authors not shown)

    Abstract: This report introduces the Qwen2 series, the latest addition to our large language models and large multimodal models. We release a comprehensive suite of foundational and instruction-tuned language models, encompassing a parameter range from 0.5 to 72 billion, featuring dense models and a Mixture-of-Experts model. Qwen2 surpasses most prior open-weight models, including its predecessor Qwen1.5, a…

    Submitted 16 July, 2024; v1 submitted 15 July, 2024; originally announced July 2024.

    Comments: 25 pages, 1 figure

  6. arXiv:2407.10499  [pdf, other]

    cs.CL

    CIBench: Evaluating Your LLMs with a Code Interpreter Plugin

    Authors: Songyang Zhang, Chuyu Zhang, Yingfan Hu, Haowen Shen, Kuikun Liu, Zerun Ma, Fengzhe Zhou, Wenwei Zhang, Xuming He, Dahua Lin, Kai Chen

    Abstract: While LLM-based agents, which use external tools to solve complex problems, have made significant progress, benchmarking their ability is challenging, thereby hindering a clear understanding of their limitations. In this paper, we propose an interactive evaluation framework, named CIBench, to comprehensively assess LLMs' ability to utilize code interpreters for data science tasks. Our evaluation f…

    Submitted 15 July, 2024; originally announced July 2024.

    Comments: Under review

  7. arXiv:2407.10062  [pdf, other]

    cs.CV

    SpikeGS: 3D Gaussian Splatting from Spike Streams with High-Speed Camera Motion

    Authors: Jiyuan Zhang, Kang Chen, Shiyan Chen, Yajing Zheng, Tiejun Huang, Zhaofei Yu

    Abstract: Novel View Synthesis plays a crucial role by generating new 2D renderings from multi-view images of 3D scenes. However, capturing high-speed scenes with conventional cameras often leads to motion blur, hindering the effectiveness of 3D reconstruction. To address this challenge, high-frame-rate dense 3D reconstruction emerges as a vital technique, enabling detailed and accurate modeling of real-wor…

    Submitted 13 July, 2024; originally announced July 2024.

  8. arXiv:2407.09904  [pdf, other]

    cs.LG

    Learning a Mini-batch Graph Transformer via Two-stage Interaction Augmentation

    Authors: Wenda Li, Kaixuan Chen, Shunyu Liu, Tongya Zheng, Wenjie Huang, Mingli Song

    Abstract: Mini-batch Graph Transformer (MGT), as an emerging graph learning model, has demonstrated significant advantages in semi-supervised node prediction tasks with improved computational efficiency and enhanced model robustness. However, existing methods for processing local information either rely on sampling or simple aggregation, which respectively result in the loss and squashing of critical neighb…

    Submitted 13 July, 2024; originally announced July 2024.

    Comments: 8 pages, 4 figures, Accepted by ECAI 2024

  9. arXiv:2407.08713  [pdf, other]

    cs.CL cs.AI

    GTA: A Benchmark for General Tool Agents

    Authors: Jize Wang, Zerun Ma, Yining Li, Songyang Zhang, Cailian Chen, Kai Chen, Xinyi Le

    Abstract: Significant focus has been placed on integrating large language models (LLMs) with various tools in developing general-purpose agents. This poses a challenge to LLMs' tool-use capabilities. However, there are evident gaps between existing tool-use evaluations and real-world scenarios. Current evaluations often use AI-generated queries, single-step tasks, dummy tools, and text-only interactions, fa…

    Submitted 11 July, 2024; originally announced July 2024.

    Comments: Github repo: https://github.com/open-compass/GTA

  10. arXiv:2407.08701  [pdf, other]

    cs.CV

    Live2Diff: Live Stream Translation via Uni-directional Attention in Video Diffusion Models

    Authors: Zhening Xing, Gereon Fox, Yanhong Zeng, Xingang Pan, Mohamed Elgharib, Christian Theobalt, Kai Chen

    Abstract: Large Language Models have shown remarkable efficacy in generating streaming data such as text and audio, thanks to their temporally uni-directional attention mechanism, which models correlations between the current token and previous tokens. However, video streaming remains much less explored, despite a growing need for live video processing. State-of-the-art video diffusion models leverage bi-di…

    Submitted 11 July, 2024; originally announced July 2024.

    Comments: https://live2diff.github.io/

  11. arXiv:2407.08617  [pdf, other]

    quant-ph

    Quantum-Train Long Short-Term Memory: Application on Flood Prediction Problem

    Authors: Chu-Hsuan Abraham Lin, Chen-Yu Liu, Kuan-Cheng Chen

    Abstract: Flood prediction is a critical challenge in the context of climate change, with significant implications for ecosystem preservation, human safety, and infrastructure protection. In this study, we tackle this problem by applying the Quantum-Train (QT) technique to a forecasting Long Short-Term Memory (LSTM) model trained by Quantum Machine Learning (QML) with significant parameter reduction. The QT…

    Submitted 11 July, 2024; originally announced July 2024.

    Comments: 6 pages, 4 figures

  12. arXiv:2407.08547  [pdf]

    cond-mat.supr-con

    Necklace-like pattern of vortex bound states

    Authors: Zhiyong Hou, Kailun Chen, Wenshan Hong, Da Wang, Wen Duan, Huan Yang, Shiliang Li, Huiqian Luo, Qiang-Hua Wang, Tao Xiang, Hai-Hu Wen

    Abstract: Vortex is a topological defect in the superconducting condensate when a magnetic field is applied to a type-II superconductor, as elucidated by the Ginzburg-Landau theory. Due to the confinement of the quasiparticles by a vortex, it exhibits a circular shaped pattern of bound states with discrete energy levels, as predicted by the Caroli-de Gennes-Matricon theory in 1964. Here, however, we report…

    Submitted 11 July, 2024; originally announced July 2024.

    Comments: 29 pages total; 16 pages of main text with 5 figures, 13 pages of supplementary materials with 10 figures

  13. arXiv:2407.08443  [pdf, other]

    cs.CV

    Infinite Motion: Extended Motion Generation via Long Text Instructions

    Authors: Mengtian Li, Chengshuo Zhai, Shengxiang Yao, Zhifeng Xie, Keyu Chen, Yu-Gang Jiang

    Abstract: In the realm of motion generation, the creation of long-duration, high-quality motion sequences remains a significant challenge. This paper presents our groundbreaking work on "Infinite Motion", a novel approach that leverages long text to extended motion generation, effectively bridging the gap between short and long-duration motion synthesis. Our core insight is the strategic extension and reass…

    Submitted 12 July, 2024; v1 submitted 11 July, 2024; originally announced July 2024.

    Comments: 12 pages, 13 figures

  14. arXiv:2407.06190  [pdf, other]

    cs.CV cs.LG cs.RO

    4D Contrastive Superflows are Dense 3D Representation Learners

    Authors: Xiang Xu, Lingdong Kong, Hui Shuai, Wenwei Zhang, Liang Pan, Kai Chen, Ziwei Liu, Qingshan Liu

    Abstract: In the realm of autonomous driving, accurate 3D perception is the foundation. However, developing such models relies on extensive human annotations -- a process that is both costly and labor-intensive. To address this challenge from a data representation learning perspective, we introduce SuperFlow, a novel framework designed to harness consecutive LiDAR-camera pairs for establishing spatiotempora…

    Submitted 9 July, 2024; v1 submitted 8 July, 2024; originally announced July 2024.

    Comments: ECCV 2024; 36 pages, 11 figures, 11 tables; Code at https://github.com/Xiangxu-0103/SuperFlow

  15. arXiv:2407.06103  [pdf, other]

    quant-ph

    QTRL: Toward Practical Quantum Reinforcement Learning via Quantum-Train

    Authors: Chen-Yu Liu, Chu-Hsuan Abraham Lin, Chao-Han Huck Yang, Kuan-Cheng Chen, Min-Hsiu Hsieh

    Abstract: Quantum reinforcement learning utilizes quantum layers to process information within a machine learning model. However, both pure and hybrid quantum reinforcement learning face challenges such as data encoding and the use of quantum computers during the inference stage. We apply the Quantum-Train method to reinforcement learning tasks, called QTRL, training the classical policy network model using…

    Submitted 8 July, 2024; originally announced July 2024.

    Comments: 6 pages, 1 figure

  16. arXiv:2407.05547  [pdf, other]

    cs.CV

    LaSe-E2V: Towards Language-guided Semantic-Aware Event-to-Video Reconstruction

    Authors: Kanghao Chen, Hangyu Li, JiaZhou Zhou, Zeyu Wang, Lin Wang

    Abstract: Event cameras harness advantages such as low latency, high temporal resolution, and high dynamic range (HDR), compared to standard cameras. Due to the distinct imaging paradigm shift, a dominant line of research focuses on event-to-video (E2V) reconstruction to bridge event-based and standard computer vision. However, this task remains challenging due to its inherently ill-posed nature: event came…

    Submitted 12 July, 2024; v1 submitted 7 July, 2024; originally announced July 2024.

    Comments: arXiv admin note: text overlap with arXiv:2402.04324 by other authors

  17. arXiv:2407.05484  [pdf, ps, other]

    cs.LG cs.GT

    Learning to Price Homogeneous Data

    Authors: Keran Chen, Joon Suk Huh, Kirthevasan Kandasamy

    Abstract: We study a data pricing problem, where a seller has access to $N$ homogeneous data points (e.g. drawn i.i.d. from some distribution). There are $m$ types of buyers in the market, where buyers of the same type $i$ have the same valuation curve $v_i:[N]\rightarrow [0,1]$, where $v_i(n)$ is the value for having $n$ data points. \textit{A priori}, the seller is unaware of the distribution of buyers, b…

    Submitted 7 July, 2024; originally announced July 2024.

  18. arXiv:2407.05361  [pdf, other]

    eess.AS cs.CL

    Emilia: An Extensive, Multilingual, and Diverse Speech Dataset for Large-Scale Speech Generation

    Authors: Haorui He, Zengqiang Shang, Chaoren Wang, Xuyuan Li, Yicheng Gu, Hua Hua, Liwei Liu, Chen Yang, Jiaqi Li, Peiyang Shi, Yuancheng Wang, Kai Chen, Pengyuan Zhang, Zhizheng Wu

    Abstract: Recently, speech generation models have made significant progress by using large-scale training data. However, the research community struggles to produce highly spontaneous and human-like speech due to the lack of large-scale, diverse, and spontaneous speech data. This paper presents Emilia, the first multilingual speech generation dataset from in-the-wild speech data, and Emilia-Pipe, the first op…

    Submitted 12 July, 2024; v1 submitted 7 July, 2024; originally announced July 2024.

    Comments: Fix typos

  19. arXiv:2407.05313  [pdf, other]

    math.AP

    Well-posedness for local and nonlocal quasilinear evolution equations in fluids and geometry

    Authors: Ke Chen, Ruilin Hu, Quoc-Hung Nguyen

    Abstract: We establish a Schauder-type estimate for general local and non-local linear parabolic system $$\partial_tu+\mathbf{L}_su=Λ^γf+g$$ in $(0,\infty)\times\mathbb{R}^d$ where $Λ=(-Δ)^{\frac{1}{2}}$, $0<γ\leq s$, $\mathbf{L}_s$ is the pseudo-differential operator defined by \begin{equation} \mathbf{L}_su(t,x)=(2π)^{-\frac{d}{2}}\int_{\mathbb{R}^d}\mathsf{A}(t,x,ξ)\hat u(t,ξ)e^{ix\cdotξ}dξ,\quad\quad…

    Submitted 7 July, 2024; originally announced July 2024.

    Comments: 184 pages, 1 figure

  20. arXiv:2407.04859  [pdf]

    cs.CV cs.AI

    Hybrid Primal Sketch: Combining Analogy, Qualitative Representations, and Computer Vision for Scene Understanding

    Authors: Kenneth D. Forbus, Kezhen Chen, Wangcheng Xu, Madeline Usher

    Abstract: One of the purposes of perception is to bridge between sensors and conceptual understanding. Marr's Primal Sketch combined initial edge-finding with multiple downstream processes to capture aspects of visual perception such as grouping and stereopsis. Given the progress made in multiple areas of AI since then, we have developed a new framework inspired by Marr's work, the Hybrid Primal Sketch, whi…

    Submitted 5 July, 2024; originally announced July 2024.

    Comments: 16 pages, 6 figures

  21. arXiv:2407.04693  [pdf, other]

    cs.CL cs.AI

    ANAH-v2: Scaling Analytical Hallucination Annotation of Large Language Models

    Authors: Yuzhe Gu, Ziwei Ji, Wenwei Zhang, Chengqi Lyu, Dahua Lin, Kai Chen

    Abstract: Large language models (LLMs) exhibit hallucinations in long-form question-answering tasks across various domains and wide applications. Current hallucination detection and mitigation datasets are limited in domains and sizes, which struggle to scale due to prohibitive labor costs and insufficient reliability of existing hallucination annotators. To facilitate the scalable oversight of LLM hallucin…

    Submitted 5 July, 2024; originally announced July 2024.

    Comments: 9 pages

  22. arXiv:2407.03320  [pdf, other]

    cs.CV cs.CL

    InternLM-XComposer-2.5: A Versatile Large Vision Language Model Supporting Long-Contextual Input and Output

    Authors: Pan Zhang, Xiaoyi Dong, Yuhang Zang, Yuhang Cao, Rui Qian, Lin Chen, Qipeng Guo, Haodong Duan, Bin Wang, Linke Ouyang, Songyang Zhang, Wenwei Zhang, Yining Li, Yang Gao, Peng Sun, Xinyue Zhang, Wei Li, Jingwen Li, Wenhai Wang, Hang Yan, Conghui He, Xingcheng Zhang, Kai Chen, Jifeng Dai, Yu Qiao, et al. (2 additional authors not shown)

    Abstract: We present InternLM-XComposer-2.5 (IXC-2.5), a versatile large-vision language model that supports long-contextual input and output. IXC-2.5 excels in various text-image comprehension and composition applications, achieving GPT-4V level capabilities with merely 7B LLM backend. Trained with 24K interleaved image-text contexts, it can seamlessly extend to 96K long contexts via RoPE extrapolation. Th…

    Submitted 3 July, 2024; originally announced July 2024.

    Comments: Technical Report. https://github.com/InternLM/InternLM-XComposer

  23. arXiv:2407.03133  [pdf, other]

    cs.CY cs.AI cs.LG stat.ML

    Quantifying the Cross-sectoral Intersecting Discrepancies within Multiple Groups Using Latent Class Analysis Towards Fairness

    Authors: Yingfang Yuan, Kefan Chen, Mehdi Rizvi, Lynne Baillie, Wei Pang

    Abstract: The growing interest in fair AI development is evident. The "Leave No One Behind" initiative urges us to address multiple and intersecting forms of inequality in accessing services, resources, and opportunities, emphasising the significance of fairness in AI. This is particularly relevant as an increasing number of AI tools are applied to decision-making processes, such as resource allocation an…

    Submitted 11 July, 2024; v1 submitted 24 May, 2024; originally announced July 2024.

  24. arXiv:2407.02791  [pdf, other]

    cs.SE cs.AI

    Model-Enhanced LLM-Driven VUI Testing of VPA Apps

    Authors: Suwan Li, Lei Bu, Guangdong Bai, Fuman Xie, Kai Chen, Chang Yue

    Abstract: The flourishing ecosystem centered around voice personal assistants (VPA), such as Amazon Alexa, has led to the booming of VPA apps. The largest app market Amazon skills store, for example, hosts over 200,000 apps. Despite their popularity, the open nature of app release and the easy accessibility of apps also raise significant concerns regarding security, privacy and quality. Consequently, variou…

    Submitted 2 July, 2024; originally announced July 2024.

    Comments: 13 pages, 11 figures

  25. Unveiling Global Interactive Patterns across Graphs: Towards Interpretable Graph Neural Networks

    Authors: Yuwen Wang, Shunyu Liu, Tongya Zheng, Kaixuan Chen, Mingli Song

    Abstract: Graph Neural Networks (GNNs) have emerged as a prominent framework for graph mining, leading to significant advances across various domains. Stemmed from the node-wise representations of GNNs, existing explanation studies have embraced the subgraph-specific viewpoint that attributes the decision results to the salient features and local structures of nodes. However, graph-level tasks necessitate l…

    Submitted 2 July, 2024; originally announced July 2024.

    Comments: Accepted in KDD2024

  26. arXiv:2407.01884  [pdf, other]

    cs.CV cs.HC

    EIT-1M: One Million EEG-Image-Text Pairs for Human Visual-textual Recognition and More

    Authors: Xu Zheng, Ling Wang, Kanghao Chen, Yuanhuiyi Lyu, Jiazhou Zhou, Lin Wang

    Abstract: Recently, electroencephalography (EEG) signals have been actively incorporated to decode brain activity to visual or textual stimuli and achieve object recognition in multi-modal AI. Accordingly, endeavors have been focused on building EEG-based datasets from visual or textual single-modal stimuli. However, these datasets offer limited EEG epochs per category, and the complex semantics of stimuli…

    Submitted 1 July, 2024; originally announced July 2024.

  27. arXiv:2407.01866  [pdf, other]

    cs.CV cs.GR

    Image-GS: Content-Adaptive Image Representation via 2D Gaussians

    Authors: Yunxiang Zhang, Alexandr Kuznetsov, Akshay Jindal, Kenneth Chen, Anton Sochenov, Anton Kaplanyan, Qi Sun

    Abstract: Neural image representations have recently emerged as a promising technique for storing, streaming, and rendering visual data. Coupled with learning-based workflows, these novel representations have demonstrated remarkable visual fidelity and memory efficiency. However, existing neural image representations often rely on explicit uniform data structures without content adaptivity or computation-in…

    Submitted 1 July, 2024; originally announced July 2024.

  28. arXiv:2407.01534  [pdf, other]

    cs.NI

    AIGC-Assisted Digital Watermark Services in Low-Earth Orbit Satellite-Terrestrial Edge Networks

    Authors: Kongyang Chen, Yikai Li, Wenjun Lan, Bing Mi, Shaowei Wang

    Abstract: Low Earth Orbit (LEO) satellite communication is a crucial component of future 6G communication networks, contributing to the development of an integrated satellite-terrestrial network. In the forthcoming satellite-to-ground network, the idle computational resources of LEO satellites can serve as edge servers, delivering intelligent task computation services to ground users. Existing research on s…

    Submitted 8 March, 2024; originally announced July 2024.

  29. arXiv:2407.01525  [pdf, other]

    cs.CV cs.AI cs.CL

    Empowering 3D Visual Grounding with Reasoning Capabilities

    Authors: Chenming Zhu, Tai Wang, Wenwei Zhang, Kai Chen, Xihui Liu

    Abstract: Although great progress has been made in 3D visual grounding, current models still rely on explicit textual descriptions for grounding and lack the ability to reason human intentions from implicit instructions. We propose a new task called 3D reasoning grounding and introduce a new benchmark ScanReason which provides over 10K question-answer-location pairs from five reasoning types that require th…

    Submitted 2 July, 2024; v1 submitted 1 July, 2024; originally announced July 2024.

    Comments: Accepted by ECCV 2024. A comprehensive and hierarchical 3D reasoning grounding benchmark in the era of foundation models. Project page: https://zcmax.github.io/projects/ScanReason

  30. arXiv:2407.01494  [pdf, other]

    cs.CV cs.SD eess.AS

    FoleyCrafter: Bring Silent Videos to Life with Lifelike and Synchronized Sounds

    Authors: Yiming Zhang, Yicheng Gu, Yanhong Zeng, Zhening Xing, Yuancheng Wang, Zhizheng Wu, Kai Chen

    Abstract: We study Neural Foley, the automatic generation of high-quality sound effects synchronizing with videos, enabling an immersive audio-visual experience. Despite its wide range of applications, existing approaches encounter limitations when it comes to simultaneously synthesizing high-quality and video-aligned (i.e., semantically relevant and temporally synchronized) sounds. To overcome these limitations…

    Submitted 1 July, 2024; originally announced July 2024.

    Comments: Project page: https://foleycrafter.github.io/

  31. arXiv:2407.01414  [pdf, other]

    cs.CV

    StyleShot: A Snapshot on Any Style

    Authors: Junyao Gao, Yanchen Liu, Yanan Sun, Yinhao Tang, Yanhong Zeng, Kai Chen, Cairong Zhao

    Abstract: In this paper, we show that a good style representation is crucial and sufficient for generalized style transfer without test-time tuning. We achieve this through constructing a style-aware encoder and a well-organized style dataset called StyleGallery. With dedicated design for style learning, this style-aware encoder is trained to extract expressive style representation with decoupling training…

    Submitted 1 July, 2024; originally announced July 2024.

    Comments: Project page: https://styleshot.github.io/

  32. arXiv:2407.01185  [pdf, ps, other]

    hep-ph hep-ex

    From the $P^{N}_ψ$/$P^Λ_{ψs}$ to $\bar{T}^f_{cc}$: symmetry analysis to the interactions of the $(\bar{c}q)(\bar{c}q)$/$(ccq)(\bar{c}q)$/$(ccq)(ccq)$ di-hadron systems

    Authors: Kan Chen, Bo Wang

    Abstract: We investigate the interactions of the $(\bar{c}q)(\bar{c}q)$/$(ccq)(\bar{c}q)$/$(ccq)(ccq)$ di-hadron systems based on a contact lagrangian possessing the SU(3) flavor and SU(2) spin symmetries. Under the assumptions of two scenarios for the $J^P$ quantum numbers of the $P_ψ^N(4440)$ and $P_ψ^N(4457)$ states, we obtain the parameters ($\tilde{g}_s$, $\tilde{g}_a$) introduced from this contact lag…

    Submitted 1 July, 2024; originally announced July 2024.

    Comments: 12 pages, 4 figures, 4 Tables

  33. arXiv:2407.01178  [pdf, other]

    cs.CL cs.AI cs.LG

    $\text{Memory}^3$: Language Modeling with Explicit Memory

    Authors: Hongkang Yang, Zehao Lin, Wenjin Wang, Hao Wu, Zhiyu Li, Bo Tang, Wenqiang Wei, Jinbo Wang, Zeyun Tang, Shichao Song, Chenyang Xi, Yu Yu, Kai Chen, Feiyu Xiong, Linpeng Tang, Weinan E

    Abstract: The training and inference of large language models (LLMs) are together a costly process that transports knowledge from raw data to meaningful computation. Inspired by the memory hierarchy of the human brain, we reduce this cost by equipping LLMs with explicit memory, a memory format cheaper than model parameters and text retrieval-augmented generation (RAG). Conceptually, with most of its knowled…

    Submitted 1 July, 2024; originally announced July 2024.

    MSC Class: 68T50 ACM Class: I.2.7

  34. arXiv:2407.00180  [pdf, other]

    astro-ph.HE

    Super-Eddington Magnetized Neutron Star Accretion Flows: a Self-similar Analysis

    Authors: Ken Chen, Zi-Gao Dai

    Abstract: The properties of super-Eddington accretion disks exhibit substantial distinctions from the sub-Eddington ones. In this paper, we investigate the accretion process of a magnetized neutron star (NS) surrounded by a super-Eddington disk. By constructing self-similar solutions for the disk structure, we study in detail an interaction between the NS magnetosphere and the inner region of the disk, rev…

    Submitted 28 June, 2024; originally announced July 2024.

    Comments: 14 pages, 7 figures, accepted for publication in ApJ

  35. arXiv:2407.00024  [pdf, other]

    cs.CV cs.AI cs.MM

    LMVD: A Large-Scale Multimodal Vlog Dataset for Depression Detection in the Wild

    Authors: Lang He, Kai Chen, Junnan Zhao, Yimeng Wang, Ercheng Pei, Haifeng Chen, Jiewei Jiang, Shiqing Zhang, Jie Zhang, Zhongmin Wang, Tao He, Prayag Tiwari

    Abstract: Depression can significantly impact many aspects of an individual's life, including their personal and social functioning, academic and work performance, and overall quality of life. Many researchers within the field of affective computing are adopting deep learning technology to explore potential patterns related to the detection of depression. However, because of subjects' privacy protection con…

    Submitted 8 May, 2024; originally announced July 2024.

  36. arXiv:2406.20085  [pdf, other]

    cs.CV

    Auto Cherry-Picker: Learning from High-quality Generative Data Driven by Language

    Authors: Yicheng Chen, Xiangtai Li, Yining Li, Yanhong Zeng, Jianzong Wu, Xiangyu Zhao, Kai Chen

    Abstract: Diffusion-based models have shown great potential in generating high-quality images with various layouts, which can benefit downstream perception tasks. However, a fully automatic layout generation driven only by language and a suitable metric for measuring multiple generated instances has not been well explored. In this work, we present Auto Cherry-Picker (ACP), a novel framework that generates h…

    Submitted 28 June, 2024; originally announced June 2024.

    Comments: 19 pages, 7 figures

  37. arXiv:2406.18958  [pdf, other]

    cs.CV

    AnyControl: Create Your Artwork with Versatile Control on Text-to-Image Generation

    Authors: Yanan Sun, Yanchen Liu, Yinhao Tang, Wenjie Pei, Kai Chen

    Abstract: The field of text-to-image (T2I) generation has made significant progress in recent years, largely driven by advancements in diffusion models. Linguistic control enables effective content creation, but struggles with fine-grained control over image generation. This challenge has been explored, to a great extent, by incorporating additional user-supplied spatial conditions, such as depth maps and e…

    Submitted 27 June, 2024; v1 submitted 27 June, 2024; originally announced June 2024.

  38. arXiv:2406.18842  [pdf]

    cs.CY cs.AI cs.CL

    The global landscape of academic guidelines for generative AI and Large Language Models

    Authors: Junfeng Jiao, Saleh Afroogh, Kevin Chen, David Atkinson, Amit Dhurandhar

    Abstract: The integration of Generative Artificial Intelligence (GAI) and Large Language Models (LLMs) in academia has spurred a global discourse on their potential pedagogical benefits and ethical considerations. Positive reactions highlight some potential, such as collaborative creativity, increased access to education, and empowerment of trainers and trainees. However, negative reactions raise concerns a…

    Submitted 27 June, 2024; v1 submitted 26 May, 2024; originally announced June 2024.

  39. arXiv:2406.17770  [pdf, other]

    cs.CV

    MG-LLaVA: Towards Multi-Granularity Visual Instruction Tuning

    Authors: Xiangyu Zhao, Xiangtai Li, Haodong Duan, Haian Huang, Yining Li, Kai Chen, Hua Yang

    Abstract: Multi-modal large language models (MLLMs) have made significant strides in various visual understanding tasks. However, the majority of these models are constrained to process low-resolution images, which limits their effectiveness in perception tasks that necessitate detailed visual information. In our study, we present MG-LLaVA, an innovative MLLM that enhances the model's visual processing capa…

    Submitted 26 June, 2024; v1 submitted 25 June, 2024; originally announced June 2024.

  40. arXiv:2406.17758  [pdf, other]

    cs.CV

    MotionBooth: Motion-Aware Customized Text-to-Video Generation

    Authors: Jianzong Wu, Xiangtai Li, Yanhong Zeng, Jiangning Zhang, Qianyu Zhou, Yining Li, Yunhai Tong, Kai Chen

    Abstract: In this work, we present MotionBooth, an innovative framework designed for animating customized subjects with precise control over both object and camera movements. By leveraging a few images of a specific object, we efficiently fine-tune a text-to-video model to capture the object's shape and attributes accurately. Our approach presents subject region loss and video preservation loss to enhance t…

    Submitted 25 June, 2024; originally announced June 2024.

    Comments: Project page at https://jianzongwu.github.io/projects/motionbooth

  41. arXiv:2406.16978  [pdf, other]

    cs.LG cs.AI cs.RO

    MetaFollower: Adaptable Personalized Autonomous Car Following

    Authors: Xianda Chen, Kehua Chen, Meixin Zhu, Hao, Yang, Shaojie Shen, Xuesong Wang, Yinhai Wang

    Abstract: Car-following (CF) modeling, a fundamental component in microscopic traffic simulation, has attracted increasing interest of researchers in the past decades. In this study, we propose an adaptable personalized car-following framework, MetaFollower, by leveraging the power of meta-learning. Specifically, we first utilize Model-Agnostic Meta-Learning (MAML) to extract common driving knowledge from v…

    Submitted 23 June, 2024; originally announced June 2024.

  42. arXiv:2406.16486  [pdf, other]

    cs.AI

    Towards Comprehensive Preference Data Collection for Reward Modeling

    Authors: Yulan Hu, Qingyang Li, Sheng Ouyang, Ge Chen, Kaihui Chen, Lijun Mei, Xucheng Ye, Fuzheng Zhang, Yong Liu

    Abstract: Reinforcement Learning from Human Feedback (RLHF) facilitates the alignment of large language models (LLMs) with human preferences, thereby enhancing the quality of responses generated. A critical component of RLHF is the reward model, which is trained on preference data and outputs a scalar reward during the inference stage. However, the collection of preference data still lacks thorough investig…

    Submitted 24 June, 2024; originally announced June 2024.

  43. arXiv:2406.16066  [pdf, other]

    cs.CE

    Constructing Boundary-identical Microstructures by Guided Diffusion for Fast Multiscale Designs

    Authors: Jingxuan Feng, Lili Wang, Xiaoya Zhai, Kai Chen, Wenming Wu, Ligang Liu, Xiao-Ming Fu

    Abstract: We propose a novel method to construct large-scale boundary-identical microstructure datasets with high attribute coverage for highly efficient multiscale design. Central to our technique is using a deep generative model to generate microstructures under the two conditions, including the specified boundary and homogenized elastic tensor. We achieve the desired dataset by alternately adding microst…

    Submitted 23 June, 2024; originally announced June 2024.

  44. arXiv:2406.16012  [pdf]

    eess.IV cs.CV

    Wound Tissue Segmentation in Diabetic Foot Ulcer Images Using Deep Learning: A Pilot Study

    Authors: Mrinal Kanti Dhar, Chuanbo Wang, Yash Patel, Taiyu Zhang, Jeffrey Niezgoda, Sandeep Gopalakrishnan, Keke Chen, Zeyun Yu

    Abstract: Identifying individual tissues, so-called tissue segmentation, in diabetic foot ulcer (DFU) images is a challenging task and little work has been published, largely due to the limited availability of a clinical image dataset. To address this gap, we have created a DFUTissue dataset for the research community to evaluate wound tissue segmentation algorithms. The dataset contains 110 images with tis…

    Submitted 23 June, 2024; originally announced June 2024.

  45. arXiv:2406.14887  [pdf, other]

    cs.CL

    InternLM-Law: An Open Source Chinese Legal Large Language Model

    Authors: Zhiwei Fei, Songyang Zhang, Xiaoyu Shen, Dawei Zhu, Xiao Wang, Maosong Cao, Fengzhe Zhou, Yining Li, Wenwei Zhang, Dahua Lin, Kai Chen, Jidong Ge

    Abstract: While large language models (LLMs) have showcased impressive capabilities, they struggle with addressing legal queries due to the intricate complexities and specialized expertise required in the legal field. In this paper, we introduce InternLM-Law, a specialized LLM tailored for addressing diverse legal queries related to Chinese laws, spanning from responding to standard legal questions (e.g., l…

    Submitted 21 June, 2024; originally announced June 2024.

    Comments: Our dataset, code and models will be released at https://github.com/InternLM/InternLM-Law

  46. arXiv:2406.14855  [pdf, other]

    cs.CV cs.CR

    Six-CD: Benchmarking Concept Removals for Benign Text-to-image Diffusion Models

    Authors: Jie Ren, Kangrui Chen, Yingqian Cui, Shenglai Zeng, Hui Liu, Yue Xing, Jiliang Tang, Lingjuan Lyu

    Abstract: Text-to-image (T2I) diffusion models have shown exceptional capabilities in generating images that closely correspond to textual prompts. However, the advancement of T2I diffusion models presents significant risks, as the models could be exploited for malicious purposes, such as generating images with violence or nudity, or creating unauthorized portraits of public figures in inappropriate context…

    Submitted 20 June, 2024; originally announced June 2024.

  47. arXiv:2406.14842  [pdf, other]

    q-bio.GN cs.HC

    Online t-SNE for single-cell RNA-seq

    Authors: Hui Ma, Kai Chen

    Abstract: Due to the sequential sample arrival, changing experiment conditions, and evolution of knowledge, the demand to continually visualize evolving structures of sequential and diverse single-cell RNA-sequencing (scRNA-seq) data becomes indispensable. However, as one of the state-of-the-art visualization and analysis methods for scRNA-seq, t-distributed stochastic neighbor embedding (t-SNE) merely visu…

    Submitted 20 June, 2024; originally announced June 2024.

  48. arXiv:2406.14544  [pdf, other]

    cs.CV cs.CL

    Prism: A Framework for Decoupling and Assessing the Capabilities of VLMs

    Authors: Yuxuan Qiao, Haodong Duan, Xinyu Fang, Junming Yang, Lin Chen, Songyang Zhang, Jiaqi Wang, Dahua Lin, Kai Chen

    Abstract: Vision Language Models (VLMs) demonstrate remarkable proficiency in addressing a wide array of visual questions, which requires strong perception and reasoning faculties. Assessing these two competencies independently is crucial for model refinement, despite the inherent difficulty due to the intertwined nature of seeing and reasoning in existing VLMs. To tackle this issue, we present Prism, an in…

    Submitted 20 June, 2024; originally announced June 2024.

  49. arXiv:2406.14515  [pdf, other]

    cs.CV cs.MM

    MMBench-Video: A Long-Form Multi-Shot Benchmark for Holistic Video Understanding

    Authors: Xinyu Fang, Kangrui Mao, Haodong Duan, Xiangyu Zhao, Yining Li, Dahua Lin, Kai Chen

    Abstract: The advent of large vision-language models (LVLMs) has spurred research into their applications in multi-modal contexts, particularly in video understanding. Traditional VideoQA benchmarks, despite providing quantitative metrics, often fail to encompass the full spectrum of video content and inadequately assess models' temporal comprehension. To address these limitations, we introduce MMBench-Vide…

    Submitted 20 June, 2024; originally announced June 2024.

  50. arXiv:2406.13317  [pdf, other]

    cs.CV

    M4Fog: A Global Multi-Regional, Multi-Modal, and Multi-Stage Dataset for Marine Fog Detection and Forecasting to Bridge Ocean and Atmosphere

    Authors: Mengqiu Xu, Ming Wu, Kaixin Chen, Yixiang Huang, Mingrui Xu, Yujia Yang, Yiqing Feng, Yiying Guo, Bin Huang, Dongliang Chang, Zhenwei Shi, Chuang Zhang, Zhanyu Ma, Jun Guo

    Abstract: Marine fog poses a significant hazard to global shipping, necessitating effective detection and forecasting to reduce economic losses. In recent years, several machine learning (ML) methods have demonstrated superior detection accuracy compared to traditional meteorological methods. However, most of these works are developed on proprietary datasets, and the few publicly accessible datasets are oft…

    Submitted 19 June, 2024; originally announced June 2024.