Showing 151–200 of 2,123 results for author: Gao, J

  1. arXiv:2403.09813  [pdf, other]

    cs.CV cs.RO

    Towards Comprehensive Multimodal Perception: Introducing the Touch-Language-Vision Dataset

    Authors: Ning Cheng, You Li, Jing Gao, Bin Fang, Jinan Xu, Wenjuan Han

    Abstract: Tactility provides crucial support and enhancement for the perception and interaction capabilities of both humans and robots. Nevertheless, the multimodal research related to touch primarily focuses on visual and tactile modalities, with limited exploration in the domain of language. Beyond vocabulary, sentence-level descriptions contain richer semantics. Based on this, we construct a touch-langua…

    Submitted 17 June, 2024; v1 submitted 14 March, 2024; originally announced March 2024.

    Comments: Accepted by ICIC 2024

  2. arXiv:2403.08619  [pdf, other]

    hep-ex astro-ph.HE

    Measurements of the charge ratio and polarization of cosmic-ray muons with the Super-Kamiokande detector

    Authors: H. Kitagawa, T. Tada, K. Abe, C. Bronner, Y. Hayato, K. Hiraide, K. Hosokawa, K. Ieki, M. Ikeda, J. Kameda, Y. Kanemura, R. Kaneshima, Y. Kashiwagi, Y. Kataoka, S. Miki, S. Mine, M. Miura, S. Moriyama, Y. Nakano, M. Nakahata, S. Nakayama, Y. Noguchi, K. Okamoto, K. Sato, H. Sekiya, et al. (231 additional authors not shown)

    Abstract: We present the results of the charge ratio ($R$) and polarization ($P^μ_{0}$) measurements using the decay electron events collected from 2008 September to 2022 June by the Super-Kamiokande detector. Because of its underground location and long operation, we performed high precision measurements by accumulating cosmic-ray muons. We measured the muon charge ratio to be $R=1.32 \pm 0.02$…

    Submitted 13 March, 2024; originally announced March 2024.

    Comments: 29 pages, 45 figures

  3. arXiv:2403.08002  [pdf, other]

    cs.CL cs.CV

    Towards a clinically accessible radiology foundation model: open-access and lightweight, with automated evaluation

    Authors: Juan Manuel Zambrano Chaves, Shih-Cheng Huang, Yanbo Xu, Hanwen Xu, Naoto Usuyama, Sheng Zhang, Fei Wang, Yujia Xie, Mahmoud Khademi, Ziyi Yang, Hany Awadalla, Julia Gong, Houdong Hu, Jianwei Yang, Chunyuan Li, Jianfeng Gao, Yu Gu, Cliff Wong, Mu Wei, Tristan Naumann, Muhao Chen, Matthew P. Lungren, Akshay Chaudhari, Serena Yeung-Levy, Curtis P. Langlotz, et al. (2 additional authors not shown)

    Abstract: The scaling laws and extraordinary performance of large foundation models motivate the development and utilization of such models in biomedicine. However, despite early promising results on some biomedical benchmarks, there are still major challenges that need to be addressed before these models can be used in real-world clinics. Frontier general-domain models such as GPT-4V still have significant…

    Submitted 26 June, 2024; v1 submitted 12 March, 2024; originally announced March 2024.

  4. arXiv:2403.07952  [pdf, other]

    cs.CV cs.AI cs.MM

    AesopAgent: Agent-driven Evolutionary System on Story-to-Video Production

    Authors: Jiuniu Wang, Zehua Du, Yuyuan Zhao, Bo Yuan, Kexiang Wang, Jian Liang, Yaxi Zhao, Yihen Lu, Gengliang Li, Junlong Gao, Xin Tu, Zhenyu Guo

    Abstract: The Agent and AIGC (Artificial Intelligence Generated Content) technologies have recently made significant progress. We propose AesopAgent, an Agent-driven Evolutionary System on Story-to-Video Production. AesopAgent is a practical application of agent technology for multimodal content generation. The system integrates multiple generative capabilities within a unified framework, so that individual…

    Submitted 11 March, 2024; originally announced March 2024.

    Comments: 22 pages, 13 figures

  5. arXiv:2403.07796  [pdf, other]

    physics.ins-det astro-ph.HE

    Second gadolinium loading to Super-Kamiokande

    Authors: K. Abe, C. Bronner, Y. Hayato, K. Hiraide, K. Hosokawa, K. Ieki, M. Ikeda, J. Kameda, Y. Kanemura, R. Kaneshima, Y. Kashiwagi, Y. Kataoka, S. Miki, S. Mine, M. Miura, S. Moriyama, Y. Nakano, M. Nakahata, S. Nakayama, Y. Noguchi, K. Sato, H. Sekiya, H. Shiba, K. Shimizu, M. Shiozawa, et al. (225 additional authors not shown)

    Abstract: The first loading of gadolinium (Gd) into Super-Kamiokande in 2020 was successful, and the neutron capture efficiency on Gd reached 50\%. To further increase the Gd neutron capture efficiency to 75\%, 26.1 tons of $\rm Gd_2(\rm SO_4)_3\cdot \rm 8H_2O$ was additionally loaded into Super-Kamiokande (SK) from May 31 to July 4, 2022. As the amount of loaded $\rm Gd_2(\rm SO_4)_3\cdot \rm 8H_2O$ was do…

    Submitted 18 June, 2024; v1 submitted 12 March, 2024; originally announced March 2024.

    Comments: 34 pages, 13 figures, submitted to Nuclear Inst. and Methods in Physics Research, A

    Journal ref: Nuclear Inst. and Methods in Physics Research, A 1065 (2024) 169480

  6. arXiv:2403.06760  [pdf, other]

    astro-ph.HE

    Performance of SK-Gd's Upgraded Real-time Supernova Monitoring System

    Authors: Y. Kashiwagi, K. Abe, C. Bronner, Y. Hayato, K. Hiraide, K. Hosokawa, K. Ieki, M. Ikeda, J. Kameda, Y. Kanemura, R. Kaneshima, Y. Kataoka, S. Miki, S. Mine, M. Miura, S. Moriyama, Y. Nakano, M. Nakahata, S. Nakayama, Y. Noguchi, K. Sato, H. Sekiya, H. Shiba, K. Shimizu, M. Shiozawa, et al. (214 additional authors not shown)

    Abstract: Among multi-messenger observations of the next galactic core-collapse supernova, Super-Kamiokande (SK) plays a critical role in detecting the emitted supernova neutrinos, determining the direction to the supernova (SN), and notifying the astronomical community of these observations in advance of the optical signal. In 2022, SK increased the gadolinium dissolved in its water target (SK-Gd) and…

    Submitted 13 March, 2024; v1 submitted 11 March, 2024; originally announced March 2024.

    Comments: 38 pages, 29 figures, 6 tables

  7. arXiv:2403.06600  [pdf, other]

    cs.CV

    BEV2PR: BEV-Enhanced Visual Place Recognition with Structural Cues

    Authors: Fudong Ge, Yiwei Zhang, Shuhan Shen, Yue Wang, Weiming Hu, Jin Gao

    Abstract: In this paper, we propose a new image-based visual place recognition (VPR) framework by exploiting the structural cues in bird's-eye view (BEV) from a single monocular camera. The motivation arises from two key observations about VPR: 1) For the methods based on both camera and LiDAR sensors, the integration of LiDAR in robotic systems has led to increased expenses, while the alignment of data bet…

    Submitted 11 March, 2024; originally announced March 2024.

  8. arXiv:2403.06421  [pdf, other]

    cs.CV

    A Comparative Study of Perceptual Quality Metrics for Audio-driven Talking Head Videos

    Authors: Weixia Zhang, Chengguang Zhu, Jingnan Gao, Yichao Yan, Guangtao Zhai, Xiaokang Yang

    Abstract: The rapid advancement of Artificial Intelligence Generated Content (AIGC) technology has propelled audio-driven talking head generation, gaining considerable research attention for practical applications. However, performance evaluation research lags behind the development of talking head generation techniques. Existing literature relies on heuristic quantitative metrics without human validation,…

    Submitted 11 March, 2024; originally announced March 2024.

  9. arXiv:2403.06175  [pdf]

    cond-mat.dis-nn cond-mat.soft

    Universal Origin of Glassy Relaxation as Recognized by Configuration Pattern-matching

    Authors: Hai-Bin Yu, Liang Gao, Jia-Qi Gao, Konrad Samwer

    Abstract: Relaxation processes are crucial in understanding the structural rearrangements of liquids and amorphous materials. However, the overarching principle that governs these processes across vastly different materials remains an open question. Substantial analysis has been carried out based on the motions of individual particles. Here, alternatively, we propose viewing the global configuration as a si…

    Submitted 10 March, 2024; originally announced March 2024.

    Comments: 22 pages, 6 figures

    Journal ref: National Science Review 2024

  10. arXiv:2403.05110  [pdf, other]

    cs.RO cs.AI cs.LG

    Efficient Data Collection for Robotic Manipulation via Compositional Generalization

    Authors: Jensen Gao, Annie Xie, Ted Xiao, Chelsea Finn, Dorsa Sadigh

    Abstract: Data collection has become an increasingly important problem in robotic manipulation, yet there is still little understanding of how to effectively collect data to facilitate broad generalization. Recent works on large-scale robotic data collection typically vary many environmental factors of variation (e.g., object types, table textures) during data collection, to cover a diverse range of scenar…

    Submitted 21 May, 2024; v1 submitted 8 March, 2024; originally announced March 2024.

    Comments: RSS 2024

  11. arXiv:2403.04634  [pdf, other]

    cs.CV cs.AI

    Pix2Gif: Motion-Guided Diffusion for GIF Generation

    Authors: Hitesh Kandala, Jianfeng Gao, Jianwei Yang

    Abstract: We present Pix2Gif, a motion-guided diffusion model for image-to-GIF (video) generation. We tackle this problem differently by formulating the task as an image translation problem steered by text and motion magnitude prompts, as shown in teaser fig. To ensure that the model adheres to motion guidance, we propose a new motion-guided warping module to spatially transform the features of the source i…

    Submitted 8 March, 2024; v1 submitted 7 March, 2024; originally announced March 2024.

  12. arXiv:2403.04140  [pdf, other]

    cs.AI

    Contrastive Augmented Graph2Graph Memory Interaction for Few Shot Continual Learning

    Authors: Biqing Qi, Junqi Gao, Xingquan Chen, Dong Li, Jianxing Liu, Ligang Wu, Bowen Zhou

    Abstract: Few-Shot Class-Incremental Learning (FSCIL) has gained considerable attention in recent years for its pivotal role in addressing continuously arriving classes. However, it encounters additional challenges. The scarcity of samples in new sessions intensifies overfitting, causing incompatibility between the output features of new and old classes, thereby escalating catastrophic forgetting. A prevale…

    Submitted 6 March, 2024; originally announced March 2024.

    Comments: 12 pages, 5 figures

  13. arXiv:2403.03493  [pdf, other]

    cs.CV

    VastTrack: Vast Category Visual Object Tracking

    Authors: Liang Peng, Junyuan Gao, Xinran Liu, Weihong Li, Shaohua Dong, Zhipeng Zhang, Heng Fan, Libo Zhang

    Abstract: In this paper, we introduce a novel benchmark, dubbed VastTrack, towards facilitating the development of more general visual tracking via encompassing abundant classes and videos. VastTrack possesses several attractive properties: (1) Vast Object Category. In particular, it covers target objects from 2,115 classes, largely surpassing object categories of existing popular benchmarks (e.g., GOT-10k…

    Submitted 6 March, 2024; originally announced March 2024.

    Comments: Tech. report

  14. arXiv:2403.03270  [pdf, other]

    cs.RO

    Bi-KVIL: Keypoints-based Visual Imitation Learning of Bimanual Manipulation Tasks

    Authors: Jianfeng Gao, Xiaoshu Jin, Franziska Krebs, Noémie Jaquier, Tamim Asfour

    Abstract: Visual imitation learning has achieved impressive progress in learning unimanual manipulation tasks from a small set of visual observations, thanks to the latest advances in computer vision. However, learning bimanual coordination strategies and complex object relations from bimanual visual demonstrations, as well as generalizing them to categorical objects in novel cluttered scenes remain unsolve…

    Submitted 22 March, 2024; v1 submitted 5 March, 2024; originally announced March 2024.

  15. arXiv:2403.02989  [pdf, other]

    astro-ph.IM

    Developments on frequency domain multiplexing readout for large arrays of transition-edge sensor X-ray micro-calorimeters

    Authors: D. Vaccaro, H. Akamatsu, L. Gottardi, M. de Wit, M. P. Bruijn, J. van der Kuur, K. Nagayoshi, E. Taralli, K. Ravensberg, J. R. Gao, J. W. A. den Herder

    Abstract: At SRON we have been developing X-ray TES micro-calorimeters as backup technology for the X-ray Integral Field Unit (X-IFU) of the Athena mission, demonstrating excellent resolving powers both under DC and AC bias. We also developed a frequency-domain multiplexing (FDM) readout technology, where each TES is coupled to a superconducting band-pass LC resonator and AC biased at MHz frequencies throug…

    Submitted 5 March, 2024; originally announced March 2024.

    Comments: Under publication in Journal of Low Temperature Physics

  16. arXiv:2403.02978  [pdf, other]

    astro-ph.IM

    System performance of a TDM test-bed with long flex harness towards the new X-IFU FPA-DM

    Authors: D. Vaccaro, M. de Wit, J. van der Kuur, L. Gottardi, K. Ravensberg, E. Taralli, J. Adams, S. R. Bandler, J. A. Chervenak, W. B. Doriese, M. Durkin, C. Reintsema, K. Sakai, S. J. Smith, N. A. Wakeham, B. Jackson, P. Khosropanah, J. R. Gao, J. W. A. den Herder, P. Roelfsema

    Abstract: SRON (Netherlands Institute for Space Research) is developing the Focal Plane Assembly (FPA) for Athena X-IFU, whose Demonstration Model (DM) will use for the first time a time domain multiplexing (TDM)-based readout system for the on-board transition-edge sensors (TES). We report on the characterization activities on a TDM setup provided by NASA Goddard Space Flight Center (GSFC) and National Ins…

    Submitted 5 March, 2024; originally announced March 2024.

    Comments: Under publication in Journal of Low Temperature Physics

  17. arXiv:2403.02628  [pdf, other]

    cs.CV cs.LG

    Interactive Continual Learning: Fast and Slow Thinking

    Authors: Biqing Qi, Xingquan Chen, Junqi Gao, Dong Li, Jianxing Liu, Ligang Wu, Bowen Zhou

    Abstract: Advanced life forms, sustained by the synergistic interaction of neural cognitive mechanisms, continually acquire and transfer knowledge throughout their lifespan. In contrast, contemporary machine learning paradigms exhibit limitations in emulating the facets of continual learning (CL). Nonetheless, the emergence of large language models (LLMs) presents promising avenues for realizing CL via inte…

    Submitted 18 March, 2024; v1 submitted 4 March, 2024; originally announced March 2024.

    Comments: Accepted to CVPR 2024

  18. arXiv:2403.01954  [pdf, other]

    cs.CL cs.AI cs.LO

    DECIDER: A Dual-System Rule-Controllable Decoding Framework for Language Generation

    Authors: Chen Xu, Tian Lan, Changlong Yu, Wei Wang, Jun Gao, Yu Ji, Qunxi Dong, Kun Qian, Piji Li, Wei Bi, Bin Hu

    Abstract: Constrained decoding approaches aim to control the meaning or style of text generated by a Pre-trained Language Model (PLM) using specific target words during inference. However, these methods often guide plausible continuations by greedily selecting targets, which, while completing the task, may disrupt the natural patterns of human language generation. In this work, we propose a novel decoding f…

    Submitted 7 July, 2024; v1 submitted 4 March, 2024; originally announced March 2024.

    Comments: Submitted to IEEE TKDE (Major Revision), 13 pages, 6 figures

  19. arXiv:2403.01774  [pdf, other]

    cs.CL

    WebCiteS: Attributed Query-Focused Summarization on Chinese Web Search Results with Citations

    Authors: Haolin Deng, Chang Wang, Xin Li, Dezhang Yuan, Junlang Zhan, Tianhua Zhou, Jin Ma, Jun Gao, Ruifeng Xu

    Abstract: Enhancing the attribution in large language models (LLMs) is a crucial task. One feasible approach is to enable LLMs to cite external sources that support their generations. However, existing datasets and evaluation methods in this domain still exhibit notable limitations. In this work, we formulate the task of attributed query-focused summarization (AQFS) and present WebCiteS, a Chinese dataset f…

    Submitted 28 May, 2024; v1 submitted 4 March, 2024; originally announced March 2024.

    Comments: 20 pages, 7 figures, accepted to ACL 2024 main conference

  20. arXiv:2403.01686  [pdf, other]

    astro-ph.HE astro-ph.GA

    AT2023lli: A Tidal Disruption Event with Prominent Optical Early Bump and Delayed Episodic X-ray Emission

    Authors: Shifeng Huang, Ning Jiang, Jiazheng Zhu, Yibo Wang, Tinggui Wang, Shan-Qin Wang, Wen-Pei Gan, En-Wei Liang, Yu-Jing Qin, Zheyu Lin, Lin-Na Xu, Min-Xuan Cai, Ji-An Jiang, Xu Kong, Jiaxun Li, Long Li, Jian-Guo Wang, Ze-Lin Xu, Yongquan Xue, Ye-Fei Yuan, Jingquan Cheng, Lulu Fan, Jie Gao, Lei Hu, Weida Hu, et al. (20 additional authors not shown)

    Abstract: High-cadence, multiwavelength observations have continuously revealed the diversity of tidal disruption events (TDEs), thus greatly advancing our knowledge and understanding of TDEs. In this work, we conducted an intensive optical-UV and X-ray follow-up campaign of TDE AT2023lli, and found a remarkable month-long bump in its UV/optical light curve nearly two months prior to maximum brightness. The…

    Submitted 26 March, 2024; v1 submitted 3 March, 2024; originally announced March 2024.

    Comments: 14 pages, 8 figures, accepted for publication by ApJL

  21. arXiv:2403.01002  [pdf, other]

    cs.CL cs.AI

    Attribute Structuring Improves LLM-Based Evaluation of Clinical Text Summaries

    Authors: Zelalem Gero, Chandan Singh, Yiqing Xie, Sheng Zhang, Tristan Naumann, Jianfeng Gao, Hoifung Poon

    Abstract: Summarizing clinical text is crucial in health decision-support and clinical research. Large language models (LLMs) have shown the potential to generate accurate clinical text summaries, but still struggle with issues regarding grounding and evaluation, especially in safety-critical domains such as health. Holistically evaluating text summaries is challenging because they may contain unsubstantiat…

    Submitted 1 March, 2024; originally announced March 2024.

    Comments: 4 pages

  22. arXiv:2403.00833  [pdf, other]

    cs.AI

    Position Paper: Agent AI Towards a Holistic Intelligence

    Authors: Qiuyuan Huang, Naoki Wake, Bidipta Sarkar, Zane Durante, Ran Gong, Rohan Taori, Yusuke Noda, Demetri Terzopoulos, Noboru Kuno, Ade Famoti, Ashley Llorens, John Langford, Hoi Vo, Li Fei-Fei, Katsu Ikeuchi, Jianfeng Gao

    Abstract: Recent advancements in large foundation models have remarkably enhanced our understanding of sensory information in open-world environments. In leveraging the power of foundation models, it is crucial for AI research to pivot away from excessive reductionism and toward an emphasis on systems that function as cohesive wholes. Specifically, we emphasize developing Agent AI -- an embodied system that…

    Submitted 28 February, 2024; originally announced March 2024.

    Comments: 22 pages, 4 figures. arXiv admin note: substantial text overlap with arXiv:2401.03568

  23. arXiv:2402.19245  [pdf, other]

    quant-ph

    Feedback cooling a levitated nanoparticle's libration to below 100 phonons

    Authors: Jialiang Gao, Fons van der Laan, Joanna A. Zielinska, Andrei Militaru, Lukas Novotny, Martin Frimmer

    Abstract: Macroscopic rotors are interesting model systems to test quantum theory and for quantum sensing. A promising approach for bringing these systems to the quantum regime is to combine sensitive detection with feedback cooling to reduce the thermal occupation of the mechanics. Here, we implement a backward-scattering scheme to efficiently detect all three libration modes of an optically levitated nano…

    Submitted 29 February, 2024; originally announced February 2024.

  24. arXiv:2402.18871  [pdf, other]

    eess.IV cs.CV

    LoLiSRFlow: Joint Single Image Low-light Enhancement and Super-resolution via Cross-scale Transformer-based Conditional Flow

    Authors: Ziyu Yue, Jiaxin Gao, Sihan Xie, Yang Liu, Zhixun Su

    Abstract: The visibility of real-world images is often limited by both low-light and low-resolution; however, these issues are only addressed in the literature through Low-Light Enhancement (LLE) and Super-Resolution (SR) methods. Admittedly, a simple cascade of these approaches cannot work harmoniously to cope well with the highly ill-posed problem for simultaneously enhancing visibility and resolution. I…

    Submitted 29 February, 2024; originally announced February 2024.

  25. arXiv:2402.17334  [pdf, other]

    cs.IR cs.AI

    BiVRec: Bidirectional View-based Multimodal Sequential Recommendation

    Authors: Jiaxi Hu, Jingtong Gao, Xiangyu Zhao, Yuehong Hu, Yuxuan Liang, Yiqi Wang, Ming He, Zitao Liu, Hongzhi Yin

    Abstract: The integration of multimodal information into sequential recommender systems has attracted significant attention in recent research. In the initial stages of multimodal sequential recommendation models, the mainstream paradigm was ID-dominant recommendations, wherein multimodal information was fused as side information. However, due to their limitations in terms of transferability and information…

    Submitted 4 March, 2024; v1 submitted 27 February, 2024; originally announced February 2024.

  26. arXiv:2402.17177  [pdf, other]

    cs.CV cs.AI cs.LG

    Sora: A Review on Background, Technology, Limitations, and Opportunities of Large Vision Models

    Authors: Yixin Liu, Kai Zhang, Yuan Li, Zhiling Yan, Chujie Gao, Ruoxi Chen, Zhengqing Yuan, Yue Huang, Hanchi Sun, Jianfeng Gao, Lifang He, Lichao Sun

    Abstract: Sora is a text-to-video generative AI model, released by OpenAI in February 2024. The model is trained to generate videos of realistic or imaginative scenes from text instructions and shows potential in simulating the physical world. Based on public technical reports and reverse engineering, this paper presents a comprehensive review of the model's background, related technologies, applications, re…

    Submitted 17 April, 2024; v1 submitted 26 February, 2024; originally announced February 2024.

    Comments: 37 pages, 18 figures; GitHub: https://github.com/lichao-sun/SoraReview

  27. arXiv:2402.16397  [pdf, other]

    cs.CR cs.AI

    Investigating Deep Watermark Security: An Adversarial Transferability Perspective

    Authors: Biqing Qi, Junqi Gao, Yiang Luo, Jianxing Liu, Ligang Wu, Bowen Zhou

    Abstract: The rise of generative neural networks has triggered an increased demand for intellectual property (IP) protection in generated content. Deep watermarking techniques, recognized for their flexibility in IP protection, have garnered significant attention. However, the surge in adversarial transferable attacks poses unprecedented challenges to the security of deep watermarking techniques, an area cur…

    Submitted 26 February, 2024; originally announced February 2024.

    Comments: 18 pages, 8 figures

  28. arXiv:2402.15991  [pdf, other]

    cs.CL

    $C^3$: Confidence Calibration Model Cascade for Inference-Efficient Cross-Lingual Natural Language Understanding

    Authors: Taixi Lu, Haoyu Wang, Huajie Shao, Jing Gao, Huaxiu Yao

    Abstract: Cross-lingual natural language understanding (NLU) is a critical task in natural language processing (NLP). Recent advancements have seen multilingual pre-trained language models (mPLMs) significantly enhance the performance of these tasks. However, mPLMs necessitate substantial resources and incur high computational costs during inference, posing challenges for deployment in real-world and real-t…

    Submitted 25 February, 2024; originally announced February 2024.

  29. arXiv:2402.15759  [pdf]

    cs.CV cs.AI

    Increasing SAM Zero-Shot Performance on Multimodal Medical Images Using GPT-4 Generated Descriptive Prompts Without Human Annotation

    Authors: Zekun Jiang, Dongjie Cheng, Ziyuan Qin, Jun Gao, Qicheng Lao, Kang Li, Le Zhang

    Abstract: This study develops and evaluates a novel multimodal medical image zero-shot segmentation algorithm named Text-Visual-Prompt SAM (TV-SAM) without any manual annotations. TV-SAM incorporates and integrates large language model GPT-4, Vision Language Model GLIP, and Segment Anything Model (SAM), to autonomously generate descriptive text prompts and visual bounding box prompts from medical images, th…

    Submitted 24 February, 2024; originally announced February 2024.

    Comments: 12 pages, 4 figures, 4 tables

  30. arXiv:2402.14883  [pdf, other]

    cs.CR cs.AI cs.LG

    Double-I Watermark: Protecting Model Copyright for LLM Fine-tuning

    Authors: Shen Li, Liuyi Yao, Jinyang Gao, Lan Zhang, Yaliang Li

    Abstract: To support various applications, a prevalent and efficient approach for business owners is leveraging their valuable datasets to fine-tune a pre-trained LLM through the API provided by LLM owners or cloud servers. However, this process carries a substantial risk of model misuse, potentially resulting in severe economic consequences for business owners. Thus, safeguarding the copyright of these cus…

    Submitted 5 June, 2024; v1 submitted 21 February, 2024; originally announced February 2024.

  31. arXiv:2402.14477  [pdf, other]

    cond-mat.mtrl-sci cond-mat.str-el

    Pressure tunable magnetic skyrmion phase in Co8Zn8Mn4 single crystals

    Authors: Zhun Li, Xinrun Mi, Xinming Wang, Jian Lyu, Na Su, Aifeng Wang, Yisheng Chai, Bao Yuan, Wanju Luo, Hui Cheng, Jianxiang Gao, Hongliang Wang, Lijie Hao, Mingquan He, Junying Shen, Young Sun, Xin Tong

    Abstract: In a magnetic skyrmion phase, magnetic moments form vortex-like topological textures which are of both fundamental and industrial interest. In $β$-Mn-type Co-Zn-Mn alloys, chiral magnetic skyrmions emerge above room temperature, providing a unique system for studying the skyrmion physics and exploring spintronics applications. However, the magnetic skyrmion phase is typically confined in a narrow…

    Submitted 22 February, 2024; originally announced February 2024.

    Comments: 7 pages, 4 figures

  32. arXiv:2402.13520  [pdf, other]

    physics.soc-ph physics.pop-ph

    Fractal scaling and the aesthetics of trees

    Authors: Jingyi Gao, Mitchell Newberry

    Abstract: Trees in works of art have stirred emotions in viewers for millennia. Leonardo da Vinci described geometric proportions in trees to provide both guidelines for painting and insights into tree form and function. Da Vinci's Rule of trees further implies fractal branching with a particular scaling exponent $α = 2$ governing both proportions between the diameters of adjoining boughs and the number of b…

    Submitted 20 February, 2024; originally announced February 2024.

  33. arXiv:2402.11905  [pdf, other]

    cs.CL

    Learning to Edit: Aligning LLMs with Knowledge Editing

    Authors: Yuxin Jiang, Yufei Wang, Chuhan Wu, Wanjun Zhong, Xingshan Zeng, Jiahui Gao, Liangyou Li, Xin Jiang, Lifeng Shang, Ruiming Tang, Qun Liu, Wei Wang

    Abstract: Knowledge editing techniques, aiming to efficiently modify a minor proportion of knowledge in large language models (LLMs) without negatively impacting performance across other inputs, have garnered widespread attention. However, existing methods predominantly rely on memorizing the updated knowledge, impeding LLMs from effectively combining the new knowledge with their inherent knowledge when ans…

    Submitted 5 June, 2024; v1 submitted 19 February, 2024; originally announced February 2024.

    Comments: 17 pages, 8 figures, 9 tables. ACL 2024 main camera-ready version

  34. Characterization of NbTiN films with thicknesses below 20 nm for low power kinetic inductance amplifiers

    Authors: A. Giachero, M. R. Vissers, J. D. Wheeler, M. Malnou, J. E. Austermann, J. Hubmayr, A. Nucciotti, J. N. Ullom, J. Gao

    Abstract: A quantum-limited amplification chain is a fundamental advantage for any application that may benefit from the detection of very faint signals. Reading out arrays of superconducting detectors (TESs or MKIDs), resonant cavities, or qubits, calls for large bandwidth amplifiers in addition to having the lowest possible noise. At millikelvin temperatures, Kinetic Inductance Traveling-Wave Parametric A…

    Submitted 18 February, 2024; originally announced February 2024.

  35. arXiv:2402.11657  [pdf, other]

    q-bio.PE q-bio.GN q-bio.QM

    On the importance of assessing topological convergence in Bayesian phylogenetic inference

    Authors: Marius Brusselmans, Luiz Max Carvalho, Samuel L. Hong, Jiansi Gao, Frederick A. Matsen IV, Andrew Rambaut, Philippe Lemey, Marc A. Suchard, Gytis Dudas, Guy Baele

    Abstract: Modern phylogenetics research is often performed within a Bayesian framework, using sampling algorithms such as Markov chain Monte Carlo (MCMC) to approximate the posterior distribution. These algorithms require careful evaluation of the quality of the generated samples. Within the field of phylogenetics, one frequently adopted diagnostic approach is to evaluate the effective sample size (ESS) and…

    Submitted 18 February, 2024; originally announced February 2024.

  36. arXiv:2402.11641  [pdf, other]

    cs.LG

    Towards Versatile Graph Learning Approach: from the Perspective of Large Language Models

    Authors: Lanning Wei, Jun Gao, Huan Zhao, Quanming Yao

    Abstract: Graph-structured data are commonly used and have wide application scenarios in the real world. For these diverse applications, the vast variety of learning tasks, graph domains, and complex graph learning procedures present challenges for human experts when designing versatile graph learning approaches. Facing these challenges, large language models (LLMs) offer a potential solution due to the…

    Submitted 23 February, 2024; v1 submitted 18 February, 2024; originally announced February 2024.

  37. arXiv:2402.11430  [pdf, other]

    cs.CL

    EventRL: Enhancing Event Extraction with Outcome Supervision for Large Language Models

    Authors: Jun Gao, Huan Zhao, Wei Wang, Changlong Yu, Ruifeng Xu

    Abstract: In this study, we present EventRL, a reinforcement learning approach developed to enhance event extraction for large language models (LLMs). EventRL utilizes outcome supervision with specific reward functions to tackle prevalent challenges in LLMs, such as instruction following and hallucination, manifested as the mismatch of event structure and the generation of undefined event types. We evaluate…

    Submitted 17 February, 2024; originally announced February 2024.

  38. arXiv:2402.11314  [pdf, other]

    cs.MA cs.AI

    Multi-Generative Agent Collective Decision-Making in Urban Planning: A Case Study for Kendall Square Renovation

    Authors: Jin Gao, Hanyong Xu, Luc Dao

    Abstract: In this study, we develop a multiple-generative agent system to simulate community decision-making for the redevelopment of Kendall Square's Volpe building. Drawing on interviews with local stakeholders, our simulations incorporated varying degrees of communication, demographic data, and life values in the agent prompts. The results revealed that communication among agents improved collective reas…

    Submitted 17 February, 2024; originally announced February 2024.

  39. arXiv:2402.11129  [pdf, other]

    cs.CL

    BlendFilter: Advancing Retrieval-Augmented Large Language Models via Query Generation Blending and Knowledge Filtering

    Authors: Haoyu Wang, Ruirui Li, Haoming Jiang, Jinjin Tian, Zhengyang Wang, Chen Luo, Xianfeng Tang, Monica Cheng, Tuo Zhao, Jing Gao

    Abstract: Retrieval-augmented Large Language Models (LLMs) offer substantial benefits in enhancing performance across knowledge-intensive scenarios. However, these methods often face challenges with complex inputs and encounter difficulties due to noisy knowledge retrieval, notably hindering model effectiveness. To address this issue, we introduce BlendFilter, a novel approach that elevates retrieval-augmen…

    Submitted 11 July, 2024; v1 submitted 16 February, 2024; originally announced February 2024.

  40. arXiv:2402.07754  [pdf, other]

    cs.CL cs.AI cs.LG

    Diffusion of Thoughts: Chain-of-Thought Reasoning in Diffusion Language Models

    Authors: Jiacheng Ye, Shansan Gong, Liheng Chen, Lin Zheng, Jiahui Gao, Han Shi, Chuan Wu, Xin Jiang, Zhenguo Li, Wei Bi, Lingpeng Kong

    Abstract: Recently, diffusion models have garnered significant interest in the field of text processing due to their many potential advantages compared to conventional autoregressive models. In this work, we propose Diffusion-of-Thought (DoT), a novel approach that integrates diffusion models with Chain-of-Thought, a well-established technique for improving the reasoning ability of autoregressive language m…

    Submitted 15 July, 2024; v1 submitted 12 February, 2024; originally announced February 2024.

    Comments: Multiple updates (add boolean logic dataset, add DoT based on SEDD model and add detailed mathematical formulation in Appendix)

  41. arXiv:2402.07066  [pdf, other]

    cs.CR cs.LG stat.ME

    Differentially Private Range Queries with Correlated Input Perturbation

    Authors: Prathamesh Dharangutte, Jie Gao, Ruobin Gong, Guanyang Wang

    Abstract: This work proposes a class of locally differentially private mechanisms for linear queries, in particular range queries, that leverages correlated input perturbation to simultaneously achieve unbiasedness, consistency, statistical transparency, and control over utility requirements in terms of accuracy targets expressed either in certain query margins or as implied by the hierarchical database str…

    Submitted 10 February, 2024; originally announced February 2024.

    Comments: 26 pages, 8 figures

  42. arXiv:2402.06665  [pdf, other]

    cs.AI cs.CL cs.LG cs.RO

    The Essential Role of Causality in Foundation World Models for Embodied AI

    Authors: Tarun Gupta, Wenbo Gong, Chao Ma, Nick Pawlowski, Agrin Hilmkil, Meyer Scetbon, Marc Rigter, Ade Famoti, Ashley Juan Llorens, Jianfeng Gao, Stefan Bauer, Danica Kragic, Bernhard Schölkopf, Cheng Zhang

    Abstract: Recent advances in foundation models, especially in large multi-modal models and conversational agents, have ignited interest in the potential of generally capable embodied agents. Such agents will require the ability to perform new tasks in many different real-world environments. However, current foundation models fail to accurately model physical interactions and are therefore insufficient for E…

    Submitted 29 April, 2024; v1 submitted 6 February, 2024; originally announced February 2024.

  43. arXiv:2402.06656  [pdf, other]

    q-fin.ST cs.AI cs.LG

    DiffsFormer: A Diffusion Transformer on Stock Factor Augmentation

    Authors: Yuan Gao, Haokun Chen, Xiang Wang, Zhicai Wang, Xue Wang, Jinyang Gao, Bolin Ding

    Abstract: Machine learning models have demonstrated remarkable efficacy and efficiency in a wide range of stock forecasting tasks. However, the inherent challenges of data scarcity, including low signal-to-noise ratio (SNR) and data homogeneity, pose significant obstacles to accurate forecasting. To address this issue, we propose a novel approach that utilizes artificial intelligence-generated samples (AIGS…

    Submitted 4 February, 2024; originally announced February 2024.

  44. arXiv:2402.06196  [pdf, other]

    cs.CL cs.AI

    Large Language Models: A Survey

    Authors: Shervin Minaee, Tomas Mikolov, Narjes Nikzad, Meysam Chenaghlu, Richard Socher, Xavier Amatriain, Jianfeng Gao

    Abstract: Large Language Models (LLMs) have drawn a lot of attention due to their strong performance on a wide range of natural language tasks, since the release of ChatGPT in November 2022. LLMs' ability of general-purpose language understanding and generation is acquired by training billions of model's parameters on massive amounts of text data, as predicted by scaling laws \cite{kaplan2020scaling,hoffman…

    Submitted 20 February, 2024; v1 submitted 9 February, 2024; originally announced February 2024.

    Comments: arXiv admin note: substantial text overlap with arXiv:2401.14423

  45. arXiv:2402.05929  [pdf, other]

    cs.AI cs.LG cs.RO

    An Interactive Agent Foundation Model

    Authors: Zane Durante, Bidipta Sarkar, Ran Gong, Rohan Taori, Yusuke Noda, Paul Tang, Ehsan Adeli, Shrinidhi Kowshika Lakshmikanth, Kevin Schulman, Arnold Milstein, Demetri Terzopoulos, Ade Famoti, Noboru Kuno, Ashley Llorens, Hoi Vo, Katsu Ikeuchi, Li Fei-Fei, Jianfeng Gao, Naoki Wake, Qiuyuan Huang

    Abstract: The development of artificial intelligence systems is transitioning from creating static, task-specific models to dynamic, agent-based systems capable of performing well in a wide range of applications. We propose an Interactive Agent Foundation Model that uses a novel multi-task agent training paradigm for training AI agents across a wide range of domains, datasets, and tasks. Our training paradi…

    Submitted 17 June, 2024; v1 submitted 8 February, 2024; originally announced February 2024.

  46. arXiv:2402.04672  [pdf, other]

    cs.CV

    G-NAS: Generalizable Neural Architecture Search for Single Domain Generalization Object Detection

    Authors: Fan Wu, Jinling Gao, Lanqing Hong, Xinbing Wang, Chenghu Zhou, Nanyang Ye

    Abstract: In this paper, we focus on a realistic yet challenging task, Single Domain Generalization Object Detection (S-DGOD), where only one source domain's data can be used for training object detectors, which must then generalize to multiple distinct target domains. In S-DGOD, both high-capacity fitting and generalization abilities are needed due to the task's complexity. Differentiable Neural Architecture Sear…

    Submitted 7 February, 2024; originally announced February 2024.

    Comments: Accepted by AAAI24

  47. arXiv:2402.03774  [pdf, other]

    cs.LG cs.AI cs.CL

    Learning a Decision Tree Algorithm with Transformers

    Authors: Yufan Zhuang, Liyuan Liu, Chandan Singh, Jingbo Shang, Jianfeng Gao

    Abstract: Decision trees are renowned for their interpretability and their capability to achieve high predictive performance, especially on tabular data. Traditionally, they are constructed through recursive algorithms, where they partition the data at every node in a tree. However, identifying the best partition is challenging, as decision trees optimized for local segments may not bring global generalization. To ad…

    Submitted 6 February, 2024; originally announced February 2024.

  48. arXiv:2402.02419  [pdf]

    physics.flu-dyn nlin.PS

    On the local streamline pattern of planar polynomial velocity field with nonzero linear part

    Authors: Jian Gao, Hongping Ma, Rong Wang, Wennan Zou

    Abstract: The streamline pattern of a planar polynomial velocity field is far from fully understood. In the fluid mechanics community, most studies focus only on the velocity gradient, or the linear part of the velocity field, and few examine high-order terms. This paper is concerned with the local streamline pattern (LSP) of a velocity field around an isotropic point. In virtue of the concept and metho…

    Submitted 4 February, 2024; originally announced February 2024.

    Comments: 20 pages, 4 figures, 4 tables

  49. arXiv:2402.02110  [pdf, other]

    cs.LG cs.AI cs.CV cs.NE

    Composite Active Learning: Towards Multi-Domain Active Learning with Theoretical Guarantees

    Authors: Guang-Yuan Hao, Hengguan Huang, Haotian Wang, Jie Gao, Hao Wang

    Abstract: Active learning (AL) aims to improve model performance within a fixed labeling budget by choosing the most informative data points to label. Existing AL focuses on the single-domain setting, where all data come from the same domain (e.g., the same dataset). However, many real-world tasks often involve multiple domains. For example, in visual recognition, it is often desirable to train an image cla…

    Submitted 3 February, 2024; originally announced February 2024.

    Journal ref: AAAI 2024

  50. arXiv:2402.01761  [pdf, other]

    cs.CL cs.AI cs.LG

    Rethinking Interpretability in the Era of Large Language Models

    Authors: Chandan Singh, Jeevana Priya Inala, Michel Galley, Rich Caruana, Jianfeng Gao

    Abstract: Interpretable machine learning has exploded as an area of interest over the last decade, sparked by the rise of increasingly large datasets and deep neural networks. Simultaneously, large language models (LLMs) have demonstrated remarkable capabilities across a wide array of tasks, offering a chance to rethink opportunities in interpretable machine learning. Notably, the capability to explain in n…

    Submitted 30 January, 2024; originally announced February 2024.

    Comments: 7 pages