Showing 1–50 of 169 results for author: Fu, T

  1. arXiv:2407.08150  [pdf, other]

    cs.CV

    Hypergraph Multi-modal Large Language Model: Exploiting EEG and Eye-tracking Modalities to Evaluate Heterogeneous Responses for Video Understanding

    Authors: Minghui Wu, Chenxu Zhao, Anyang Su, Donglin Di, Tianyu Fu, Da An, Min He, Ya Gao, Meng Ma, Kun Yan, Ping Wang

    Abstract: Understanding of video creativity and content often varies among individuals, with differences in focal points and cognitive levels across different ages, experiences, and genders. There is currently a lack of research in this area, and most existing benchmarks suffer from several drawbacks: 1) a limited number of modalities and answers with restrictive length; 2) the content and scenarios within…

    Submitted 16 July, 2024; v1 submitted 10 July, 2024; originally announced July 2024.

    Comments: Accepted by ACM MULTIMEDIA 2024

  2. arXiv:2407.00631  [pdf, other]

    cs.LG cs.AI

    TrialBench: Multi-Modal Artificial Intelligence-Ready Clinical Trial Datasets

    Authors: Jintai Chen, Yaojun Hu, Yue Wang, Yingzhou Lu, Xu Cao, Miao Lin, Hongxia Xu, Jian Wu, Cao Xiao, Jimeng Sun, Lucas Glass, Kexin Huang, Marinka Zitnik, Tianfan Fu

    Abstract: Clinical trials are pivotal for developing new medical treatments, yet they typically pose some risks such as patient mortality, adverse events, and enrollment failure that waste immense efforts spanning over a decade. Applying artificial intelligence (AI) to forecast or simulate key events in clinical trials holds great potential for providing insights to guide trial designs. However, complex dat…

    Submitted 30 June, 2024; originally announced July 2024.

  3. Bijective Enumeration and Sign-Imbalance for Permutation Depth and Excedances

    Authors: Sen-Peng Eu, Tung-Shan Fu, Yuan-Hsun Lo

    Abstract: We present a simplified variant of Biane's bijection between permutations and 3-colored Motzkin paths with weight that keeps track of the inversion number, excedance number and a statistic so-called depth of a permutation. This generalizes a result by Guay-Paquet and Petersen about a continued fraction of the generating function for depth on the permutations of n elements. In terms of weighted Mot…

    Submitted 24 June, 2024; originally announced June 2024.

    Comments: In Proceedings GASCom 2024, arXiv:2406.14588

    Journal ref: EPTCS 403, 2024, pp. 87-91

  4. arXiv:2406.16377  [pdf, other]

    cs.CL cs.AI

    On the Transformations across Reward Model, Parameter Update, and In-Context Prompt

    Authors: Deng Cai, Huayang Li, Tingchen Fu, Siheng Li, Weiwen Xu, Shuaiyi Li, Bowen Cao, Zhisong Zhang, Xinting Huang, Leyang Cui, Yan Wang, Lemao Liu, Taro Watanabe, Shuming Shi

    Abstract: Despite the general capabilities of pre-trained large language models (LLMs), they still need further adaptation to better serve practical applications. In this paper, we demonstrate the interchangeability of three popular and distinct adaptation tools: parameter updating, reward modeling, and in-context prompting. This interchangeability establishes a triangular framework with six transformation…

    Submitted 24 June, 2024; originally announced June 2024.

  5. arXiv:2406.16087  [pdf, other]

    cs.RO cs.AI cs.CV cs.LG

    Imperative Learning: A Self-supervised Neural-Symbolic Learning Framework for Robot Autonomy

    Authors: Chen Wang, Kaiyi Ji, Junyi Geng, Zhongqiang Ren, Taimeng Fu, Fan Yang, Yifan Guo, Haonan He, Xiangyu Chen, Zitong Zhan, Qiwei Du, Shaoshu Su, Bowen Li, Yuheng Qiu, Yi Du, Qihang Li, Yifan Yang, Xiao Lin, Zhipeng Zhao

    Abstract: Data-driven methods such as reinforcement and imitation learning have achieved remarkable success in robot autonomy. However, their data-centric nature still hinders them from generalizing well to ever-changing environments. Moreover, collecting large datasets for robotic tasks is often impractical and expensive. To overcome these challenges, we introduce a new self-supervised neural-symbolic (NeS…

    Submitted 6 July, 2024; v1 submitted 23 June, 2024; originally announced June 2024.

  6. arXiv:2406.14909  [pdf, other]

    cs.LG cs.AI cs.CL

    MoA: Mixture of Sparse Attention for Automatic Large Language Model Compression

    Authors: Tianyu Fu, Haofeng Huang, Xuefei Ning, Genghan Zhang, Boju Chen, Tianqi Wu, Hongyi Wang, Zixiao Huang, Shiyao Li, Shengen Yan, Guohao Dai, Huazhong Yang, Yu Wang

    Abstract: Sparse attention can effectively mitigate the significant memory and throughput demands of Large Language Models (LLMs) in long contexts. Existing methods typically employ a uniform sparse attention mask, applying the same sparse pattern across different attention heads and input lengths. However, this uniform approach fails to capture the diverse attention patterns inherent in LLMs, ignoring thei…

    Submitted 21 June, 2024; originally announced June 2024.

    Comments: 10 pages

    ACM Class: I.2.7

  7. arXiv:2406.14629  [pdf, other]

    cs.CL cs.AI

    Can LLMs Learn by Teaching? A Preliminary Study

    Authors: Xuefei Ning, Zifu Wang, Shiyao Li, Zinan Lin, Peiran Yao, Tianyu Fu, Matthew B. Blaschko, Guohao Dai, Huazhong Yang, Yu Wang

    Abstract: Teaching to improve student models (e.g., knowledge distillation) is an extensively studied methodology in LLMs. However, for humans, teaching not only improves students but also improves teachers. We ask: Can LLMs also learn by teaching (LbT)? If yes, we can potentially unlock the possibility of continuously advancing the models without solely relying on human-produced data or stronger models. In…

    Submitted 20 June, 2024; originally announced June 2024.

    Comments: Under review

  8. arXiv:2406.10263  [pdf, other]

    cs.SE

    A Lightweight Framework for Adaptive Retrieval In Code Completion With Critique Model

    Authors: Wenrui Zhang, Tiehang Fu, Ting Yuan, Ge Zhang, Dong Chen, Jie Wang

    Abstract: Recent advancements in Retrieval-Augmented Generation have significantly enhanced code completion at the repository level. Various RAG-based code completion systems are proposed based on different design choices. For instance, gaining more effectiveness at the cost of repeating the retrieval-generation process multiple times. However, the indiscriminate use of retrieval in current methods reveals…

    Submitted 10 June, 2024; originally announced June 2024.

  9. arXiv:2406.08656  [pdf, other]

    cs.CV cs.AI cs.CL

    TC-Bench: Benchmarking Temporal Compositionality in Text-to-Video and Image-to-Video Generation

    Authors: Weixi Feng, Jiachen Li, Michael Saxon, Tsu-jui Fu, Wenhu Chen, William Yang Wang

    Abstract: Video generation has many unique challenges beyond those of image generation. The temporal dimension introduces extensive possible variations across frames, over which consistency and continuity may be violated. In this study, we move beyond evaluating simple actions and argue that generated videos should incorporate the emergence of new concepts and their relation transitions like in real-world v…

    Submitted 12 June, 2024; originally announced June 2024.

  10. arXiv:2406.06045  [pdf, other]

    cs.CV cs.AI

    Synthesizing Efficient Data with Diffusion Models for Person Re-Identification Pre-Training

    Authors: Ke Niu, Haiyang Yu, Xuelin Qian, Teng Fu, Bin Li, Xiangyang Xue

    Abstract: Existing person re-identification (Re-ID) methods principally deploy the ImageNet-1K dataset for model initialization, which inevitably results in sub-optimal situations due to the large domain gap. One of the key challenges is that building large-scale person Re-ID datasets is time-consuming. Some previous efforts address this problem by collecting person images from the internet e.g., LUPerson,…

    Submitted 10 June, 2024; originally announced June 2024.

  11. arXiv:2406.03474  [pdf, other]

    cs.CV

    AD-H: Autonomous Driving with Hierarchical Agents

    Authors: Zaibin Zhang, Shiyu Tang, Yuanhang Zhang, Talas Fu, Yifan Wang, Yang Liu, Dong Wang, Jing Shao, Lijun Wang, Huchuan Lu

    Abstract: Due to the impressive capabilities of multimodal large language models (MLLMs), recent works have focused on employing MLLM-based agents for autonomous driving in large-scale and dynamic environments. However, prevalent approaches often directly translate high-level instructions into low-level vehicle control signals, which deviates from the inherent language generation paradigm of MLLMs and fails…

    Submitted 5 June, 2024; originally announced June 2024.

  12. arXiv:2406.03403  [pdf, other]

    cs.LG cs.AI q-bio.QM

    Structure-based Drug Design Benchmark: Do 3D Methods Really Dominate?

    Authors: Kangyu Zheng, Yingzhou Lu, Zaixi Zhang, Zhongwei Wan, Yao Ma, Marinka Zitnik, Tianfan Fu

    Abstract: Currently, the field of structure-based drug design is dominated by three main types of algorithms: search-based algorithms, deep generative models, and reinforcement learning. While existing works have typically focused on comparing models within a single algorithmic category, cross-algorithm comparisons remain scarce. In this paper, to fill the gap, we establish a benchmark to evaluate the perfo…

    Submitted 4 June, 2024; originally announced June 2024.

  13. arXiv:2406.02059  [pdf, other]

    cs.LG

    Graph Adversarial Diffusion Convolution

    Authors: Songtao Liu, Jinghui Chen, Tianfan Fu, Lu Lin, Marinka Zitnik, Dinghao Wu

    Abstract: This paper introduces a min-max optimization formulation for the Graph Signal Denoising (GSD) problem. In this formulation, we first maximize the second term of GSD by introducing perturbations to the graph structure based on Laplacian distance and then minimize the overall loss of the GSD. By solving the min-max optimization problem, we derive a new variant of the Graph Diffusion Convolution (GDC…

    Submitted 4 June, 2024; originally announced June 2024.

    Comments: Accepted by ICML 2024

  14. arXiv:2405.18750  [pdf, other]

    cs.CV

    T2V-Turbo: Breaking the Quality Bottleneck of Video Consistency Model with Mixed Reward Feedback

    Authors: Jiachen Li, Weixi Feng, Tsu-Jui Fu, Xinyi Wang, Sugato Basu, Wenhu Chen, William Yang Wang

    Abstract: Diffusion-based text-to-video (T2V) models have achieved significant success but continue to be hampered by the slow sampling speed of their iterative sampling processes. To address the challenge, consistency models have been proposed to facilitate fast inference, albeit at the cost of sample quality. In this work, we aim to break the quality bottleneck of a video consistency model (VCM) to achiev…

    Submitted 29 May, 2024; originally announced May 2024.

    Comments: Project page: https://t2v-turbo.github.io/

  15. arXiv:2405.14213  [pdf, other]

    cs.CV cs.CL

    From Text to Pixel: Advancing Long-Context Understanding in MLLMs

    Authors: Yujie Lu, Xiujun Li, Tsu-Jui Fu, Miguel Eckstein, William Yang Wang

    Abstract: The rapid progress in Multimodal Large Language Models (MLLMs) has significantly advanced their ability to process and understand complex visual and textual information. However, the integration of multiple images and extensive textual contexts remains a challenge due to the inherent limitation of the models' capacity to handle long input sequences efficiently. In this paper, we introduce SEEKER,…

    Submitted 23 May, 2024; originally announced May 2024.

  16. arXiv:2405.13432  [pdf, other]

    cs.CL cs.AI

    Disperse-Then-Merge: Pushing the Limits of Instruction Tuning via Alignment Tax Reduction

    Authors: Tingchen Fu, Deng Cai, Lemao Liu, Shuming Shi, Rui Yan

    Abstract: Supervised fine-tuning (SFT) on instruction-following corpus is a crucial approach toward the alignment of large language models (LLMs). However, the performance of LLMs on standard knowledge and reasoning benchmarks tends to suffer from deterioration at the latter stage of the SFT process, echoing the phenomenon of alignment tax. Through our pilot study, we put forward a hypothesis that the data biases a…

    Submitted 22 May, 2024; originally announced May 2024.

    Comments: Accepted to the findings of ACL2024

  17. arXiv:2405.08979  [pdf, other]

    cs.LG q-bio.MN q-bio.QM

    drGAT: Attention-Guided Gene Assessment of Drug Response Utilizing a Drug-Cell-Gene Heterogeneous Network

    Authors: Yoshitaka Inoue, Hunmin Lee, Tianfan Fu, Augustin Luna

    Abstract: Drug development is a lengthy process with a high failure rate. Increasingly, machine learning is utilized to facilitate the drug development processes. These models aim to enhance our understanding of drug characteristics, including their activity in biological contexts. However, a major challenge in drug response (DR) prediction is model interpretability as it aids in the validation of findings.…

    Submitted 14 May, 2024; originally announced May 2024.

  18. arXiv:2405.06662  [pdf, ps, other]

    q-bio.BM cs.CL cs.LG

    Language Interaction Network for Clinical Trial Approval Estimation

    Authors: Chufan Gao, Tianfan Fu, Jimeng Sun

    Abstract: Clinical trial outcome prediction seeks to estimate the likelihood that a clinical trial will successfully reach its intended endpoint. This process predominantly involves the development of machine learning models that utilize a variety of data sources such as descriptions of the clinical trials, characteristics of the drug molecules, and specific disease conditions being targeted. Accurate predi…

    Submitted 26 April, 2024; originally announced May 2024.

  19. arXiv:2404.14777  [pdf, other]

    cs.CL cs.LG

    CT-Agent: Clinical Trial Multi-Agent with Large Language Model-based Reasoning

    Authors: Ling Yue, Tianfan Fu

    Abstract: Large Language Models (LLMs) and multi-agent systems have shown impressive capabilities in natural language tasks but face challenges in clinical trial applications, primarily due to limited access to external knowledge. Recognizing the potential of advanced clinical trial tools that aggregate and predict based on the latest medical data, we propose an integrated solution to enhance their accessib…

    Submitted 23 April, 2024; originally announced April 2024.

  20. arXiv:2404.14294  [pdf, other]

    cs.CL cs.AI

    A Survey on Efficient Inference for Large Language Models

    Authors: Zixuan Zhou, Xuefei Ning, Ke Hong, Tianyu Fu, Jiaming Xu, Shiyao Li, Yuming Lou, Luning Wang, Zhihang Yuan, Xiuhong Li, Shengen Yan, Guohao Dai, Xiao-Ping Zhang, Yuhan Dong, Yu Wang

    Abstract: Large Language Models (LLMs) have attracted extensive attention due to their remarkable performance across various tasks. However, the substantial computational and memory requirements of LLM inference pose challenges for deployment in resource-constrained scenarios. Efforts within the field have been directed towards developing techniques aimed at enhancing the efficiency of LLM inference. This p…

    Submitted 8 June, 2024; v1 submitted 22 April, 2024; originally announced April 2024.

  21. arXiv:2404.13235  [pdf, other]

    cs.LG

    TrialDura: Hierarchical Attention Transformer for Interpretable Clinical Trial Duration Prediction

    Authors: Ling Yue, Jonathan Li, Md Zabirul Islam, Bolun Xia, Tianfan Fu, Jintai Chen

    Abstract: The clinical trial process, also known as drug development, is an indispensable step toward the development of new treatments. The major objective of interventional clinical trials is to assess the safety and effectiveness of drug-based treatment in treating certain diseases in the human body. However, clinical trials are lengthy, labor-intensive, and costly. The duration of a clinical trial is a…

    Submitted 19 April, 2024; originally announced April 2024.

  22. arXiv:2404.07973  [pdf, other]

    cs.CV

    Ferret-v2: An Improved Baseline for Referring and Grounding with Large Language Models

    Authors: Haotian Zhang, Haoxuan You, Philipp Dufter, Bowen Zhang, Chen Chen, Hong-You Chen, Tsu-Jui Fu, William Yang Wang, Shih-Fu Chang, Zhe Gan, Yinfei Yang

    Abstract: While Ferret seamlessly integrates regional understanding into the Large Language Model (LLM) to facilitate its referring and grounding capability, it poses certain limitations: constrained by the pre-trained fixed visual encoder and failed to perform well on broader tasks. In this work, we unveil Ferret-v2, a significant upgrade to Ferret, with three key designs. (1) Any resolution grounding and…

    Submitted 11 April, 2024; originally announced April 2024.

    Comments: Preprint. 14 pages, 4 figures

  23. arXiv:2404.02003  [pdf, other]

    cs.LG

    AUTODIFF: Autoregressive Diffusion Modeling for Structure-based Drug Design

    Authors: Xinze Li, Penglei Wang, Tianfan Fu, Wenhao Gao, Chengtao Li, Leilei Shi, Junhong Liu

    Abstract: Structure-based drug design (SBDD), which aims to generate molecules that can bind tightly to the target protein, is an essential problem in drug discovery, and previous approaches have achieved initial success. However, most existing methods still suffer from invalid local structure or unrealistic conformation issues, which are mainly due to the poor learning of bond angles or torsional angles. To…

    Submitted 3 April, 2024; v1 submitted 2 April, 2024; originally announced April 2024.

  24. arXiv:2404.01596  [pdf, other]

    cs.RO cs.AI

    PhysORD: A Neuro-Symbolic Approach for Physics-infused Motion Prediction in Off-road Driving

    Authors: Zhipeng Zhao, Bowen Li, Yi Du, Taimeng Fu, Chen Wang

    Abstract: Motion prediction is critical for autonomous off-road driving; however, it presents significantly more challenges than on-road driving because of the complex interaction between the vehicle and the terrain. Traditional physics-based approaches encounter difficulties in accurately modeling dynamic systems and external disturbance. In contrast, data-driven neural networks require extensive datasets…

    Submitted 1 April, 2024; originally announced April 2024.

  25. arXiv:2404.01273  [pdf, other]

    cs.LG cs.CL stat.ME

    TWIN-GPT: Digital Twins for Clinical Trials via Large Language Model

    Authors: Yue Wang, Tianfan Fu, Yinlong Xu, Zihan Ma, Hongxia Xu, Yingzhou Lu, Bang Du, Honghao Gao, Jian Wu

    Abstract: Clinical trials are indispensable for medical research and the development of new treatments. However, clinical trials often involve thousands of participants and can span several years to complete, with a high probability of failure during the process. Recently, there has been a burgeoning interest in virtual clinical trials, which simulate real-world scenarios and hold the potential to significa…

    Submitted 28 June, 2024; v1 submitted 1 April, 2024; originally announced April 2024.

  26. arXiv:2403.16420  [pdf]

    physics.optics

    Real-Time Recognition of Vortex Beams Modes Through Random Diffusive at the Speed of Light

    Authors: Tong Fu, Gang Luo, Jia Cheng Li, Yuan Chao Geng, Xiao Dong Yuan

    Abstract: Optical vortex beams with orbital angular momentum (OAM) have great potential to increase the capacity of optical communication and information processing in classical and quantum regimes. Nevertheless, an important challenge that influences optical data transmission in free space is the existence of diffusers along the optical path, which causes inevitable information loss during the wave propagat…

    Submitted 25 March, 2024; originally announced March 2024.

  27. arXiv:2403.06877  [pdf, other]

    cs.RO cs.CV

    SiLVR: Scalable Lidar-Visual Reconstruction with Neural Radiance Fields for Robotic Inspection

    Authors: Yifu Tao, Yash Bhalgat, Lanke Frank Tarimo Fu, Matias Mattamala, Nived Chebrolu, Maurice Fallon

    Abstract: We present a neural-field-based large-scale reconstruction system that fuses lidar and vision data to generate high-quality reconstructions that are geometrically accurate and capture photo-realistic textures. This system adapts the state-of-the-art neural radiance field (NeRF) representation to also incorporate lidar data which adds strong geometric constraints on the depth and surface normals. W…

    Submitted 11 March, 2024; originally announced March 2024.

    Comments: Accepted at ICRA 2024; Website: https://ori-drs.github.io/projects/silvr/

  28. arXiv:2402.14367  [pdf, other]

    cs.LG cs.SI

    Representation Learning for Frequent Subgraph Mining

    Authors: Rex Ying, Tianyu Fu, Andrew Wang, Jiaxuan You, Yu Wang, Jure Leskovec

    Abstract: Identifying frequent subgraphs, also called network motifs, is crucial in analyzing and predicting properties of real-world networks. However, finding large commonly-occurring motifs remains a challenging problem not only due to its NP-hard subroutine of subgraph counting, but also the exponential growth of the number of possible subgraph patterns. Here we present Subgraph Pattern Miner (SPMiner)…

    Submitted 22 February, 2024; originally announced February 2024.

    Comments: Oral Presentation in The Graph Representation Learning and Beyond (GRL+) Workshop from The 37th International Conference on Machine Learning, 2020

  29. arXiv:2402.13577  [pdf, other]

    cs.CL

    BBA: Bi-Modal Behavioral Alignment for Reasoning with Large Vision-Language Models

    Authors: Xueliang Zhao, Xinting Huang, Tingchen Fu, Qintong Li, Shansan Gong, Lemao Liu, Wei Bi, Lingpeng Kong

    Abstract: Multimodal reasoning stands as a pivotal capability for large vision-language models (LVLMs). The integration with Domain-Specific Languages (DSL), offering precise visual representations, equips these models with the opportunity to execute more accurate reasoning in complex and professional domains. However, the vanilla Chain-of-Thought (CoT) prompting method faces challenges in effectively lever…

    Submitted 21 February, 2024; originally announced February 2024.

    Comments: Preprint

  30. arXiv:2402.06512  [pdf, other]

    cs.LG cs.CL

    Multimodal Clinical Trial Outcome Prediction with Large Language Models

    Authors: Wenhao Zheng, Dongsheng Peng, Hongxia Xu, Yun Li, Hongtu Zhu, Tianfan Fu, Huaxiu Yao

    Abstract: The clinical trial is a pivotal and costly process, often spanning multiple years and requiring substantial financial resources. Therefore, the development of clinical trial outcome prediction models aims to exclude drugs likely to fail and holds the potential for significant cost savings. Recent data-driven attempts leverage deep learning methods to integrate multimodal data for predicting clinic…

    Submitted 8 May, 2024; v1 submitted 9 February, 2024; originally announced February 2024.

  31. arXiv:2402.03041  [pdf, other]

    cs.NI

    Demystifying Datapath Accelerator Enhanced Off-path SmartNIC

    Authors: Xuzheng Chen, Jie Zhang, Ting Fu, Yifan Shen, Shu Ma, Kun Qian, Lingjun Zhu, Chao Shi, Yin Zhang, Ming Liu, Zeke Wang

    Abstract: Network speeds grow quickly in the modern cloud, so SmartNICs are introduced to offload network processing tasks, even application logic. However, typical multicore SmartNICs such as BlueField-2 are only capable of processing control-plane tasks with their embedded CPU that has limited memory bandwidth and computing power. On the other hand, hot cloud applications evolve, such that a limited numbe…

    Submitted 23 February, 2024; v1 submitted 5 February, 2024; originally announced February 2024.

    MSC Class: 68M10

    ACM Class: C.2.1

  32. arXiv:2401.03868  [pdf, other]

    cs.AR cs.AI

    FlightLLM: Efficient Large Language Model Inference with a Complete Mapping Flow on FPGAs

    Authors: Shulin Zeng, Jun Liu, Guohao Dai, Xinhao Yang, Tianyu Fu, Hongyi Wang, Wenheng Ma, Hanbo Sun, Shiyao Li, Zixiao Huang, Yadong Dai, Jintao Li, Zehao Wang, Ruoyu Zhang, Kairui Wen, Xuefei Ning, Yu Wang

    Abstract: Transformer-based Large Language Models (LLMs) have made a significant impact on various domains. However, LLMs' efficiency suffers from both heavy computation and memory overheads. Compression techniques like sparsification and quantization are commonly used to mitigate the gap between LLM's computation/memory overheads and hardware capacity. However, existing GPU and transformer-based accelerato…

    Submitted 9 January, 2024; v1 submitted 8 January, 2024; originally announced January 2024.

    Comments: Accepted to FPGA'24

  33. arXiv:2401.03482  [pdf, other]

    cs.LG stat.ML

    Uncertainty Quantification on Clinical Trial Outcome Prediction

    Authors: Tianyi Chen, Yingzhou Lu, Nan Hao, Capucine Van Rechem, Jintai Chen, Tianfan Fu

    Abstract: The importance of uncertainty quantification is increasingly recognized in the diverse field of machine learning. Accurately assessing model prediction uncertainty can help provide deeper understanding and confidence for researchers and practitioners. This is especially critical in medical diagnosis and drug discovery areas, where reliable predictions directly impact research quality and patient h…

    Submitted 18 June, 2024; v1 submitted 7 January, 2024; originally announced January 2024.

  34. arXiv:2312.14249  [pdf, other]

    q-bio.GN cs.LG

    GenoCraft: A Comprehensive, User-Friendly Web-Based Platform for High-Throughput Omics Data Analysis and Visualization

    Authors: Yingzhou Lu, Minjie Shen, Yue Zhao, Chenhao Li, Fan Meng, Xiao Wang, David Herrington, Yue Wang, Tim Fu, Capucine Van Rechem

    Abstract: The surge in high-throughput omics data has reshaped the landscape of biological research, underlining the need for powerful, user-friendly data analysis and interpretation tools. This paper presents GenoCraft, a web-based comprehensive software solution designed to handle the entire pipeline of omics data processing. GenoCraft offers a unified platform featuring advanced bioinformatics tools, cov…

    Submitted 21 December, 2023; originally announced December 2023.

  35. arXiv:2312.13289  [pdf, other]

    cond-mat.mtrl-sci cs.LG

    Stoichiometry Representation Learning with Polymorphic Crystal Structures

    Authors: Namkyeong Lee, Heewoong Noh, Gyoung S. Na, Tianfan Fu, Jimeng Sun, Chanyoung Park

    Abstract: Despite the recent success of machine learning (ML) in materials science, its success heavily relies on the structural description of crystal, which is itself computationally demanding and occasionally unattainable. Stoichiometry descriptors can be an alternative approach, which reveals the ratio between elements involved to form a certain compound without any structural information. However, it i…

    Submitted 17 November, 2023; originally announced December 2023.

    Comments: NeurIPS 2023 AI4Science Workshop

  36. arXiv:2312.11771  [pdf, other]

    physics.optics

    Near-field Spin Chern Number Quantized by Real-space Topology of Optical Structures

    Authors: Tong Fu, Ruo-Yang Zhang, Shiqi Jia, C. T. Chan, Shubo Wang

    Abstract: The Chern number has been widely used to describe the topological properties of periodic structures in the momentum space. Here, we introduce a real-space spin Chern number for the optical near fields of finite-sized structures. This new spin Chern number is intrinsically quantized and equal to the structure's Euler characteristic. The relationship is robust against continuous deformation of the s…

    Submitted 1 May, 2024; v1 submitted 18 December, 2023; originally announced December 2023.

    Comments: 9 pages, 6 figures

  37. arXiv:2312.07351  [pdf, ps, other]

    math.CO

    On $q$-Counting of Noncrossing Chains and Parking Functions

    Authors: Yen-Jen Cheng, Sen-Peng Eu, Tung-Shan Fu, Jyun-Cheng Yao

    Abstract: For a finite Coxeter group $W$, Josuat-Vergès derived a $q$-polynomial counting the maximal chains in the lattice of noncrossing partitions of $W$ by weighting some of the covering relations, which we call bad edges, in these chains with a parameter $q$. We study the connection of these weighted chains with parking functions of type $A$ ($B$, respectively) from the perspective of the $q$-polynomia…

    Submitted 12 December, 2023; originally announced December 2023.

    Comments: 32 pages, to be published in SIDMA

    MSC Class: 05A19; 05E16; 20F55

  38. arXiv:2310.05365  [pdf, other]

    cs.LG cs.AI

    Molecular De Novo Design through Transformer-based Reinforcement Learning

    Authors: Pengcheng Xu, Tao Feng, Tianfan Fu, Siddhartha Laghuvarapu, Jimeng Sun

    Abstract: In this work, we introduce a method to fine-tune a Transformer-based generative model for molecular de novo design. Leveraging the superior sequence learning capacity of Transformers over Recurrent Neural Networks (RNNs), our model can generate molecular structures with desired properties effectively. In contrast to the traditional RNN-based models, our proposed method exhibits superior performanc…

    Submitted 8 March, 2024; v1 submitted 8 October, 2023; originally announced October 2023.

  39. arXiv:2309.17102  [pdf, other]

    cs.CV

    Guiding Instruction-based Image Editing via Multimodal Large Language Models

    Authors: Tsu-Jui Fu, Wenze Hu, Xianzhi Du, William Yang Wang, Yinfei Yang, Zhe Gan

    Abstract: Instruction-based image editing improves the controllability and flexibility of image manipulation via natural commands without elaborate descriptions or regional masks. However, human instructions are sometimes too brief for current methods to capture and follow. Multimodal large language models (MLLMs) show promising capabilities in cross-modal understanding and visual-aware response generation…

    Submitted 5 February, 2024; v1 submitted 29 September, 2023; originally announced September 2023.

    Comments: ICLR'24 (Spotlight); Project at https://mllm-ie.github.io; Code at https://github.com/tsujuifu/pytorch_mgie

  40. arXiv:2309.13035  [pdf, other]

    cs.RO

    PyPose v0.6: The Imperative Programming Interface for Robotics

    Authors: Zitong Zhan, Xiangfu Li, Qihang Li, Haonan He, Abhinav Pandey, Haitao Xiao, Yangmengfei Xu, Xiangyu Chen, Kuan Xu, Kun Cao, Zhipeng Zhao, Zihan Wang, Huan Xu, Zihang Fang, Yutian Chen, Wentao Wang, Xu Fang, Yi Du, Tianhao Wu, Xiao Lin, Yuheng Qiu, Fan Yang, Jingnan Shi, Shaoshu Su, Yiren Lu, et al. (11 additional authors not shown)

    Abstract: PyPose is an open-source library for robot learning. It combines a learning-based approach with physics-based optimization, which enables seamless end-to-end robot learning. It has been used in many tasks due to its meticulously designed application programming interface (API) and efficient implementation. From its initial launch in early 2022, PyPose has experienced significant enhancements, inco…

    Submitted 22 September, 2023; originally announced September 2023.

  41. arXiv:2309.04682  [pdf, other]

    cs.CV

    DeNoising-MOT: Towards Multiple Object Tracking with Severe Occlusions

    Authors: Teng Fu, Xiaocong Wang, Haiyang Yu, Ke Niu, Bin Li, Xiangyang Xue

    Abstract: Multiple object tracking (MOT) tends to become more challenging when severe occlusions occur. In this paper, we analyze the limitations of traditional Convolutional Neural Network-based methods and Transformer-based methods in handling occlusions and propose DNMOT, an end-to-end trainable DeNoising Transformer for MOT. To address the challenge of occlusions, we explicitly simulate the scenarios wh…

    Submitted 9 September, 2023; originally announced September 2023.

    Comments: ACM Multimedia 2023

  42. arXiv:2309.01219  [pdf, other]

    cs.CL cs.AI cs.CY cs.LG

    Siren's Song in the AI Ocean: A Survey on Hallucination in Large Language Models

    Authors: Yue Zhang, Yafu Li, Leyang Cui, Deng Cai, Lemao Liu, Tingchen Fu, Xinting Huang, Enbo Zhao, Yu Zhang, Yulong Chen, Longyue Wang, Anh Tuan Luu, Wei Bi, Freda Shi, Shuming Shi

    Abstract: While large language models (LLMs) have demonstrated remarkable capabilities across a range of downstream tasks, a significant concern revolves around their propensity to exhibit hallucinations: LLMs occasionally generate content that diverges from the user input, contradicts previously generated context, or misaligns with established world knowledge. This phenomenon poses a substantial challenge…

    Submitted 24 September, 2023; v1 submitted 3 September, 2023; originally announced September 2023.

    Comments: work in progress; 32 pages

  43. DeSCo: Towards Generalizable and Scalable Deep Subgraph Counting

    Authors: Tianyu Fu, Chiyue Wei, Yu Wang, Rex Ying

    Abstract: We introduce DeSCo, a scalable neural deep subgraph counting pipeline, designed to accurately predict both the count and occurrence position of queries on target graphs post single training. Firstly, DeSCo uses a novel canonical partition and divides the large target graph into small neighborhood graphs, greatly reducing the count variation while guaranteeing no missing or double-counting. Secondl…

    Submitted 19 December, 2023; v1 submitted 16 August, 2023; originally announced August 2023.

    Comments: 8 pages main text, 2 pages references, 11 pages appendix; open source at https://github.com/fuvty/DeSCo

    ACM Class: I.2.8

    Journal ref: WSDM'24, March 4-8, 2024, Merida, Mexico

  44. arXiv:2307.08423  [pdf, other]

    cs.LG physics.comp-ph

    Artificial Intelligence for Science in Quantum, Atomistic, and Continuum Systems

    Authors: Xuan Zhang, Limei Wang, Jacob Helwig, Youzhi Luo, Cong Fu, Yaochen Xie, Meng Liu, Yuchao Lin, Zhao Xu, Keqiang Yan, Keir Adams, Maurice Weiler, Xiner Li, Tianfan Fu, Yucheng Wang, Haiyang Yu, YuQing Xie, Xiang Fu, Alex Strasser, Shenglong Xu, Yi Liu, Yuanqi Du, Alexandra Saxton, Hongyi Ling, Hannah Lawrence, et al. (38 additional authors not shown)

    Abstract: Advances in artificial intelligence (AI) are fueling a new paradigm of discoveries in natural sciences. Today, AI has started to advance natural sciences by improving, accelerating, and enabling our understanding of natural phenomena at a wide range of spatial and temporal scales, giving rise to a new area of research known as AI for science (AI4Science). Being an emerging research paradigm, AI4Sc…

    Submitted 15 November, 2023; v1 submitted 17 July, 2023; originally announced July 2023.

  45. arXiv:2307.06082  [pdf, other]

    cs.AI cs.CL cs.CV

    VELMA: Verbalization Embodiment of LLM Agents for Vision and Language Navigation in Street View

    Authors: Raphael Schumann, Wanrong Zhu, Weixi Feng, Tsu-Jui Fu, Stefan Riezler, William Yang Wang

    Abstract: Incremental decision making in real-world environments is one of the most challenging tasks in embodied artificial intelligence. One particularly demanding scenario is Vision and Language Navigation (VLN) which requires visual and natural language understanding as well as spatial and temporal reasoning capabilities. The embodied agent needs to ground its understanding of navigation instructions in…

    Submitted 24 January, 2024; v1 submitted 12 July, 2023; originally announced July 2023.

    Comments: Accepted at AAAI 2024

  46. arXiv:2306.10363  [pdf, ps, other]

    physics.optics

    Broadband and large-area optical chirality generated by an achiral metasurface under achiral excitation

    Authors: Shiqi Jia, Tong Fu, Jie Peng, Shubo Wang

    Abstract: Optical chirality plays an essential role in chiral light-matter interactions with broad applications in sensing and spectroscopy. Conventional methods of generating optical chirality usually employ chiral structures or chiral excitations. Here, we propose to use an achiral metasurface consisting of gold disk array excited by a linearly polarized light to generate optical chirality. Using full-wav…

    Submitted 17 June, 2023; originally announced June 2023.

    Comments: 6 pages, 5 figures

  47. arXiv:2306.08975  [pdf]

    physics.optics

    Transverse spin-orbit interaction of light

    Authors: Tong Fu, Jiaxin Lin, Yuhao Xu, Junji Jia, Yonglong Wang, Shunping Zhang, Hongxing Xu

    Abstract: Light carries both longitudinal and transverse spin angular momentum. The spin can couple with its orbital counterpart via the Berry phase, known as the spin-orbit interaction (SOI) of light. The SOI of light discovered previously belongs to the longitudinal one, which relies on the Berry phase in momentum space, such as the optical Magnus effect and the spin Hall effect. Here, we show that transv…

    Submitted 15 June, 2023; originally announced June 2023.

  48. iSLAM: Imperative SLAM

    Authors: Taimeng Fu, Shaoshu Su, Yiren Lu, Chen Wang

    Abstract: Simultaneous Localization and Mapping (SLAM) stands as one of the critical challenges in robot navigation. A SLAM system often consists of a front-end component for motion estimation and a back-end system for eliminating estimation drifts. Recent advancements suggest that data-driven methods are highly effective for front-end tasks, while geometry-based methods continue to be essential in the back…

    Submitted 21 March, 2024; v1 submitted 13 June, 2023; originally announced June 2023.

    Comments: The paper has been accepted by IEEE Robotics and Automation Letters (RA-L)

  49. arXiv:2306.04802  [pdf, other]

    cs.AI cs.CL cs.LG cs.SI

    A Review on Knowledge Graphs for Healthcare: Resources, Applications, and Promises

    Authors: Hejie Cui, Jiaying Lu, Shiyu Wang, Ran Xu, Wenjing Ma, Shaojun Yu, Yue Yu, Xuan Kan, Chen Ling, Tianfan Fu, Liang Zhao, Joyce Ho, Fei Wang, Carl Yang

    Abstract: Healthcare knowledge graphs (HKGs) are valuable tools for organizing biomedical concepts and their relationships with interpretable structures. The recent advent of large language models (LLMs) has paved the way for building more comprehensive and accurate HKGs. This, in turn, can improve the reliability of generated content and enable better evaluation of LLMs. However, the challenges of HKGs suc…

    Submitted 19 February, 2024; v1 submitted 7 June, 2023; originally announced June 2023.

  50. arXiv:2306.04018  [pdf, other]

    cs.AI q-bio.QM

    PyTrial: Machine Learning Software and Benchmark for Clinical Trial Applications

    Authors: Zifeng Wang, Brandon Theodorou, Tianfan Fu, Cao Xiao, Jimeng Sun

    Abstract: Clinical trials are conducted to test the effectiveness and safety of potential drugs in humans for regulatory approval. Machine learning (ML) has recently emerged as a new tool to assist in clinical trials. Despite this progress, there have been few efforts to document and benchmark ML4Trial algorithms available to the ML research community. Additionally, the accessibility to clinical trial-relat…

    Submitted 5 October, 2023; v1 submitted 6 June, 2023; originally announced June 2023.