Showing 1–50 of 276 results for author: Yao, Z

  1. arXiv:2407.02762  [pdf, other]

    cs.LG cs.AI

    SF-GNN: Self Filter for Message Lossless Propagation in Deep Graph Neural Network

    Authors: Yushan Zhu, Wen Zhang, Yajing Xu, Zhen Yao, Mingyang Chen, Huajun Chen

    Abstract: Graph Neural Network (GNN), with the main idea of encoding graph structure information of graphs by propagation and aggregation, has developed rapidly. It achieved excellent performance in representation learning of multiple types of graphs such as homogeneous graphs, heterogeneous graphs, and more complex graphs like knowledge graphs. However, merely stacking GNN layers may not improve the model'…

    Submitted 2 July, 2024; originally announced July 2024.

  2. arXiv:2407.02646  [pdf, other]

    cs.AI cs.CL

    A Practical Review of Mechanistic Interpretability for Transformer-Based Language Models

    Authors: Daking Rai, Yilun Zhou, Shi Feng, Abulhair Saparov, Ziyu Yao

    Abstract: Mechanistic interpretability (MI) is an emerging sub-field of interpretability that seeks to understand a neural network model by reverse-engineering its internal computations. Recently, MI has garnered significant attention for interpreting transformer-based language models (LMs), resulting in many novel insights yet introducing new challenges. However, there has not been work that comprehensivel…

    Submitted 2 July, 2024; originally announced July 2024.

    Comments: 11 pages, 11 figures, Preprint

    ACM Class: I.2.7

  3. arXiv:2407.01953  [pdf, other]

    cs.CE cs.AI cs.LG q-fin.CP

    CatMemo at the FinLLM Challenge Task: Fine-Tuning Large Language Models using Data Fusion in Financial Applications

    Authors: Yupeng Cao, Zhiyuan Yao, Zhi Chen, Zhiyang Deng

    Abstract: The integration of Large Language Models (LLMs) into financial analysis has garnered significant attention in the NLP community. This paper presents our solution to IJCAI-2024 FinLLM challenge, investigating the capabilities of LLMs within three critical areas of financial tasks: financial classification, financial text summarization, and single stock trading. We adopted Llama3-8B and Mistral-7B a…

    Submitted 2 July, 2024; originally announced July 2024.

  4. arXiv:2406.19227  [pdf, other]

    cs.CL

    Aligning Teacher with Student Preferences for Tailored Training Data Generation

    Authors: Yantao Liu, Zhao Zhang, Zijun Yao, Shulin Cao, Lei Hou, Juanzi Li

    Abstract: Large Language Models (LLMs) have shown significant promise as copilots in various tasks. Local deployment of LLMs on edge devices is necessary when handling privacy-sensitive data or latency-sensitive tasks. The computational constraints of such devices make direct deployment of powerful large-scale LLMs impractical, necessitating the Knowledge Distillation from large-scale models to lightweight…

    Submitted 27 June, 2024; originally announced June 2024.

  5. arXiv:2406.19215  [pdf, other]

    cs.CL

    SeaKR: Self-aware Knowledge Retrieval for Adaptive Retrieval Augmented Generation

    Authors: Zijun Yao, Weijian Qi, Liangming Pan, Shulin Cao, Linmei Hu, Weichuan Liu, Lei Hou, Juanzi Li

    Abstract: This paper introduces Self-aware Knowledge Retrieval (SeaKR), a novel adaptive RAG model that extracts self-aware uncertainty of LLMs from their internal states. SeaKR activates retrieval when the LLMs present high self-aware uncertainty for generation. To effectively integrate retrieved knowledge snippets, SeaKR re-ranks them based on LLM's self-aware uncertainty to preserve the snippet that redu…

    Submitted 27 June, 2024; originally announced June 2024.

  6. arXiv:2406.17235  [pdf, other]

    cs.CV cs.AI cs.DC

    Task-Agnostic Federated Learning

    Authors: Zhengtao Yao, Hong Nguyen, Ajitesh Srivastava, Jose Luis Ambite

    Abstract: In the realm of medical imaging, leveraging large-scale datasets from various institutions is crucial for developing precise deep learning models, yet privacy concerns frequently impede data sharing. Federated learning (FL) emerges as a prominent solution for preserving privacy while facilitating collaborative learning. However, its application in real-world scenarios faces several obstacles, such…

    Submitted 24 June, 2024; originally announced June 2024.

  7. arXiv:2406.16972  [pdf, ps, other]

    cs.LG cs.AI

    An Efficient NAS-based Approach for Handling Imbalanced Datasets

    Authors: Zhiwei Yao

    Abstract: Class imbalance is a common issue in real-world data distributions, negatively impacting the training of accurate classifiers. Traditional approaches to mitigate this problem fall into three main categories: class re-balancing, information transfer, and representation learning. This paper introduces a novel approach to enhance performance on long-tailed datasets by optimizing the backbone architec…

    Submitted 22 June, 2024; originally announced June 2024.

    Comments: 7 pages, 3 figures

  8. arXiv:2406.14144  [pdf, other]

    cs.CL cs.AI cs.LG

    Finding Safety Neurons in Large Language Models

    Authors: Jianhui Chen, Xiaozhi Wang, Zijun Yao, Yushi Bai, Lei Hou, Juanzi Li

    Abstract: Large language models (LLMs) excel in various capabilities but also pose safety risks such as generating harmful content and misinformation, even after safety alignment. In this paper, we explore the inner mechanisms of safety alignment from the perspective of mechanistic interpretability, focusing on identifying and analyzing safety neurons within LLMs that are responsible for safety behaviors. W…

    Submitted 20 June, 2024; originally announced June 2024.

  9. arXiv:2406.13399  [pdf, other]

    cs.AI

    VELO: A Vector Database-Assisted Cloud-Edge Collaborative LLM QoS Optimization Framework

    Authors: Zhi Yao, Zhiqing Tang, Jiong Lou, Ping Shen, Weijia Jia

    Abstract: The Large Language Model (LLM) has gained significant popularity and is extensively utilized across various domains. Most LLM deployments occur within cloud data centers, where they encounter substantial response delays and incur high costs, thereby impacting the Quality of Services (QoS) at the network edge. Leveraging vector database caching to store LLM request results at the edge can substanti…

    Submitted 19 June, 2024; originally announced June 2024.

    Comments: to be published in IEEE ICWS 2024

  10. arXiv:2406.12288  [pdf, other]

    cs.AI

    An Investigation of Neuron Activation as a Unified Lens to Explain Chain-of-Thought Eliciting Arithmetic Reasoning of LLMs

    Authors: Daking Rai, Ziyu Yao

    Abstract: Large language models (LLMs) have shown strong arithmetic reasoning capabilities when prompted with Chain-of-Thought (CoT) prompts. However, we have only a limited understanding of how they are processed by LLMs. To demystify it, prior work has primarily focused on ablating different components in the CoT prompt and empirically observing their resulting LLM performance change. Yet, the reason why…

    Submitted 18 June, 2024; originally announced June 2024.

    Comments: 9 pages, 1 figure, to be published in ACL 2024

    ACM Class: I.2.7

  11. arXiv:2406.12000  [pdf, other]

    cs.AI

    Look Further Ahead: Testing the Limits of GPT-4 in Path Planning

    Authors: Mohamed Aghzal, Erion Plaku, Ziyu Yao

    Abstract: Large Language Models (LLMs) have shown impressive capabilities across a wide variety of tasks. However, they still face challenges with long-horizon planning. To study this, we propose path planning tasks as a platform to evaluate LLMs' ability to navigate long trajectories under geometric constraints. Our proposed benchmark systematically tests path-planning skills in complex settings. Using thi…

    Submitted 20 June, 2024; v1 submitted 17 June, 2024; originally announced June 2024.

    Comments: Accepted at the 2024 IEEE 20th International Conference on Automation Science and Engineering

  12. arXiv:2406.09205  [pdf, other]

    cs.CL cs.AI

    ReadCtrl: Personalizing text generation with readability-controlled instruction learning

    Authors: Hieu Tran, Zonghai Yao, Lingxi Li, Hong Yu

    Abstract: Content generation conditioning on users' readability is an important application for personalization. In an era of large language models (LLMs), readability-controlled text generation based on LLMs has become increasingly important. This paper introduces a novel methodology called "Readability-Controlled Instruction Learning (ReadCtrl)," which aims to instruction-tune LLMs to tailor users' reada…

    Submitted 13 June, 2024; originally announced June 2024.

    Comments: 9 pages

  13. arXiv:2406.04197  [pdf, other]

    cs.CL

    DICE: Detecting In-distribution Contamination in LLM's Fine-tuning Phase for Math Reasoning

    Authors: Shangqing Tu, Kejian Zhu, Yushi Bai, Zijun Yao, Lei Hou, Juanzi Li

    Abstract: The advancement of large language models (LLMs) relies on evaluation using public benchmarks, but data contamination can lead to overestimated performance. Previous research focuses on detecting contamination by determining whether the model has seen the exact same data during training. In this work, we argue that even training on data similar to benchmark data inflates performance on in-distribut…

    Submitted 6 June, 2024; originally announced June 2024.

    Comments: 13 pages, 7 figures

  14. arXiv:2406.03324  [pdf, ps, other]

    cs.LG

    UDQL: Bridging The Gap between MSE Loss and The Optimal Value Function in Offline Reinforcement Learning

    Authors: Yu Zhang, Rui Yu, Zhipeng Yao, Wenyuan Zhang, Jun Wang, Liming Zhang

    Abstract: The Mean Square Error (MSE) is commonly utilized to estimate the solution of the optimal value function in the vast majority of offline reinforcement learning (RL) models and has achieved outstanding performance. However, we find that its principle can lead to an overestimation phenomenon for the value function. In this paper, we first theoretically analyze the overestimation phenomenon led by MSE and pr…

    Submitted 5 June, 2024; originally announced June 2024.

  15. arXiv:2406.00403  [pdf, other]

    cs.LG cs.AI

    Dual-perspective Cross Contrastive Learning in Graph Transformers

    Authors: Zelin Yao, Chuang Liu, Xueqi Ma, Mukun Chen, Jia Wu, Xiantao Cai, Bo Du, Wenbin Hu

    Abstract: Graph contrastive learning (GCL) is a popular method for learning graph representations by maximizing the consistency of features across augmented views. Traditional GCL methods utilize single-perspective (i.e., data- or model-perspective) augmentation to generate positive samples, restraining the diversity of positive samples. In addition, these positive samples may be unreliable due to uncontrollabl…

    Submitted 1 June, 2024; originally announced June 2024.

    Comments: 12 pages, 5 figures, submitted to IEEE TKDE

  16. arXiv:2405.17837  [pdf, other]

    cs.HC

    Enabling Generative Design Tools with LLM Agents for Building Novel Devices: A Case Study on Fluidic Computation Interfaces

    Authors: Qiuyu Lu, Jiawei Fang, Zhihao Yao, Yue Yang, Shiqing Lyu, Haipeng Mi, Lining Yao

    Abstract: In the field of Human-Computer Interaction (HCI), the development of interactive devices represents a significant area of focus. The advent of novel hardware and advanced fabrication techniques has underscored the demand for specialized design tools that democratize the prototyping process for such cutting-edge devices. While these tools simplify the process through parametric design and simulatio…

    Submitted 28 May, 2024; originally announced May 2024.

    Comments: 25 pages, 12 figures

  17. arXiv:2405.17459  [pdf]

    cs.LG cs.AI cs.CL cs.CV

    Integrating Medical Imaging and Clinical Reports Using Multimodal Deep Learning for Advanced Disease Analysis

    Authors: Ziyan Yao, Fei Lin, Sheng Chai, Weijie He, Lu Dai, Xinghui Fei

    Abstract: In this paper, an innovative multi-modal deep learning model is proposed to deeply integrate heterogeneous information from medical images and clinical reports. First, for medical images, convolutional neural networks were used to extract high-dimensional features and capture key visual information such as focal details, texture and spatial distribution. Secondly, for clinical report text, a two-w…

    Submitted 22 May, 2024; originally announced May 2024.

  18. arXiv:2405.15165  [pdf, other]

    cs.CL cs.AI cs.SE

    A Solution-based LLM API-using Methodology for Academic Information Seeking

    Authors: Yuanchun Wang, Jifan Yu, Zijun Yao, Jing Zhang, Yuyang Xie, Shangqing Tu, Yiyang Fu, Youhe Feng, Jinkai Zhang, Jingyao Zhang, Bowen Huang, Yuanyao Li, Huihui Yuan, Lei Hou, Juanzi Li, Jie Tang

    Abstract: Applying large language models (LLMs) for academic API usage shows promise in reducing researchers' academic information seeking efforts. However, current LLM API-using methods struggle with complex API coupling commonly encountered in academic queries. To address this, we introduce SoAy, a solution-based LLM API-using methodology for academic information seeking. It uses code with a solution as t…

    Submitted 23 May, 2024; originally announced May 2024.

    Comments: 22 pages, 13 figures

  19. arXiv:2405.10642  [pdf, other]

    cs.LG

    Hi-GMAE: Hierarchical Graph Masked Autoencoders

    Authors: Chuang Liu, Zelin Yao, Yibing Zhan, Xueqi Ma, Dapeng Tao, Jia Wu, Wenbin Hu, Shirui Pan, Bo Du

    Abstract: Graph Masked Autoencoders (GMAEs) have emerged as a notable self-supervised learning approach for graph-structured data. Existing GMAE models primarily focus on reconstructing node-level information, categorizing them as single-scale GMAEs. This methodology, while effective in certain contexts, tends to overlook the complex hierarchical structures inherent in many real-world graphs. For instance,…

    Submitted 17 May, 2024; originally announced May 2024.

    Comments: 10 pages, 6 figures, 3 tables

  20. arXiv:2405.10250  [pdf, other]

    cs.HC

    IntelliExplain: Enhancing Interactive Code Generation through Natural Language Explanations for Non-Professional Programmers

    Authors: Hao Yan, Thomas D. Latoza, Ziyu Yao

    Abstract: Large language models (LLMs) have exhibited a strong promise in automatically generating executable code from natural language descriptions, particularly with interactive features that allow users to engage in the code-generation process by instructing the LLM with iterative feedback. However, existing interaction paradigms often assume that users have expert knowledge to debug source code and are…

    Submitted 16 May, 2024; originally announced May 2024.

  21. arXiv:2405.05663  [pdf, other]

    cs.CV

    RPBG: Towards Robust Neural Point-based Graphics in the Wild

    Authors: Qingtian Zhu, Zizhuang Wei, Zhongtian Zheng, Yifan Zhan, Zhuyu Yao, Jiawang Zhang, Kejian Wu, Yinqiang Zheng

    Abstract: Point-based representations have recently gained popularity in novel view synthesis, for their unique advantages, e.g., intuitive geometric representation, simple manipulation, and faster convergence. However, based on our observation, these point-based neural re-rendering methods are only expected to perform well under ideal conditions and suffer from noisy, patchy points and unbounded scenes, wh…

    Submitted 9 May, 2024; originally announced May 2024.

  22. arXiv:2405.03654  [pdf, other]

    cs.CR cs.AI

    Can LLMs Deeply Detect Complex Malicious Queries? A Framework for Jailbreaking via Obfuscating Intent

    Authors: Shang Shang, Xinqiang Zhao, Zhongjiang Yao, Yepeng Yao, Liya Su, Zijing Fan, Xiaodan Zhang, Zhengwei Jiang

    Abstract: To demonstrate and address the underlying maliciousness, we propose a theoretical hypothesis and analytical approach, and introduce a new black-box jailbreak attack methodology named IntentObfuscator, exploiting this identified flaw by obfuscating the true intentions behind user prompts. This approach compels LLMs to inadvertently generate restricted content, bypassing their built-in content securi…

    Submitted 7 May, 2024; v1 submitted 6 May, 2024; originally announced May 2024.

  23. arXiv:2404.17749  [pdf, other]

    cs.AI cs.CL

    UMass-BioNLP at MEDIQA-M3G 2024: DermPrompt -- A Systematic Exploration of Prompt Engineering with GPT-4V for Dermatological Diagnosis

    Authors: Parth Vashisht, Abhilasha Lodha, Mukta Maddipatla, Zonghai Yao, Avijit Mitra, Zhichao Yang, Junda Wang, Sunjae Kwon, Hong Yu

    Abstract: This paper presents our team's participation in the MEDIQA-ClinicalNLP2024 shared task B. We present a novel approach to diagnosing clinical dermatology cases by integrating large multimodal models, specifically leveraging the capabilities of GPT-4V under a retriever and a re-ranker framework. Our investigation reveals that GPT-4V, when used as a retrieval agent, can accurately retrieve the correc…

    Submitted 8 May, 2024; v1 submitted 26 April, 2024; originally announced April 2024.

    Comments: Accepted at NAACL-ClinicalNLP workshop 2024

  24. arXiv:2404.17400  [pdf, other]

    cs.CV cs.AI eess.IV

    Spatial-frequency Dual-Domain Feature Fusion Network for Low-Light Remote Sensing Image Enhancement

    Authors: Zishu Yao, Guodong Fan, Jinfu Fan, Min Gan, C. L. Philip Chen

    Abstract: Low-light remote sensing images generally feature high resolution and high spatial complexity, with continuously distributed surface features in space. This continuity in scenes leads to extensive long-range correlations in spatial domains within remote sensing images. Convolutional Neural Networks, which rely on local correlations for long-distance modeling, struggle to establish long-range corre…

    Submitted 26 April, 2024; originally announced April 2024.

    Comments: 14 pages

  25. arXiv:2404.15729  [pdf, other]

    cs.LG

    Gradformer: Graph Transformer with Exponential Decay

    Authors: Chuang Liu, Zelin Yao, Yibing Zhan, Xueqi Ma, Shirui Pan, Wenbin Hu

    Abstract: Graph Transformers (GTs) have demonstrated their advantages across a wide range of tasks. However, the self-attention mechanism in GTs overlooks the graph's inductive biases, particularly biases related to structure, which are crucial for the graph tasks. Although some methods utilize positional encoding and attention bias to model inductive biases, their effectiveness is still suboptimal analytic…

    Submitted 24 April, 2024; originally announced April 2024.

    Comments: 9 pages, 7 figures. Accepted by IJCAI 2024

  26. arXiv:2404.12773  [pdf, other]

    cs.RO

    LayeredMAPF: a decomposition of MAPF instance without compromising solvability

    Authors: Zhuo Yao, Wei Wang

    Abstract: Generally, the calculation and memory space required for multi-agent path finding (MAPF) grows exponentially as the number of agents increases. This often results in some MAPF instances being unsolvable under limited computational resources and memory space, thereby limiting the application of MAPF in complex scenarios. Hence, we propose a decomposition approach for MAPF instances, which breaks do…

    Submitted 19 April, 2024; originally announced April 2024.

  27. arXiv:2404.11581  [pdf, other]

    cs.AI cs.DB

    LLMTune: Accelerate Database Knob Tuning with Large Language Models

    Authors: Xinmei Huang, Haoyang Li, Jing Zhang, Xinxin Zhao, Zhiming Yao, Yiyan Li, Zhuohao Yu, Tieying Zhang, Hong Chen, Cuiping Li

    Abstract: Database knob tuning is a critical challenge in the database community, aiming to optimize knob values to enhance database performance for specific workloads. DBMS often feature hundreds of tunable knobs, posing a significant challenge for DBAs to recommend optimal configurations. Consequently, many machine learning-based tuning methods have been developed to automate this process. Despite the int…

    Submitted 17 April, 2024; originally announced April 2024.

  28. arXiv:2404.09729  [pdf]

    eess.SP cs.IT cs.LG stat.ME

    Amplitude-Phase Fusion for Enhanced Electrocardiogram Morphological Analysis

    Authors: Shuaicong Hu, Yanan Wang, Jian Liu, Jingyu Lin, Shengmei Qin, Zhenning Nie, Zhifeng Yao, Wenjie Cai, Cuiwei Yang

    Abstract: Considering the variability of amplitude and phase patterns in electrocardiogram (ECG) signals due to cardiac activity and individual differences, existing entropy-based studies have not fully utilized these two patterns and lack integration. To address this gap, this paper proposes a novel fusion entropy metric, morphological ECG entropy (MEE) for the first time, specifically designed for ECG mor…

    Submitted 15 April, 2024; originally announced April 2024.

    Comments: 16 pages, 12 figures

    ACM Class: I.5.2

  29. arXiv:2404.06742  [pdf, other]

    cs.CL

    Transferable and Efficient Non-Factual Content Detection via Probe Training with Offline Consistency Checking

    Authors: Xiaokang Zhang, Zijun Yao, Jing Zhang, Kaifeng Yun, Jifan Yu, Juanzi Li, Jie Tang

    Abstract: Detecting non-factual content is a longstanding goal to increase the trustworthiness of large language models (LLMs) generations. Current factuality probes, trained using human-annotated labels, exhibit limited transferability to out-of-distribution content, while online self-consistency checking imposes extensive computation burden due to the necessity of generating multiple outputs. This paper pro…

    Submitted 10 April, 2024; originally announced April 2024.

  30. arXiv:2404.06711  [pdf, other]

    cs.CL cs.HC

    MathVC: An LLM-Simulated Multi-Character Virtual Classroom for Mathematics Education

    Authors: Murong Yue, Wijdane Mifdal, Yixuan Zhang, Jennifer Suh, Ziyu Yao

    Abstract: Mathematical modeling (MM) is considered a fundamental skill for students in STEM disciplines. Practicing the MM skill is often the most effective when students can engage in group discussion and collaborative problem-solving. However, due to unevenly distributed teachers and educational resources needed to monitor such group activities, students do not always receive equal opportunities for this…

    Submitted 9 April, 2024; originally announced April 2024.

    Comments: Work in progress

  31. arXiv:2404.03577  [pdf, other]

    cs.CL

    Untangle the KNOT: Interweaving Conflicting Knowledge and Reasoning Skills in Large Language Models

    Authors: Yantao Liu, Zijun Yao, Xin Lv, Yuchen Fan, Shulin Cao, Jifan Yu, Lei Hou, Juanzi Li

    Abstract: Providing knowledge documents for large language models (LLMs) has emerged as a promising solution to update the static knowledge inherent in their parameters. However, knowledge in the document may conflict with the memory of LLMs due to outdated or incorrect knowledge in the LLMs' parameters. This leads to the necessity of examining the capability of LLMs to assimilate supplemental external know…

    Submitted 4 April, 2024; originally announced April 2024.

    Comments: Accepted by LREC-COLING 2024 as long paper

  32. arXiv:2404.03532  [pdf, other]

    cs.CL

    Evaluating Generative Language Models in Information Extraction as Subjective Question Correction

    Authors: Yuchen Fan, Yantao Liu, Zijun Yao, Jifan Yu, Lei Hou, Juanzi Li

    Abstract: Modern Large Language Models (LLMs) have showcased remarkable prowess in various tasks necessitating sophisticated cognitive behaviors. Nevertheless, a paradoxical performance discrepancy is observed, where these models underperform in seemingly elementary tasks like relation extraction and event extraction due to two issues in conventional evaluation. (1) The imprecision of existing evaluation me…

    Submitted 4 April, 2024; originally announced April 2024.

    Comments: Accepted by LREC-COLING 2024, short paper

  33. arXiv:2404.03491  [pdf, other]

    cs.CL cs.AI

    A Cause-Effect Look at Alleviating Hallucination of Knowledge-grounded Dialogue Generation

    Authors: Jifan Yu, Xiaohan Zhang, Yifan Xu, Xuanyu Lei, Zijun Yao, Jing Zhang, Lei Hou, Juanzi Li

    Abstract: Empowered by the large-scale pretrained language models, existing dialogue systems have demonstrated impressive performance conducting fluent and natural-sounding conversations. However, they are still plagued by the hallucination problem, causing unpredictable factual errors in the generated responses. Recently, knowledge-grounded dialogue generation models, that intentionally invoke external kno…

    Submitted 4 April, 2024; originally announced April 2024.

    Comments: Accepted by LREC-COLING 2024

  34. arXiv:2404.02699  [pdf, other]

    cs.CL

    Scalable Model Editing via Customized Expert Networks

    Authors: Zihan Yao, Yu He, Tianyu Qi, Ming Li

    Abstract: Addressing the issue of hallucinations and outdated knowledge in large language models is critical for their reliable application. Model Editing presents a promising avenue for mitigating these challenges in a cost-effective manner. However, existing methods often suffer from unsatisfactory generalization and unintended effects on unrelated samples. To overcome these limitations, we introduce a no…

    Submitted 3 April, 2024; originally announced April 2024.

  35. arXiv:2403.19781  [pdf, other]

    q-fin.TR cs.LG cs.MA

    Reinforcement Learning in Agent-Based Market Simulation: Unveiling Realistic Stylized Facts and Behavior

    Authors: Zhiyuan Yao, Zheng Li, Matthew Thomas, Ionut Florescu

    Abstract: Investors and regulators can greatly benefit from a realistic market simulator that enables them to anticipate the consequences of their decisions in real markets. However, traditional rule-based market simulators often fall short in accurately capturing the dynamic behavior of market participants, particularly in response to external market impact events or changes in the behavior of other partic…

    Submitted 28 March, 2024; originally announced March 2024.

    Comments: Accepted in IJCNN 2024

  36. arXiv:2403.19318  [pdf, other]

    cs.CL

    TableLLM: Enabling Tabular Data Manipulation by LLMs in Real Office Usage Scenarios

    Authors: Xiaokang Zhang, Jing Zhang, Zeyao Ma, Yang Li, Bohan Zhang, Guanlin Li, Zijun Yao, Kangli Xu, Jinchang Zhou, Daniel Zhang-Li, Jifan Yu, Shu Zhao, Juanzi Li, Jie Tang

    Abstract: We introduce TableLLM, a robust large language model (LLM) with 13 billion parameters, purpose-built for proficiently handling tabular data manipulation tasks, whether they are embedded within documents or spreadsheets, catering to real-world office scenarios. We propose a distant supervision method for training, which comprises a reasoning process extension strategy, aiding in training LLMs to un…

    Submitted 1 April, 2024; v1 submitted 28 March, 2024; originally announced March 2024.

    Comments: https://tablellm.github.io/

  37. arXiv:2403.14870  [pdf, other]

    cs.CV cs.CL cs.LG

    VidLA: Video-Language Alignment at Scale

    Authors: Mamshad Nayeem Rizve, Fan Fei, Jayakrishnan Unnikrishnan, Son Tran, Benjamin Z. Yao, Belinda Zeng, Mubarak Shah, Trishul Chilimbi

    Abstract: In this paper, we propose VidLA, an approach for video-language alignment at scale. There are two major limitations of previous video-language alignment approaches. First, they do not capture both short-range and long-range temporal dependencies and typically employ complex hierarchical deep network architectures that are hard to integrate with existing pretrained image-text foundation models. To…

    Submitted 21 March, 2024; originally announced March 2024.

    Comments: Accepted to CVPR 2024

  38. arXiv:2403.14123  [pdf, other]

    cs.LG cs.AR cs.DC

    AI and Memory Wall

    Authors: Amir Gholami, Zhewei Yao, Sehoon Kim, Coleman Hooper, Michael W. Mahoney, Kurt Keutzer

    Abstract: The availability of unprecedented unsupervised training data, along with neural scaling laws, has resulted in an unprecedented surge in model size and compute requirements for serving/training LLMs. However, the main performance bottleneck is increasingly shifting to memory bandwidth. Over the past 20 years, peak server hardware FLOPS has been scaling at 3.0x/2yrs, outpacing the growth of DRAM and…

    Submitted 21 March, 2024; originally announced March 2024.

    Comments: Published in IEEE Micro Journal

  39. arXiv:2403.08170  [pdf, other]

    cs.CV eess.IV

    Versatile Defense Against Adversarial Attacks on Image Recognition

    Authors: Haibo Zhang, Zhihua Yao, Kouichi Sakurai

    Abstract: Adversarial attacks present a significant security risk to image recognition tasks. Defending against these attacks in a real-life setting can be compared to the way antivirus software works, with a key consideration being how well the defense can adapt to new and evolving attacks. Another important factor is the resources involved in terms of time and cost for training defense models and updating…

    Submitted 12 March, 2024; originally announced March 2024.

  40. arXiv:2403.07747  [pdf, other]

    cs.CL cs.AI

    FineMath: A Fine-Grained Mathematical Evaluation Benchmark for Chinese Large Language Models

    Authors: Yan Liu, Renren Jin, Lin Shi, Zheng Yao, Deyi Xiong

    Abstract: To thoroughly assess the mathematical reasoning abilities of Large Language Models (LLMs), we need to carefully curate evaluation datasets covering diverse mathematical concepts and mathematical problems at different difficulty levels. In pursuit of this objective, we propose FineMath in this paper, a fine-grained mathematical evaluation benchmark dataset for assessing Chinese LLMs. FineMath is cr…

    Submitted 12 March, 2024; originally announced March 2024.

  41. arXiv:2403.05845  [pdf, other]

    cs.CL cs.AI

    Reverse That Number! Decoding Order Matters in Arithmetic Learning

    Authors: Daniel Zhang-Li, Nianyi Lin, Jifan Yu, Zheyuan Zhang, Zijun Yao, Xiaokang Zhang, Lei Hou, Jing Zhang, Juanzi Li

    Abstract: Recent advancements in pretraining have demonstrated that modern Large Language Models (LLMs) possess the capability to effectively learn arithmetic operations. However, despite acknowledging the significance of digit order in arithmetic computation, current methodologies predominantly rely on sequential, step-by-step approaches for teaching LLMs arithmetic, resulting in a conclusion where obtaini…

    Submitted 9 March, 2024; originally announced March 2024.

  42. arXiv:2403.04797  [pdf, other]

    cs.CL cs.LG

    Found in the Middle: How Language Models Use Long Contexts Better via Plug-and-Play Positional Encoding

    Authors: Zhenyu Zhang, Runjin Chen, Shiwei Liu, Zhewei Yao, Olatunji Ruwase, Beidi Chen, Xiaoxia Wu, Zhangyang Wang

    Abstract: This paper aims to overcome the "lost-in-the-middle" challenge of large language models (LLMs). While recent advancements have successfully enabled LLMs to perform stable language modeling with up to 4 million tokens, the persistent difficulty faced by most LLMs in identifying relevant information situated in the middle of the context has not been adequately tackled. To address this problem, this…

    Submitted 4 March, 2024; originally announced March 2024.

  43. arXiv:2403.03424  [pdf, other]

    cs.IR

    Generative News Recommendation

    Authors: Shen Gao, Jiabao Fang, Quan Tu, Zhitao Yao, Zhumin Chen, Pengjie Ren, Zhaochun Ren

    Abstract: Most existing news recommendation methods tackle this task by conducting semantic matching between candidate news and user representation produced by historical clicked news. However, they overlook the high-level connections among different news articles and also ignore the profound relationship between these news articles and users. And the definition of these methods dictates that they can only…

    Submitted 5 March, 2024; originally announced March 2024.

    Comments: Accepted by WWW 2024

  44. arXiv:2403.03063  [pdf, other]

    cs.CV

    CrackNex: a Few-shot Low-light Crack Segmentation Model Based on Retinex Theory for UAV Inspections

    Authors: Zhen Yao, Jiawei Xu, Shuhang Hou, Mooi Choo Chuah

    Abstract: Routine visual inspections of concrete structures are imperative for upholding the safety and integrity of critical infrastructure. Such visual inspections sometimes happen under low-light conditions, e.g., checking for bridge health. Crack segmentation under such conditions is challenging due to the poor contrast between cracks and their surroundings. However, most deep learning methods are desig…

    Submitted 5 March, 2024; originally announced March 2024.

    Comments: 8 pages, 4 figures, IEEE International Conference on Robotics and Automation (ICRA) 2024

  45. arXiv:2402.17887  [pdf, other]

    cs.CL cs.IR

    JMLR: Joint Medical LLM and Retrieval Training for Enhancing Reasoning and Professional Question Answering Capability

    Authors: Junda Wang, Zhichao Yang, Zonghai Yao, Hong Yu

    Abstract: Large Language Models (LLMs) have demonstrated a remarkable potential in medical knowledge acquisition and question-answering. However, LLMs can potentially hallucinate and yield factually incorrect outcomes, even with domain-specific pretraining. Previously, retrieval augmented generation (RAG) has limited success in addressing hallucinations. Unlike previous methods in RAG where the retrieval mo…

    Submitted 28 June, 2024; v1 submitted 27 February, 2024; originally announced February 2024.

  46. arXiv:2402.17232  [pdf, other]

    math.NA cs.LG physics.comp-ph

    Two-scale Neural Networks for Partial Differential Equations with Small Parameters

    Authors: Qiao Zhuang, Chris Ziyi Yao, Zhongqiang Zhang, George Em Karniadakis

    Abstract: We propose a two-scale neural network method for solving partial differential equations (PDEs) with small parameters using physics-informed neural networks (PINNs). We directly incorporate the small parameters into the architecture of neural networks. The proposed method enables solving PDEs with small parameters in a simple fashion, without adding Fourier features or other computationally taxing…

    Submitted 27 February, 2024; originally announced February 2024.

    MSC Class: 65N35; 35B25 ACM Class: I.2.6

  47. arXiv:2402.14241  [pdf, ps, other]

    cs.CV cs.AI

    A Self-supervised Pressure Map human keypoint Detection Approch: Optimizing Generalization and Computational Efficiency Across Datasets

    Authors: Chengzhang Yu, Xianjun Yang, Wenxia Bao, Shaonan Wang, Zhiming Yao

    Abstract: In environments where RGB images are inadequate, pressure maps are a viable alternative, garnering scholarly attention. This study introduces a novel self-supervised pressure map keypoint detection (SPMKD) method, addressing the current gap in specialized designs for human keypoint extraction from pressure maps. Central to our contribution is the Encoder-Fuser-Decoder (EFD) model, which is a robust…

    Submitted 21 February, 2024; originally announced February 2024.

    Comments: 5 pages, 6 figures

  48. arXiv:2402.13919  [pdf, other]

    cs.CL cs.AI

    SYNFAC-EDIT: Synthetic Imitation Edit Feedback for Factual Alignment in Clinical Summarization

    Authors: Prakamya Mishra, Zonghai Yao, Parth Vashisht, Feiyun Ouyang, Beining Wang, Vidhi Dhaval Mody, Hong Yu

    Abstract: Large Language Models (LLMs) such as GPT & Llama have demonstrated significant achievements in summarization tasks but struggle with factual inaccuracies, a critical issue in clinical NLP applications where errors could lead to serious consequences. To counter the high costs and limited availability of expert-annotated data for factual alignment, this study introduces an innovative pipeline that u…

    Submitted 18 April, 2024; v1 submitted 21 February, 2024; originally announced February 2024.

    Comments: Equal contribution for the first two authors

  49. arXiv:2402.13452  [pdf, other]

    cs.SI cs.CL cs.LG

    LocalTweets to LocalHealth: A Mental Health Surveillance Framework Based on Twitter Data

    Authors: Vijeta Deshpande, Minhwa Lee, Zonghai Yao, Zihao Zhang, Jason Brian Gibbons, Hong Yu

    Abstract: Prior research on Twitter (now X) data has provided positive evidence of its utility in developing supplementary health surveillance systems. In this study, we present a new framework to surveil public health, focusing on mental health (MH) outcomes. We hypothesize that locally posted tweets are indicative of local MH outcomes and collect tweets posted from 765 neighborhoods (census block groups)…

    Submitted 26 March, 2024; v1 submitted 20 February, 2024; originally announced February 2024.

    Journal ref: LREC-COLING 2024

  50. arXiv:2402.12659  [pdf, other]

    cs.CL cs.AI cs.CE

    FinBen: A Holistic Financial Benchmark for Large Language Models

    Authors: Qianqian Xie, Weiguang Han, Zhengyu Chen, Ruoyu Xiang, Xiao Zhang, Yueru He, Mengxi Xiao, Dong Li, Yongfu Dai, Duanyu Feng, Yijing Xu, Haoqiang Kang, Ziyan Kuang, Chenhan Yuan, Kailai Yang, Zheheng Luo, Tianlin Zhang, Zhiwei Liu, Guojun Xiong, Zhiyang Deng, Yuechen Jiang, Zhiyuan Yao, Haohang Li, Yangyang Yu, Gang Hu, et al. (9 additional authors not shown)

    Abstract: LLMs have transformed NLP and shown promise in various fields, yet their potential in finance is underexplored due to a lack of comprehensive evaluation benchmarks, the rapid development of LLMs, and the complexity of financial tasks. In this paper, we introduce FinBen, the first extensive open-source evaluation benchmark, including 36 datasets spanning 24 financial tasks, covering seven critical…

    Submitted 18 June, 2024; v1 submitted 19 February, 2024; originally announced February 2024.

    Comments: 26 pages, 11 figures