-
FinBen: A Holistic Financial Benchmark for Large Language Models
Authors:
Qianqian Xie,
Weiguang Han,
Zhengyu Chen,
Ruoyu Xiang,
Xiao Zhang,
Yueru He,
Mengxi Xiao,
Dong Li,
Yongfu Dai,
Duanyu Feng,
Yijing Xu,
Haoqiang Kang,
Ziyan Kuang,
Chenhan Yuan,
Kailai Yang,
Zheheng Luo,
Tianlin Zhang,
Zhiwei Liu,
Guojun Xiong,
Zhiyang Deng,
Yuechen Jiang,
Zhiyuan Yao,
Haohang Li,
Yangyang Yu,
Gang Hu
, et al. (9 additional authors not shown)
Abstract:
LLMs have transformed NLP and shown promise in various fields, yet their potential in finance is underexplored due to a lack of comprehensive evaluation benchmarks, the rapid development of LLMs, and the complexity of financial tasks. In this paper, we introduce FinBen, the first extensive open-source evaluation benchmark, including 36 datasets spanning 24 financial tasks, covering seven critical…
▽ More
LLMs have transformed NLP and shown promise in various fields, yet their potential in finance is underexplored due to a lack of comprehensive evaluation benchmarks, the rapid development of LLMs, and the complexity of financial tasks. In this paper, we introduce FinBen, the first extensive open-source evaluation benchmark, including 36 datasets spanning 24 financial tasks, covering seven critical aspects: information extraction (IE), textual analysis, question answering (QA), text generation, risk management, forecasting, and decision-making. FinBen offers several key innovations: a broader range of tasks and datasets, the first evaluation of stock trading, novel agent and Retrieval-Augmented Generation (RAG) evaluation, and three novel open-source evaluation datasets for text summarization, question answering, and stock trading. Our evaluation of 15 representative LLMs, including GPT-4, ChatGPT, and the latest Gemini, reveals several key findings: While LLMs excel in IE and textual analysis, they struggle with advanced reasoning and complex tasks like text generation and forecasting. GPT-4 excels in IE and stock trading, while Gemini is better at text generation and forecasting. Instruction-tuned LLMs improve textual analysis but offer limited benefits for complex tasks such as QA. FinBen has been used to host the first financial LLMs shared task at the FinNLP-AgentScen workshop during IJCAI-2024, attracting 12 teams. Their novel solutions outperformed GPT-4, showcasing FinBen's potential to drive innovation in financial LLMs. All datasets, results, and codes are released for the research community: https://github.com/The-FinAI/PIXIU.
△ Less
Submitted 18 June, 2024; v1 submitted 19 February, 2024;
originally announced February 2024.
-
Dólares or Dollars? Unraveling the Bilingual Prowess of Financial LLMs Between Spanish and English
Authors:
Xiao Zhang,
Ruoyu Xiang,
Chenhan Yuan,
Duanyu Feng,
Weiguang Han,
Alejandro Lopez-Lira,
Xiao-Yang Liu,
Sophia Ananiadou,
Min Peng,
Jimin Huang,
Qianqian Xie
Abstract:
Despite Spanish's pivotal role in the global finance industry, a pronounced gap exists in Spanish financial natural language processing (NLP) and application studies compared to English, especially in the era of large language models (LLMs). To bridge this gap, we unveil Toisón de Oro, the first bilingual framework that establishes instruction datasets, finetuned LLMs, and evaluation benchmark for…
▽ More
Despite Spanish's pivotal role in the global finance industry, a pronounced gap exists in Spanish financial natural language processing (NLP) and application studies compared to English, especially in the era of large language models (LLMs). To bridge this gap, we unveil Toisón de Oro, the first bilingual framework that establishes instruction datasets, finetuned LLMs, and evaluation benchmark for financial LLMs in Spanish joint with English. We construct a rigorously curated bilingual instruction dataset including over 144K Spanish and English samples from 15 datasets covering 7 tasks. Harnessing this, we introduce FinMA-ES, an LLM designed for bilingual financial applications. We evaluate our model and existing LLMs using FLARE-ES, the first comprehensive bilingual evaluation benchmark with 21 datasets covering 9 tasks. The FLARE-ES benchmark results reveal a significant multilingual performance gap and bias in existing LLMs. FinMA-ES models surpass SOTA LLMs such as GPT-4 in Spanish financial tasks, due to strategic instruction tuning and leveraging data from diverse linguistic resources, highlighting the positive impact of cross-linguistic transfer. All our datasets, models, and benchmarks have been released.
△ Less
Submitted 11 February, 2024;
originally announced February 2024.
-
VIALM: A Survey and Benchmark of Visually Impaired Assistance with Large Models
Authors:
Yi Zhao,
Yilin Zhang,
Rong Xiang,
Jing Li,
Hillming Li
Abstract:
Visually Impaired Assistance (VIA) aims to automatically help the visually impaired (VI) handle daily activities. The advancement of VIA primarily depends on developments in Computer Vision (CV) and Natural Language Processing (NLP), both of which exhibit cutting-edge paradigms with large models (LMs). Furthermore, LMs have shown exceptional multimodal abilities to tackle challenging physically-gr…
▽ More
Visually Impaired Assistance (VIA) aims to automatically help the visually impaired (VI) handle daily activities. The advancement of VIA primarily depends on developments in Computer Vision (CV) and Natural Language Processing (NLP), both of which exhibit cutting-edge paradigms with large models (LMs). Furthermore, LMs have shown exceptional multimodal abilities to tackle challenging physically-grounded tasks such as embodied robots. To investigate the potential and limitations of state-of-the-art (SOTA) LMs' capabilities in VIA applications, we present an extensive study for the task of VIA with LMs (VIALM). In this task, given an image illustrating the physical environments and a linguistic request from a VI user, VIALM aims to output step-by-step guidance to assist the VI user in fulfilling the request grounded in the environment. The study consists of a survey reviewing recent LM research and benchmark experiments examining selected LMs' capabilities in VIA. The results indicate that while LMs can potentially benefit VIA, their output cannot be well environment-grounded (i.e., 25.7% GPT-4's responses) and lacks fine-grained guidance (i.e., 32.1% GPT-4's responses).
△ Less
Submitted 10 February, 2024; v1 submitted 29 January, 2024;
originally announced February 2024.
-
UniPoll: A Unified Social Media Poll Generation Framework via Multi-Objective Optimization
Authors:
Yixia Li,
Rong Xiang,
Yanlin Song,
Jing Li
Abstract:
Social media platforms are essential outlets for expressing opinions, providing a valuable resource for capturing public viewpoints via text analytics. However, for many users, passive browsing is their preferred mode of interaction, leading to their perspectives being overlooked by text analytics methods. Meanwhile, social media polls have emerged as a practical feature for gathering public opini…
▽ More
Social media platforms are essential outlets for expressing opinions, providing a valuable resource for capturing public viewpoints via text analytics. However, for many users, passive browsing is their preferred mode of interaction, leading to their perspectives being overlooked by text analytics methods. Meanwhile, social media polls have emerged as a practical feature for gathering public opinions, allowing post authors to pose questions with pre-defined answer options for readers to vote on. To broaden the benefits of polls for posts without them, this article explores the automatic generation of a poll from a social media post by leveraging cutting-edge natural language generation (NLG) techniques. However, existing NLG techniques, primarily developed for general-domain texts, may be ineffective when applied to noisy social media data, which often feature implicit context-question-answer relations. To tackle these challenges, we enrich a post context with its comments and propose a novel unified poll generation framework called UniPoll. It employs prompt tuning with multi-objective optimization to bolster the connection exploration between contexts (posts and comments) and polls (questions and answers). Experimental comparisons on a large-scale Chinese Weibo dataset show that UniPoll significantly outperforms T5, the state-of-the-art NLG model, which generates question and answer separately. Comprehensive qualitative and quantitative analyses further underscore the superiority of UniPoll through various evaluation lenses.
△ Less
Submitted 11 June, 2023;
originally announced June 2023.
-
A Dual Iterative Refinement Method for Non-rigid Shape Matching
Authors:
Rui Xiang,
Rongjie Lai,
Hongkai Zhao
Abstract:
In this work, a simple and efficient dual iterative refinement (DIR) method is proposed for dense correspondence between two nearly isometric shapes. The key idea is to use dual information, such as spatial and spectral, or local and global features, in a complementary and effective way, and extract more accurate information from current iteration to use for the next iteration. In each DIR iterati…
▽ More
In this work, a simple and efficient dual iterative refinement (DIR) method is proposed for dense correspondence between two nearly isometric shapes. The key idea is to use dual information, such as spatial and spectral, or local and global features, in a complementary and effective way, and extract more accurate information from current iteration to use for the next iteration. In each DIR iteration, starting from current correspondence, a zoom-in process at each point is used to select well matched anchor pairs by a local mapping distortion criterion. These selected anchor pairs are then used to align spectral features (or other appropriate global features) whose dimension adaptively matches the capacity of the selected anchor pairs. Thanks to the effective combination of complementary information in a data-adaptive way, DIR is not only efficient but also robust to render accurate results within a few iterations. By choosing appropriate dual features, DIR has the flexibility to handle patch and partial matching as well. Extensive experiments on various data sets demonstrate the superiority of DIR over other state-of-the-art methods in terms of both accuracy and efficiency.
△ Less
Submitted 19 November, 2020; v1 submitted 25 July, 2020;
originally announced July 2020.
-
3dDepthNet: Point Cloud Guided Depth Completion Network for Sparse Depth and Single Color Image
Authors:
Rui Xiang,
Feng Zheng,
Huapeng Su,
Zhe Zhang
Abstract:
In this paper, we propose an end-to-end deep learning network named 3dDepthNet, which produces an accurate dense depth image from a single pair of sparse LiDAR depth and color image for robotics and autonomous driving tasks. Based on the dimensional nature of depth images, our network offers a novel 3D-to-2D coarse-to-fine dual densification design that is both accurate and lightweight. Depth dens…
▽ More
In this paper, we propose an end-to-end deep learning network named 3dDepthNet, which produces an accurate dense depth image from a single pair of sparse LiDAR depth and color image for robotics and autonomous driving tasks. Based on the dimensional nature of depth images, our network offers a novel 3D-to-2D coarse-to-fine dual densification design that is both accurate and lightweight. Depth densification is first performed in 3D space via point cloud completion, followed by a specially designed encoder-decoder structure that utilizes the projected dense depth from 3D completion and the original RGB-D images to perform 2D image completion. Experiments on the KITTI dataset show our network achieves state-of-art accuracy while being more efficient. Ablation and generalization tests prove that each module in our network has positive influences on the final results, and furthermore, our network is resilient to even sparser depth.
△ Less
Submitted 20 March, 2020;
originally announced March 2020.
-
Efficient and Robust Shape Correspondence via Sparsity-Enforced Quadratic Assignment
Authors:
Rui Xiang,
Rongjie Lai,
Hongkai Zhao
Abstract:
In this work, we introduce a novel local pairwise descriptor and then develop a simple, effective iterative method to solve the resulting quadratic assignment through sparsity control for shape correspondence between two approximate isometric surfaces. Our pairwise descriptor is based on the stiffness and mass matrix of finite element approximation of the Laplace-Beltrami differential operator, wh…
▽ More
In this work, we introduce a novel local pairwise descriptor and then develop a simple, effective iterative method to solve the resulting quadratic assignment through sparsity control for shape correspondence between two approximate isometric surfaces. Our pairwise descriptor is based on the stiffness and mass matrix of finite element approximation of the Laplace-Beltrami differential operator, which is local in space, sparse to represent, and extremely easy to compute while containing global information. It allows us to deal with open surfaces, partial matching, and topological perturbations robustly. To solve the resulting quadratic assignment problem efficiently, the two key ideas of our iterative algorithm are: 1) select pairs with good (approximate) correspondence as anchor points, 2) solve a regularized quadratic assignment problem only in the neighborhood of selected anchor points through sparsity control. These two ingredients can improve and increase the number of anchor points quickly while reducing the computation cost in each quadratic assignment iteration significantly. With enough high-quality anchor points, one may use various pointwise global features with reference to these anchor points to further improve the dense shape correspondence. We use various experiments to show the efficiency, quality, and versatility of our method on large data sets, patches, and point clouds (without global meshes).
△ Less
Submitted 20 March, 2020; v1 submitted 19 March, 2020;
originally announced March 2020.
-
Dual Memory Network Model for Biased Product Review Classification
Authors:
Yunfei Long,
Mingyu Ma,
Qin Lu,
Rong Xiang,
Chu-Ren Huang
Abstract:
In sentiment analysis (SA) of product reviews, both user and product information are proven to be useful. Current tasks handle user profile and product information in a unified model which may not be able to learn salient features of users and products effectively. In this work, we propose a dual user and product memory network (DUPMN) model to learn user profiles and product reviews using separat…
▽ More
In sentiment analysis (SA) of product reviews, both user and product information are proven to be useful. Current tasks handle user profile and product information in a unified model which may not be able to learn salient features of users and products effectively. In this work, we propose a dual user and product memory network (DUPMN) model to learn user profiles and product reviews using separate memory networks. Then, the two representations are used jointly for sentiment prediction. The use of separate models aims to capture user profiles and product information more effectively. Compared to state-of-the-art unified prediction models, the evaluations on three benchmark datasets, IMDB, Yelp13, and Yelp14, show that our dual learning model gives performance gain of 0.6%, 1.2%, and 0.9%, respectively. The improvements are also deemed very significant measured by p-values.
△ Less
Submitted 15 September, 2018;
originally announced September 2018.
-
SCOPE: Scalable Composite Optimization for Learning on Spark
Authors:
Shen-Yi Zhao,
Ru Xiang,
Ying-Hao Shi,
Peng Gao,
Wu-Jun Li
Abstract:
Many machine learning models, such as logistic regression~(LR) and support vector machine~(SVM), can be formulated as composite optimization problems. Recently, many distributed stochastic optimization~(DSO) methods have been proposed to solve the large-scale composite optimization problems, which have shown better performance than traditional batch methods. However, most of these DSO methods are…
▽ More
Many machine learning models, such as logistic regression~(LR) and support vector machine~(SVM), can be formulated as composite optimization problems. Recently, many distributed stochastic optimization~(DSO) methods have been proposed to solve the large-scale composite optimization problems, which have shown better performance than traditional batch methods. However, most of these DSO methods are not scalable enough. In this paper, we propose a novel DSO method, called \underline{s}calable \underline{c}omposite \underline{op}timization for l\underline{e}arning~({SCOPE}), and implement it on the fault-tolerant distributed platform \mbox{Spark}. SCOPE is both computation-efficient and communication-efficient. Theoretical analysis shows that SCOPE is convergent with linear convergence rate when the objective function is convex. Furthermore, empirical results on real datasets show that SCOPE can outperform other state-of-the-art distributed learning methods on Spark, including both batch learning methods and DSO methods.
△ Less
Submitted 11 December, 2016; v1 submitted 30 January, 2016;
originally announced February 2016.
-
Virtual Vector Machine for Bayesian Online Classification
Authors:
Thomas P. Minka,
Rongjing Xiang,
Yuan,
Qi
Abstract:
In a typical online learning scenario, a learner is required to process a large data stream using a small memory buffer. Such a requirement is usually in conflict with a learner's primary pursuit of prediction accuracy. To address this dilemma, we introduce a novel Bayesian online classi cation algorithm, called the Virtual Vector Machine. The virtual vector machine allows you to smoothly trade-of…
▽ More
In a typical online learning scenario, a learner is required to process a large data stream using a small memory buffer. Such a requirement is usually in conflict with a learner's primary pursuit of prediction accuracy. To address this dilemma, we introduce a novel Bayesian online classi cation algorithm, called the Virtual Vector Machine. The virtual vector machine allows you to smoothly trade-off prediction accuracy with memory size. The virtual vector machine summarizes the information contained in the preceding data stream by a Gaussian distribution over the classi cation weights plus a constant number of virtual data points. The virtual data points are designed to add extra non-Gaussian information about the classi cation weights. To maintain the constant number of virtual points, the virtual vector machine adds the current real data point into the virtual point set, merges two most similar virtual points into a new virtual point or deletes a virtual point that is far from the decision boundary. The information lost in this process is absorbed into the Gaussian distribution. The extra information provided by the virtual points leads to improved predictive accuracy over previous online classification algorithms.
△ Less
Submitted 9 May, 2012;
originally announced May 2012.
-
Recurrence-based time series analysis by means of complex network methods
Authors:
Reik V. Donner,
Michael Small,
Jonathan F. Donges,
Norbert Marwan,
Yong Zou,
Ruoxi Xiang,
Jürgen Kurths
Abstract:
Complex networks are an important paradigm of modern complex systems sciences which allows quantitatively assessing the structural properties of systems composed of different interacting entities. During the last years, intensive efforts have been spent on applying network-based concepts also for the analysis of dynamically relevant higher-order statistical properties of time series. Notably, many…
▽ More
Complex networks are an important paradigm of modern complex systems sciences which allows quantitatively assessing the structural properties of systems composed of different interacting entities. During the last years, intensive efforts have been spent on applying network-based concepts also for the analysis of dynamically relevant higher-order statistical properties of time series. Notably, many corresponding approaches are closely related with the concept of recurrence in phase space. In this paper, we review recent methodological advances in time series analysis based on complex networks, with a special emphasis on methods founded on recurrence plots. The potentials and limitations of the individual methods are discussed and illustrated for paradigmatic examples of dynamical systems as well as for real-world time series. Complex network measures are shown to provide information about structural features of dynamical systems that are complementary to those characterized by other methods of time series analysis and, hence, substantially enrich the knowledge gathered from other existing (linear as well as nonlinear) approaches.
△ Less
Submitted 25 October, 2010;
originally announced October 2010.