arXiv:2406.19415 [pdf, other]

An Analysis of Multilingual FActScore

Authors: Kim Trong Vu, Michael Krumdick, Varshini Reddy, Franck Dernoncourt, Viet Dac Lai

Abstract: FActScore has gained popularity as a metric to estimate the factuality of long-form texts generated by Large Language Models (LLMs) in English. However, there has not been any work in studying the behavior of FActScore in other languages. This paper studies the limitations of each component in the four-component pipeline of FActScore in the multilingual setting. We introduce a new dataset for FActScore on texts generated by strong multilingual LLMs. Our evaluation shows that LLMs exhibit distinct behaviors in both fact extraction and fact scoring tasks. No LLM produces consistent and reliable FActScore across languages with varying levels of resources. We also find that the knowledge source plays an important role in the quality of the estimated FActScore. Using Wikipedia as the knowledge source may hinder the true FActScore of long-form text due to its limited coverage in medium- and low-resource languages. We also incorporate three mitigations to our knowledge source that ultimately improve FActScore estimation across all languages.

Submitted 20 June, 2024; originally announced June 2024.

arXiv:2406.14394 [pdf, other]

SEC-QA: A Systematic Evaluation Corpus for Financial QA

Authors: Viet Dac Lai, Michael Krumdick, Charles Lovering, Varshini Reddy, Craig Schmidt, Chris Tanner

Abstract: The financial domain frequently deals with large numbers of long documents that are essential for daily operations. Significant effort is put towards automating financial data analysis. However, a persistent challenge, not limited to the finance domain, is the scarcity of datasets that accurately reflect real-world tasks for model evaluation. Existing datasets are often constrained by size, context, or relevance to practical applications. Moreover, LLMs are currently trained on trillions of tokens of text, limiting access to novel data or documents that models have not encountered during training for unbiased evaluation. We propose SEC-QA, a continuous dataset generation framework with two key features: 1) the semi-automatic generation of Question-Answer (QA) pairs spanning multiple long context financial documents, which better represent real-world financial scenarios; 2) the ability to continually refresh the dataset using the most recent public document collections, not yet ingested by LLMs. Our experiments show that current retrieval augmented generation methods systematically fail to answer these challenging multi-document questions. In response, we introduce a QA system based on program-of-thought that improves the ability to perform complex information retrieval and quantitative reasoning pipelines, thereby increasing QA accuracy.

Submitted 20 June, 2024; originally announced June 2024.

arXiv:2405.04028 [pdf, other]

doi 10.1145/3626772.3657971

Masked Graph Transformer for Large-Scale Recommendation

Authors: Huiyuan Chen, Zhe Xu, Chin-Chia Michael Yeh, Vivian Lai, Yan Zheng, Minghua Xu, Hanghang Tong

Abstract: Graph Transformers have garnered significant attention for learning graph-structured data, thanks to their superb ability to capture long-range dependencies among nodes. However, the quadratic space and time complexity hinders the scalability of Graph Transformers, particularly for large-scale recommendation. Here we propose an efficient Masked Graph Transformer, named MGFormer, capable of capturing all-pair interactions among nodes with a linear complexity. To achieve this, we treat all user/item nodes as independent tokens, enhance them with positional embeddings, and feed them into a kernelized attention module. Additionally, we incorporate learnable relative degree information to appropriately reweigh the attentions. Experimental results show the superior performance of our MGFormer, even with a single attention layer.

Submitted 7 May, 2024; originally announced May 2024.

arXiv:2403.05565 [pdf, other]

OpenHEXAI: An Open-Source Framework for Human-Centered Evaluation of Explainable Machine Learning

Authors: Jiaqi Ma, Vivian Lai, Yiming Zhang, Chacha Chen, Paul Hamilton, Davor Ljubenkov, Himabindu Lakkaraju, Chenhao Tan

Abstract: Recently, there has been a surge of explainable AI (XAI) methods driven by the need for understanding machine learning model behaviors in high-stakes scenarios. However, properly evaluating the effectiveness of the XAI methods inevitably requires the involvement of human subjects, and conducting human-centered benchmarks is challenging in a number of ways: designing and implementing user studies is complex; numerous design choices in the design space of user study lead to problems of reproducibility; and running user studies can be challenging and even daunting for machine learning researchers. To address these challenges, this paper presents OpenHEXAI, an open-source framework for human-centered evaluation of XAI methods. OpenHEXAI features (1) a collection of diverse benchmark datasets, pre-trained models, and post hoc explanation methods; (2) an easy-to-use web application for user study; (3) comprehensive evaluation metrics for the effectiveness of post hoc explanation methods in the context of human-AI decision making tasks; (4) best practice recommendations of experiment documentation; and (5) convenient tools for power analysis and cost estimation. OpenHEXAI is the first large-scale infrastructural effort to facilitate human-centered benchmarks of XAI methods. It simplifies the design and implementation of user studies for XAI methods, thus allowing researchers and practitioners to focus on the scientific questions. Additionally, it enhances reproducibility through standardized designs. Based on OpenHEXAI, we further conduct a systematic benchmark of four state-of-the-art post hoc explanation methods and compare their impacts on human-AI decision making tasks in terms of accuracy, fairness, as well as users' trust and understanding of the machine learning model.

Submitted 20 February, 2024; originally announced March 2024.

arXiv:2402.10487 [pdf, other]

RPMixer: Shaking Up Time Series Forecasting with Random Projections for Large Spatial-Temporal Data

Authors: Chin-Chia Michael Yeh, Yujie Fan, Xin Dai, Uday Singh Saini, Vivian Lai, Prince Osei Aboagye, Junpeng Wang, Huiyuan Chen, Yan Zheng, Zhongfang Zhuang, Liang Wang, Wei Zhang

Abstract: Spatial-temporal forecasting systems play a crucial role in addressing numerous real-world challenges. In this paper, we investigate the potential of addressing spatial-temporal forecasting problems using general time series forecasting models, i.e., models that do not leverage the spatial relationships among the nodes. We propose an all-Multi-Layer Perceptron (all-MLP) time series forecasting architecture called RPMixer. The all-MLP architecture was chosen due to its recent success in time series forecasting benchmarks. Furthermore, our method capitalizes on the ensemble-like behavior of deep neural networks, where each individual block within the network behaves like a base learner in an ensemble model, particularly when identity mapping residual connections are incorporated. By integrating random projection layers into our model, we increase the diversity among the blocks' outputs, thereby improving the overall performance of the network. Extensive experiments conducted on the largest spatial-temporal forecasting benchmark datasets demonstrate that the proposed method outperforms alternative methods, including both spatial-temporal graph models and general forecasting models.

Submitted 12 June, 2024; v1 submitted 16 February, 2024; originally announced February 2024.

arXiv:2401.06915 [pdf, other]

DocFinQA: A Long-Context Financial Reasoning Dataset

Authors: Varshini Reddy, Rik Koncel-Kedziorski, Viet Dac Lai, Michael Krumdick, Charles Lovering, Chris Tanner

Abstract: For large language models (LLMs) to be effective in the financial domain -- where each decision can have a significant impact -- it is necessary to investigate realistic tasks and data. Financial professionals often interact with documents that are hundreds of pages long, but most financial research datasets only deal with short excerpts from these documents. To address this, we introduce a long-document financial QA task. We augment 7,437 questions from the existing FinQA dataset with the full-document context, extending the average context length from under 700 words in FinQA to 123k words in DocFinQA. We conduct extensive experiments over retrieval-based QA pipelines and long-context language models. DocFinQA proves a significant challenge for even state-of-the-art systems. We also provide a case study on the longest documents in DocFinQA and find that models particularly struggle on these documents. Addressing these challenges may have a wide-reaching impact across applications where specificity and long-range contexts are critical, like gene sequences and legal document contract analysis.

Submitted 29 February, 2024; v1 submitted 12 January, 2024; originally announced January 2024.

Comments: 13 pages

arXiv:2312.17468 [pdf, other]

doi 10.1145/3616855.3635832

Towards Mitigating Dimensional Collapse of Representations in Collaborative Filtering

Authors: Huiyuan Chen, Vivian Lai, Hongye Jin, Zhimeng Jiang, Mahashweta Das, Xia Hu

Abstract: Contrastive Learning (CL) has shown promising performance in collaborative filtering. The key idea is to generate augmentation-invariant embeddings by maximizing the Mutual Information between different augmented views of the same instance. However, we empirically observe that existing CL models suffer from the dimensional collapse issue, where user/item embeddings only span a low-dimensional subspace of the entire feature space. This suppresses other dimensional information and weakens the distinguishability of embeddings. Here we propose a non-contrastive learning objective, named nCL, which explicitly mitigates dimensional collapse of representations in collaborative filtering. Our nCL aims to achieve geometric properties of Alignment and Compactness on the embedding space. In particular, the alignment tries to push together representations of positive-related user-item pairs, while compactness tends to find the optimal coding length of user/item embeddings, subject to a given distortion. More importantly, our nCL requires neither data augmentation nor negative sampling during training, making it scalable to large datasets. Experimental results demonstrate the superiority of our nCL.

Submitted 28 December, 2023; originally announced December 2023.

arXiv:2311.06602 [pdf, other]

BizBench: A Quantitative Reasoning Benchmark for Business and Finance

Authors: Rik Koncel-Kedziorski, Michael Krumdick, Viet Lai, Varshini Reddy, Charles Lovering, Chris Tanner

Abstract: Answering questions within business and finance requires reasoning, precision, and a wide breadth of technical knowledge. Together, these requirements make this domain difficult for large language models (LLMs). We introduce BizBench, a benchmark for evaluating models' ability to reason about realistic financial problems. BizBench comprises eight quantitative reasoning tasks, focusing on question-answering (QA) over financial data via program synthesis. We include three financially-themed code-generation tasks from newly collected and augmented QA data. Additionally, we isolate the reasoning capabilities required for financial QA: reading comprehension of financial text and tables for extracting intermediate values, and understanding financial concepts and formulas needed to calculate complex solutions. Collectively, these tasks evaluate a model's financial background knowledge, ability to parse financial documents, and capacity to solve problems with code. We conduct an in-depth evaluation of open-source and commercial LLMs, comparing and contrasting the behavior of code-focused and language-focused models. We demonstrate that the current bottleneck in performance is due to LLMs' limited business and financial understanding, highlighting the value of a challenging benchmark for quantitative reasoning within this domain.

Submitted 12 March, 2024; v1 submitted 11 November, 2023; originally announced November 2023.

Comments: Work in progress

arXiv:2311.06359 [pdf, other]

doi 10.3847/2041-8213/ad132d

Highly Significant Detection of X-Ray Polarization from the Brightest Accreting Neutron Star Sco X-1

Authors: Fabio La Monaca, Alessandro Di Marco, Juri Poutanen, Matteo Bachetti, Sara E. Motta, Alessandro Papitto, Maura Pilia, Fei Xie, Stefano Bianchi, Anna Bobrikova, Enrico Costa, Wei Deng, Mingyu Ge, Giulia Illiano, Shu-Mei Jia, Henric Krawczynski, Eleonora V. Lai, Kuan Liu, Guglielmo Mastroserio, Fabio Muleri, John Rankin, Paolo Soffitta, Alexandra Veledina, Filippo Ambrosino, Melania Del Santo, et al. (94 additional authors not shown)

Abstract: The Imaging X-ray Polarimetry Explorer (IXPE) measured with high significance the X-ray polarization of the brightest Z-source Scorpius X-1, resulting in the nominal 2-8 keV energy band in a polarization degree of 1.0(0.2)% and a polarization angle of 8(6)° at 90% of confidence level. This observation was strictly simultaneous with observations performed by NICER, NuSTAR, and Insight-HXMT, which allowed for a precise characterization of its broad-band spectrum from soft to hard X-rays. The source has been observed mainly in its soft state, with short periods of flaring. We also observed low-frequency quasi-periodic oscillations. From a spectro-polarimetric analysis, we associate a polarization to the accretion disk at <3.2% at 90% of confidence level, compatible with expectations for an electron-scattering dominated optically thick atmosphere at the Sco X-1 inclination of 44°; for the higher-energy Comptonized component, we obtain a polarization of 1.3(0.4)%, in agreement with expectations for a slab of Thomson optical depth of ~7 and an electron temperature of ~3 keV. A polarization rotation with respect to previous observations by OSO-8 and PolarLight, and also with respect to the radio-jet position angle, is observed. This result may indicate a variation of the polarization with the source state that can be related to relativistic precession or to a change in the corona geometry with the accretion flow.

Submitted 24 January, 2024; v1 submitted 10 November, 2023; originally announced November 2023.

Journal ref: ApJL 960 L11 (2024)

arXiv:2311.02561 [pdf, other]

Ego-Network Transformer for Subsequence Classification in Time Series Data

Authors: Chin-Chia Michael Yeh, Huiyuan Chen, Yujie Fan, Xin Dai, Yan Zheng, Vivian Lai, Junpeng Wang, Zhongfang Zhuang, Liang Wang, Wei Zhang, Eamonn Keogh

Abstract: Time series classification is a widely studied problem in the field of time series data mining. Previous research has predominantly focused on scenarios where relevant or foreground subsequences have already been extracted, with each subsequence corresponding to a single label. However, real-world time series data often contain foreground subsequences that are intertwined with background subsequences. Successfully classifying these relevant subsequences requires not only distinguishing between different classes but also accurately identifying the foreground subsequences amidst the background. To address this challenge, we propose a novel subsequence classification method that represents each subsequence as an ego-network, providing crucial nearest neighbor information to the model. The ego-networks of all subsequences collectively form a time series subsequence graph, and we introduce an algorithm to efficiently construct this graph. Furthermore, we have demonstrated the significance of enforcing temporal consistency in the prediction of adjacent subsequences for the subsequence classification problem. To evaluate the effectiveness of our approach, we conducted experiments using 128 univariate and 30 multivariate time series datasets. The experimental results demonstrate the superior performance of our method compared to alternative approaches. Specifically, our method outperforms the baseline on 104 out of 158 datasets.

Submitted 5 November, 2023; originally announced November 2023.

arXiv:2311.02560 [pdf, other]

Temporal Treasure Hunt: Content-based Time Series Retrieval System for Discovering Insights

Authors: Chin-Chia Michael Yeh, Huiyuan Chen, Xin Dai, Yan Zheng, Yujie Fan, Vivian Lai, Junpeng Wang, Audrey Der, Zhongfang Zhuang, Liang Wang, Wei Zhang

Abstract: Time series data is ubiquitous across various domains such as finance, healthcare, and manufacturing, but its properties can vary significantly depending on the domain it originates from. The ability to perform Content-based Time Series Retrieval (CTSR) is crucial for identifying unknown time series examples. However, existing CTSR works typically focus on retrieving time series from a single domain database, which can be inadequate if the user does not know the source of the query time series. This limitation motivates us to investigate the CTSR problem in a scenario where the database contains time series from multiple domains. To facilitate this investigation, we introduce a CTSR benchmark dataset that comprises time series data from a variety of domains, such as motion, power demand, and traffic. This dataset is sourced from a publicly available time series classification dataset archive, making it easily accessible to researchers in the field. We compare several popular methods for modeling and retrieving time series data using this benchmark dataset. Additionally, we propose a novel distance learning model that outperforms the existing methods. Overall, our study highlights the importance of addressing the CTSR problem across multiple domains and provides a useful benchmark dataset for future research.

Submitted 5 November, 2023; originally announced November 2023.

arXiv:2310.03919 [pdf, other]

An Efficient Content-based Time Series Retrieval System

Authors: Chin-Chia Michael Yeh, Huiyuan Chen, Xin Dai, Yan Zheng, Junpeng Wang, Vivian Lai, Yujie Fan, Audrey Der, Zhongfang Zhuang, Liang Wang, Wei Zhang, Jeff M. Phillips

Abstract: A Content-based Time Series Retrieval (CTSR) system is an information retrieval system for users to interact with time series from multiple domains, such as finance, healthcare, and manufacturing. For example, users seeking to learn more about the source of a time series can submit the time series as a query to the CTSR system and retrieve a list of relevant time series with associated metadata. By analyzing the retrieved metadata, users can gather more information about the source of the time series. Because the CTSR system is required to work with time series data from diverse domains, it needs a high-capacity model to effectively measure the similarity between different time series. On top of that, the model within the CTSR system has to compute the similarity scores efficiently, as users interact with the system in real time. In this paper, we propose an effective and efficient CTSR model that outperforms alternative models, while still providing reasonable inference runtimes. To demonstrate the capability of the proposed method in solving business problems, we compare it against alternative models using our in-house transaction data. Our findings reveal that the proposed model is the most suitable solution compared to others for our transaction data problem.

Submitted 5 October, 2023; originally announced October 2023.

arXiv:2310.03916 [pdf, other]

Toward a Foundation Model for Time Series Data

Authors: Chin-Chia Michael Yeh, Xin Dai, Huiyuan Chen, Yan Zheng, Yujie Fan, Audrey Der, Vivian Lai, Zhongfang Zhuang, Junpeng Wang, Liang Wang, Wei Zhang

Abstract: A foundation model is a machine learning model trained on a large and diverse set of data, typically using self-supervised learning-based pre-training techniques, that can be adapted to various downstream tasks. However, current research on time series pre-training has predominantly focused on models trained exclusively on data from a single domain. As a result, these models possess domain-specific knowledge that may not be easily transferable to time series from other domains. In this paper, we aim to develop an effective time series foundation model by leveraging unlabeled samples from multiple domains. To achieve this, we repurposed the publicly available UCR Archive and evaluated four existing self-supervised learning-based pre-training methods, along with a novel method, on the datasets. We tested these methods using four popular neural network architectures for time series to understand how the pre-training methods interact with different network designs. Our experimental results show that pre-training improves downstream classification tasks by enhancing the convergence of the fine-tuning process. Furthermore, we found that the proposed pre-training method, when combined with the Transformer model, outperforms the alternatives.

Submitted 5 October, 2023; originally announced October 2023.

arXiv:2309.09400 [pdf, other]

CulturaX: A Cleaned, Enormous, and Multilingual Dataset for Large Language Models in 167 Languages

Authors: Thuat Nguyen, Chien Van Nguyen, Viet Dac Lai, Hieu Man, Nghia Trung Ngo, Franck Dernoncourt, Ryan A. Rossi, Thien Huu Nguyen

Abstract: The driving factors behind the development of large language models (LLMs) with impressive learning capabilities are their colossal model sizes and extensive training datasets. Along with the progress in natural language processing, LLMs have been frequently made accessible to the public to foster deeper investigation and applications. However, when it comes to training datasets for these LLMs, especially the recent state-of-the-art models, they are often not fully disclosed. Creating training data for high-performing LLMs involves extensive cleaning and deduplication to ensure the necessary level of quality. The lack of transparency for training data has thus hampered research on attributing and addressing hallucination and bias issues in LLMs, hindering replication efforts and further advancements in the community. These challenges become even more pronounced in multilingual learning scenarios, where the available multilingual text datasets are often inadequately collected and cleaned. Consequently, there is a lack of open-source and readily usable datasets to effectively train LLMs in multiple languages. To overcome this issue, we present CulturaX, a substantial multilingual dataset with 6.3 trillion tokens in 167 languages, tailored for LLM development. Our dataset undergoes meticulous cleaning and deduplication through a rigorous pipeline of multiple stages to accomplish the best quality for model training, including language identification, URL-based filtering, metric-based cleaning, document refinement, and data deduplication. CulturaX is fully released to the public in HuggingFace to facilitate research and advancements in multilingual LLMs: https://huggingface.co/datasets/uonlp/CulturaX.

Submitted 17 September, 2023; originally announced September 2023.

Comments: Ongoing Work

arXiv:2308.13541 [pdf, other]

doi 10.1145/3604915.3608771

Adversarial Collaborative Filtering for Free

Authors: Huiyuan Chen, Xiaoting Li, Vivian Lai, Chin-Chia Michael Yeh, Yujie Fan, Yan Zheng, Mahashweta Das, Hao Yang

Abstract: Collaborative Filtering (CF) has been successfully used to help users discover the items of interest. Nevertheless, existing CF methods suffer from noisy data, which negatively impacts the quality of recommendation. To tackle this problem, many prior studies leverage adversarial learning to regularize the representations of users/items, which improves both generalizability and robustness. Those methods often learn adversarial perturbations and model parameters under a min-max optimization framework. However, there are still two major drawbacks: 1) Existing methods lack theoretical guarantees of why adding perturbations improves the model generalizability and robustness; 2) Solving min-max optimization is time-consuming. In addition to updating the model parameters, each iteration requires additional computations to update the perturbations, making them not scalable for industry-scale datasets. In this paper, we present Sharpness-aware Collaborative Filtering (SharpCF), a simple yet effective method that conducts adversarial training without extra computational cost over the base optimizer. To achieve this goal, we first revisit the existing adversarial collaborative filtering and discuss its connection with recent Sharpness-aware Minimization. This analysis shows that adversarial training actually seeks model parameters that lie in neighborhoods around the optimal model parameters having uniformly low loss values, resulting in better generalizability. To reduce the computational overhead, SharpCF introduces a novel trajectory loss to measure the alignment between current weights and past weights. Experimental results on real-world datasets demonstrate that our SharpCF achieves superior performance with almost zero additional computational cost compared to adversarial training.

Submitted 20 August, 2023; originally announced August 2023.

arXiv:2308.10347 [pdf, other]

doi 10.1145/3604915.3608831

Enhancing Transformers without Self-supervised Learning: A Loss Landscape Perspective in Sequential Recommendation

Authors: Vivian Lai, Huiyuan Chen, Chin-Chia Michael Yeh, Minghua Xu, Yiwei Cai, Hao Yang

Abstract: Transformer and its variants are a powerful class of architectures for sequential recommendation, owing to their ability to capture a user's dynamic interests from their past interactions. Despite their success, Transformer-based models often require the optimization of a large number of parameters, making them difficult to train from sparse data in sequential recommendation. To address the problem of data sparsity, previous studies have utilized self-supervised learning to enhance Transformers, such as pre-training embeddings from item attributes or contrastive data augmentations. However, these approaches encounter several training issues, including initialization sensitivity, manual data augmentations, and large batch-size memory bottlenecks. In this work, we investigate Transformers from the perspective of loss geometry, aiming to enhance the models' data efficiency and generalization in sequential recommendation. We observe that Transformers (e.g., SASRec) can converge to extremely sharp local minima if not adequately regularized. Inspired by the recent Sharpness-Aware Minimization (SAM), we propose SAMRec, which significantly improves the accuracy and robustness of sequential recommendation. SAMRec performs comparably to state-of-the-art self-supervised Transformers, such as S$^3$Rec and CL4SRec, without the need for pre-training or strong data augmentations.

Submitted 20 August, 2023; originally announced August 2023.

arXiv:2307.16039 [pdf, other]

Okapi: Instruction-tuned Large Language Models in Multiple Languages with Reinforcement Learning from Human Feedback

Authors: Viet Dac Lai, Chien Van Nguyen, Nghia Trung Ngo, Thuat Nguyen, Franck Dernoncourt, Ryan A. Rossi, Thien Huu Nguyen

Abstract: A key technology for the development of large language models (LLMs) involves instruction tuning that helps align the models' responses with human expectations to realize impressive learning abilities. The two major approaches for instruction tuning are supervised fine-tuning (SFT) and reinforcement learning from human feedback (RLHF), which are currently applied to produce the best commercial LLMs (e.g., ChatGPT). To improve the accessibility of LLMs for research and development efforts, various instruction-tuned open-source LLMs have also been introduced recently, e.g., Alpaca and Vicuna, to name a few. However, existing open-source LLMs have only been instruction-tuned for English and a few popular languages, thus hindering their impact and accessibility for many other languages in the world. Among the few very recent works that explore instruction tuning for LLMs in multiple languages, SFT has been used as the only approach. This has left a significant gap for fine-tuned LLMs based on RLHF in diverse languages and raised important questions on how RLHF can boost the performance of multilingual instruction tuning. To overcome this issue, we present Okapi, the first system with instruction-tuned LLMs based on RLHF for multiple languages. Okapi introduces instruction and response-ranked data in 26 diverse languages to facilitate the experiments and development of future multilingual LLM research. We also present benchmark datasets to enable the evaluation of generative LLMs in multiple languages. Our experiments demonstrate the advantages of RLHF for multilingual instruction over SFT for different base models and datasets. Our framework and resources are released at https://github.com/nlp-uoregon/Okapi.

Submitted 1 August, 2023; v1 submitted 29 July, 2023; originally announced July 2023.

arXiv:2307.12949 [pdf, ps, other]

Boosting Punctuation Restoration with Data Generation and Reinforcement Learning

Authors: Viet Dac Lai, Abel Salinas, Hao Tan, Trung Bui, Quan Tran, Seunghyun Yoon, Hanieh Deilamsalehy, Franck Dernoncourt, Thien Huu Nguyen

Abstract: Punctuation restoration is an important task in automatic speech recognition (ASR) which aims to restore the syntactic structure of generated ASR texts to improve readability. While punctuated texts are abundant from written documents, the discrepancy between written punctuated texts and ASR texts limits the usability of written texts in training punctuation restoration systems for ASR texts. This paper proposes a reinforcement learning method to exploit in-topic written texts and recent advances in large pre-trained generative language models to bridge this gap. The experiments show that our method achieves state-of-the-art performance on the ASR test set on two benchmark datasets for punctuation restoration.

Submitted 24 July, 2023; originally announced July 2023.

Comments: Accepted at INTERSPEECH 2023, 6 pages

arXiv:2307.08910 [pdf, other]

Sharpness-Aware Graph Collaborative Filtering

Authors: Huiyuan Chen, Chin-Chia Michael Yeh, Yujie Fan, Yan Zheng, Junpeng Wang, Vivian Lai, Mahashweta Das, Hao Yang

Abstract: Graph Neural Networks (GNNs) have achieved impressive performance in collaborative filtering. However, GNNs tend to yield inferior performance when the distributions of training and test data are not aligned well. Also, training GNNs requires optimizing non-convex neural networks with an abundance of local and global minima, which may differ widely in their performance at test time. Thus, it is essential to choose the minima carefully. Here we propose an effective training schema, called gSAM, under the principle that flatter minima have a better generalization ability than sharper ones. To achieve this goal, gSAM regularizes the flatness of the weight loss landscape by forming a bi-level optimization: the outer problem conducts the standard model training while the inner problem helps the model jump out of the sharp minima. Experimental results show the superiority of our gSAM.

Submitted 17 July, 2023; originally announced July 2023.

arXiv:2305.14889 [pdf, other]

Evaluating Evaluation Metrics: A Framework for Analyzing NLG Evaluation Metrics using Measurement Theory

Authors: Ziang Xiao, Susu Zhang, Vivian Lai, Q. Vera Liao

Abstract: We address a fundamental challenge in Natural Language Generation (NLG) model evaluation -- the design and evaluation of evaluation metrics. Recognizing the limitations of existing automatic metrics and noises from how current human evaluation was conducted, we propose MetricEval, a framework informed by measurement theory, the foundation of educational test design, for conceptualizing and evaluating the reliability and validity of NLG evaluation metrics. The framework formalizes the source of measurement error and offers statistical tools for evaluating evaluation metrics based on empirical data. With our framework, one can quantify the uncertainty of the metrics to better interpret the result. To exemplify the use of our framework in practice, we analyzed a set of evaluation metrics for summarization and identified issues related to conflated validity structure in human-eval and reliability in LLM-based metrics. Through MetricEval, we aim to promote the design, evaluation, and interpretation of valid and reliable metrics to advance robust and effective NLG models.

Submitted 22 October, 2023; v1 submitted 24 May, 2023; originally announced May 2023.

Comments: EMNLP 2023

arXiv:2304.05613 [pdf, other]

ChatGPT Beyond English: Towards a Comprehensive Evaluation of Large Language Models in Multilingual Learning

Authors: Viet Dac Lai, Nghia Trung Ngo, Amir Pouran Ben Veyseh, Hieu Man, Franck Dernoncourt, Trung Bui, Thien Huu Nguyen

Abstract: Over the last few years, large language models (LLMs) have emerged as the most important breakthroughs in natural language processing (NLP) that fundamentally transform research and developments in the field. ChatGPT represents one of the most exciting LLM systems developed recently to showcase impressive skills for language generation and highly attract public attention. Among various exciting applications discovered for ChatGPT in English, the model can process and generate texts for multiple languages due to its multilingual training data. Given the broad adoption of ChatGPT for English in different problems and areas, a natural question is whether ChatGPT can also be applied effectively for other languages or it is necessary to develop more language-specific technologies. The answer to this question requires a thorough evaluation of ChatGPT over multiple tasks with diverse languages and large datasets (i.e., beyond reported anecdotes), which is still missing or limited in current research. Our work aims to fill this gap for the evaluation of ChatGPT and similar LLMs to provide more comprehensive information for multilingual NLP applications. While this work will be an ongoing effort to include additional experiments in the future, our current paper evaluates ChatGPT on 7 different tasks, covering 37 diverse languages with high, medium, low, and extremely low resources. We also focus on the zero-shot learning setting for ChatGPT to improve reproducibility and better simulate the interactions of general users. Compared to the performance of previous models, our extensive experimental results demonstrate a worse performance of ChatGPT for different NLP tasks and languages, calling for further research to develop better models and understanding for multilingual learning.

Submitted 12 April, 2023; originally announced April 2023.

arXiv:2301.09656 [pdf, other]

Selective Explanations: Leveraging Human Input to Align Explainable AI

Authors: Vivian Lai, Yiming Zhang, Chacha Chen, Q. Vera Liao, Chenhao Tan

Abstract: While a vast collection of explainable AI (XAI) algorithms have been developed in recent years, they are often criticized for significant gaps with how humans produce and consume explanations. As a result, current XAI techniques are often found to be hard to use and lack effectiveness. In this work, we attempt to close these gaps by making AI explanations selective -- a fundamental property of human explanations -- by selectively presenting a subset from a large set of model reasons based on what aligns with the recipient's preferences. We propose a general framework for generating selective explanations by leveraging human input on a small sample. This framework opens up a rich design space that accounts for different selectivity goals, types of input, and more. As a showcase, we use a decision-support task to explore selective explanations based on what the decision-maker would consider relevant to the decision task. We conducted two experimental studies to examine three out of a broader possible set of paradigms based on our proposed framework: in Study 1, we ask the participants to provide their own input to generate selective explanations, with either open-ended or critique-based input. In Study 2, we show participants selective explanations based on input from a panel of similar users (annotators). Our experiments demonstrate the promise of selective explanations in reducing over-reliance on AI and improving decision outcomes and subjective perceptions of the AI, but also paint a nuanced picture that attributes some of these positive effects to the opportunity to provide one's own input to augment AI explanations. Overall, our work proposes a novel XAI framework inspired by human communication behaviors and demonstrates its potentials to encourage future work to better align AI explanations with human production and consumption of explanations.

Submitted 7 August, 2023; v1 submitted 23 January, 2023; originally announced January 2023.

Comments: 21 pages, 25 figures

arXiv:2210.03419 [pdf, other]

Event Extraction: A Survey

Authors: Viet Dac Lai

Abstract: Extracting the reported events from text is one of the key research themes in natural language processing. This process includes several tasks such as event detection, argument extraction, and role labeling. As one of the most important topics in natural language processing and natural language understanding, the applications of event extraction span a wide range of domains such as newswire, biomedical domain, history and humanity, and cyber security. This report presents a comprehensive survey for event detection from textual documents. In this report, we provide the task definition, the evaluation method, as well as the benchmark datasets and a taxonomy of methodologies for event extraction. We also present our vision of future research direction in event detection.

Submitted 10 October, 2022; v1 submitted 7 October, 2022; originally announced October 2022.

Comments: 20 pages

arXiv:2206.06383 [pdf, other]

An Exploration of Post-Editing Effectiveness in Text Summarization

Authors: Vivian Lai, Alison Smith-Renner, Ke Zhang, Ruijia Cheng, Wenjuan Zhang, Joel Tetreault, Alejandro Jaimes

Abstract: Automatic summarization methods are efficient but can suffer from low quality. In comparison, manual summarization is expensive but produces higher quality. Can humans and AI collaborate to improve summarization performance? In similar text generation tasks (e.g., machine translation), human-AI collaboration in the form of "post-editing" AI-generated text reduces human workload and improves the quality of AI output. Therefore, we explored whether post-editing offers advantages in text summarization. Specifically, we conducted an experiment with 72 participants, comparing post-editing provided summaries with manual summarization for summary quality, human efficiency, and user experience on formal (XSum news) and informal (Reddit posts) text. This study sheds valuable insights on when post-editing is useful for text summarization: it helped in some cases (e.g., when participants lacked domain knowledge) but not in others (e.g., when provided summaries include inaccurate information). Participants' different editing strategies and needs for assistance offer implications for future human-AI summarization systems.

Submitted 13 June, 2022; originally announced June 2022.

Comments: 18 pages, 21 figures

arXiv:2204.12070 [pdf, other]

Symlink: A New Dataset for Scientific Symbol-Description Linking

Authors: Viet Dac Lai, Amir Pouran Ben Veyseh, Franck Dernoncourt, Thien Huu Nguyen

Abstract: Mathematical symbols and descriptions appear in various forms across document section boundaries without explicit markup. In this paper, we present a new large-scale dataset that emphasizes extracting symbols and descriptions in scientific documents. Symlink annotates scientific papers of 5 different domains (i.e., computer science, biology, physics, mathematics, and economics). Our experiments on Symlink demonstrate the challenges of the symbol-description linking task for existing models and call for further research effort in this area. We will publicly release Symlink to facilitate future research.

Submitted 26 April, 2022; originally announced April 2022.

Comments: arXiv admin note: substantial text overlap with arXiv:2202.09695

arXiv:2204.11788 [pdf, other]

Human-AI Collaboration via Conditional Delegation: A Case Study of Content Moderation

Authors: Vivian Lai, Samuel Carton, Rajat Bhatnagar, Q. Vera Liao, Yunfeng Zhang, Chenhao Tan

Abstract: Despite impressive performance in many benchmark datasets, AI models can still make mistakes, especially among out-of-distribution examples. It remains an open question how such imperfect models can be used effectively in collaboration with humans. Prior work has focused on AI assistance that helps people make individual high-stakes decisions, which is not scalable for a large amount of relatively low-stakes decisions, e.g., moderating social media comments. Instead, we propose conditional delegation as an alternative paradigm for human-AI collaboration where humans create rules to indicate trustworthy regions of a model. Using content moderation as a testbed, we develop novel interfaces to assist humans in creating conditional delegation rules and conduct a randomized experiment with two datasets to simulate in-distribution and out-of-distribution scenarios. Our study demonstrates the promise of conditional delegation in improving model performance and provides insights into design for this novel paradigm, including the effect of AI explanations.

Submitted 25 April, 2022; originally announced April 2022.

Comments: 18 pages, 44 figures

arXiv:2202.09695 [pdf, other]

SemEval 2022 Task 12: Symlink- Linking Mathematical Symbols to their Descriptions

Authors: Viet Dac Lai, Amir Pouran Ben Veyseh, Franck Dernoncourt, Thien Huu Nguyen

Abstract: Given the increasing number of livestreaming videos, automatic speech recognition and post-processing for livestreaming video transcripts are crucial for efficient data management as well as knowledge mining. A key step in this process is punctuation restoration which restores fundamental text structures such as phrase and sentence boundaries from the video transcripts. This work presents a new human-annotated corpus, called BehancePR, for punctuation restoration in livestreaming video transcripts. Our experiments on BehancePR demonstrate the challenges of punctuation restoration for this domain. Furthermore, we show that popular natural language processing toolkits are incapable of detecting sentence boundary on non-punctuated transcripts of livestreaming videos, calling for more research effort to develop robust models for this area.

Submitted 24 April, 2022; v1 submitted 19 February, 2022; originally announced February 2022.

Comments: SemEval 2022 Task 12

arXiv:2202.06928 [pdf, other]

doi 10.1093/mnras/stac688

The X-ray spectral-timing contribution of the stellar wind in the hard state of Cyg X-1

Authors: E. V. Lai, B. De Marco, A. A. Zdziarski, T. M. Belloni, S. Mondal, P. Uttley, V. Grinberg, J. Wilms, A. Różańska

Abstract: The clumpy stellar wind from the companion star in high mass X-ray binaries causes variable, partial absorption of the emission from the X-ray source. We studied XMM-Newton observations from the 7.22 d-long "Cyg X-1 Hard state Observations of a Complete Binary Orbit in X-rays" (CHOCBOX) monitoring campaign, in order to constrain the effects of the stellar wind on the short-timescale X-ray spectral-timing properties of the source. We find these properties to change significantly in the presence of the wind. In particular, the longest sampled timescales (corresponding to temporal frequencies of $ν\sim$ 0.1-1 Hz) reveal an enhancement of the fractional variability power, while on the shortest sampled timescales ($ν\sim$ 1-10 Hz) the variability is suppressed. In addition, we observe a reduction (by up to a factor of $\sim$ 1.8) of the otherwise high coherence between soft and hard band light curves, as well as of the amplitude of the hard X-ray lags intrinsic to the X-ray continuum. The observed increase of low frequency variability power can be explained in terms of variations of the wind column density as a consequence of motions of the intervening clumps. In this scenario (and assuming a terminal velocity of $v_{\infty}=2400\ {\rm km\ s^{-1}}$), we obtain an estimate of $l \sim$ 0.5-1.5 $\times 10^{-4} R_{\ast}$ for the average radial size of a clump. On the other hand, we suggest the behaviour at high frequencies to be due to scattering in an optically thicker medium, possibly formed by collision of the stellar wind with the edge of the disc.

Submitted 14 February, 2022; originally announced February 2022.

Comments: 16 pages, 13 figures

arXiv:2112.11471 [pdf, other]

Towards a Science of Human-AI Decision Making: A Survey of Empirical Studies

Authors: Vivian Lai, Chacha Chen, Q. Vera Liao, Alison Smith-Renner, Chenhao Tan

Abstract: As AI systems demonstrate increasingly strong predictive performance, their adoption has grown in numerous domains. However, in high-stakes domains such as criminal justice and healthcare, full automation is often not desirable due to safety, ethical, and legal concerns, yet fully manual approaches can be inaccurate and time consuming. As a result, there is growing interest in the research community to augment human decision making with AI assistance. Besides developing AI technologies for this purpose, the emerging field of human-AI decision making must embrace empirical approaches to form a foundational understanding of how humans interact and work with AI to make decisions. To invite and help structure research efforts towards a science of understanding and improving human-AI decision making, we survey recent literature of empirical human-subject studies on this topic. We summarize the study design choices made in over 100 papers in three important aspects: (1) decision tasks, (2) AI models and AI assistance elements, and (3) evaluation metrics. For each aspect, we summarize current trends, discuss gaps in current practices of the field, and make a list of recommendations for future research. Our survey highlights the need to develop common frameworks to account for the design and research spaces of human-AI decision making, so that researchers can make rigorous choices in study design, and the research community can build on each other's work and produce generalizable scientific knowledge. We also hope this survey will serve as a bridge for HCI and AI communities to work together to mutually shape the empirical science and computational technologies for human-AI decision making.

Submitted 21 December, 2021; originally announced December 2021.

Comments: 36 pages, 2 figures, see https://haidecisionmaking.github.io for website

arXiv:2105.07949 [pdf, other]

Using Transformers to Provide Teachers with Personalized Feedback on their Classroom Discourse: The TalkMoves Application

Authors: Abhijit Suresh, Jennifer Jacobs, Vivian Lai, Chenhao Tan, Wayne Ward, James H. Martin, Tamara Sumner

Abstract: TalkMoves is an innovative application designed to support K-12 mathematics teachers to reflect on, and continuously improve their instructional practices. This application combines state-of-the-art natural language processing capabilities with automated speech recognition to automatically analyze classroom recordings and provide teachers with personalized feedback on their use of specific types of discourse aimed at broadening and deepening classroom conversations about mathematics. These specific discourse strategies are referred to as "talk moves" within the mathematics education community and prior research has documented the ways in which systematic use of these discourse strategies can positively impact student engagement and learning. In this article, we describe the TalkMoves application's cloud-based infrastructure for managing and processing classroom recordings, and its interface for providing teachers with feedback on their use of talk moves during individual teaching episodes. We present the series of model architectures we developed, and the studies we conducted, to develop our best-performing, transformer-based model (F1 = 79.3%). We also discuss several technical challenges that need to be addressed when working with real-world speech and language data from noisy K-12 classrooms.

Submitted 29 April, 2021; originally announced May 2021.

Comments: Presented at the AAAI 2021 Spring Symposium on Artificial Intelligence for K-12 Education

arXiv:2103.09330 [pdf, other]

Cross-Task Instance Representation Interactions and Label Dependencies for Joint Information Extraction with Graph Convolutional Networks

Authors: Minh Van Nguyen, Viet Dac Lai, Thien Huu Nguyen

Abstract: Existing works on information extraction (IE) have mainly solved the four main tasks separately (entity mention recognition, relation extraction, event trigger detection, and argument extraction), thus failing to benefit from inter-dependencies between tasks. This paper presents a novel deep learning model to simultaneously solve the four tasks of IE in a single model (called FourIE). Compared to the few prior works on jointly performing four IE tasks, FourIE features two novel contributions to capture inter-dependencies between tasks. First, at the representation level, we introduce an interaction graph between instances of the four tasks that is used to enrich the prediction representation for one instance with those from related instances of other tasks. Second, at the label level, we propose a dependency graph for the information types in the four IE tasks that captures the connections between the types expressed in an input sentence. A new regularization mechanism is introduced to enforce the consistency between the golden and predicted type dependency graphs to improve representation learning. We show that the proposed model achieves the state-of-the-art performance for joint IE on both monolingual and multilingual learning settings with three different languages.

Submitted 26 March, 2021; v1 submitted 16 March, 2021; originally announced March 2021.

Comments: Accepted at NAACL-HLT 2021

arXiv:2102.07811 [pdf, other]

doi 10.1051/0004-6361/202140567

The inner flow geometry in MAXI J1820+070 during hard and hard-intermediate states

Authors: B. De Marco, A. A. Zdziarski, G. Ponti, G. Migliori, T. M. Belloni, A. Segovia Otero, M. Dziełak, E. V. Lai

Abstract: [Abridged] Context: We present a systematic X-ray spectral-timing study of the recently discovered, exceptionally bright black hole X-ray binary system MAXI J1820+070. Our analysis focuses on the first part of the 2018 outburst, covering the rise throughout the hard state, the bright hard and hard-intermediate states, and the transition to the soft-intermediate state. Aims: We address the issue of constraining the geometry of the innermost accretion flow and its evolution throughout an outburst. Methods: We employed two independent X-ray spectral-timing methods applied to the NICER data of MAXI J1820+070. We first identified and tracked the evolution of a characteristic frequency of soft X-ray reverberation lags. Then, we studied the spectral evolution of the quasi-thermal component responsible for the observed thermal reverberation lags. Results: The frequency of thermal reverberation lags steadily increases throughout most of the outburst, implying that the relative distance between the X-ray source and the disc decreases as the source softens. However, near transition this evolution breaks, showing a sudden increase (decrease) of lag amplitude (frequency). The temperature of the quasi-thermal component in covariance spectra consistently increases throughout all the analysed observations. Conclusions: The behaviour of thermal reverberation lags near transition might be related to the relativistic plasma ejections detected at radio wavelengths, suggesting a causal connection between the two phenomena.
Throughout most of the hard and hard-intermediate states the disc is consistent with being truncated (with an inner radius $R_{\rm in}>\sim 10 R_{\rm g}$), reaching close to the innermost stable circular orbit only near transition.

Submitted 6 August, 2021; v1 submitted 15 February, 2021; originally announced February 2021.

Comments: Accepted for publication in Astronomy & Astrophysics, matches published version

Journal ref: A&A 654, A14 (2021)

arXiv:2101.05303 [pdf, other]

doi 10.1145/3479552

Understanding the Effect of Out-of-distribution Examples and Interactive Explanations on Human-AI Decision Making

Authors: Han Liu, Vivian Lai, Chenhao Tan

Abstract: Although AI holds promise for improving human decision making in societally critical domains, it remains an open question how human-AI teams can reliably outperform AI alone and human alone in challenging prediction tasks (also known as complementary performance). We explore two directions to understand the gaps in achieving complementary performance. First, we argue that the typical experimental setup limits the potential of human-AI teams. To account for lower AI performance out-of-distribution than in-distribution because of distribution shift, we design experiments with different distribution types and investigate human performance for both in-distribution and out-of-distribution examples. Second, we develop novel interfaces to support interactive explanations so that humans can actively engage with AI assistance. Using virtual pilot studies and large-scale randomized experiments across three tasks, we demonstrate a clear difference between in-distribution and out-of-distribution, and observe mixed results for interactive explanations: while interactive explanations improve human perception of AI assistance's usefulness, they may reinforce human biases and lead to limited performance improvement. Overall, our work points out critical challenges and future directions towards enhancing human performance with AI assistance.

Submitted 5 October, 2021; v1 submitted 13 January, 2021; originally announced January 2021.

Comments: 45 pages, 24 figures, accepted to CSCW 2021

arXiv:2101.03289 [pdf, other]

Trankit: A Light-Weight Transformer-based Toolkit for Multilingual Natural Language Processing

Authors: Minh Van Nguyen, Viet Dac Lai, Amir Pouran Ben Veyseh, Thien Huu Nguyen

Abstract: We introduce Trankit, a light-weight Transformer-based Toolkit for multilingual Natural Language Processing (NLP). It provides a trainable pipeline for fundamental NLP tasks over 100 languages, and 90 pretrained pipelines for 56 languages. Built on a state-of-the-art pretrained language model, Trankit significantly outperforms prior multilingual NLP pipelines over sentence segmentation, part-of-speech tagging, morphological feature tagging, and dependency parsing while maintaining competitive performance for tokenization, multi-word token expansion, and lemmatization over 90 Universal Dependencies treebanks. Despite the use of a large pretrained transformer, our toolkit is still efficient in memory usage and speed. This is achieved by our novel plug-and-play mechanism with Adapters where a multilingual pretrained transformer is shared across pipelines for different languages. Our toolkit along with pretrained models and code are publicly available at: https://github.com/nlp-uoregon/trankit. A demo website for our toolkit is also available at: http://nlp.uoregon.edu/trankit. Finally, we create a demo video for Trankit at: https://youtu.be/q0KGP3zGjGc.

Submitted 14 October, 2021; v1 submitted 8 January, 2021; originally announced January 2021.

Comments: Camera-ready version for EACL 2021 Demo

arXiv:2010.14123 [pdf, ps, other]

Event Detection: Gate Diversity and Syntactic Importance Scores for Graph Convolution Neural Networks

Authors: Viet Dac Lai, Tuan Ngo Nguyen, Thien Huu Nguyen

Abstract: Recent studies on event detection (ED) have shown that the syntactic dependency graph can be employed in graph convolution neural networks (GCN) to achieve state-of-the-art performance. However, the computation of the hidden vectors in such graph-based models is agnostic to the trigger candidate words, potentially leaving irrelevant information for the trigger candidate for event prediction. In addition, the current models for ED fail to exploit the overall contextual importance scores of the words, which can be obtained via the dependency tree, to boost the performance. In this study, we propose a novel gating mechanism to filter noisy information in the hidden vectors of the GCN models for ED based on the information from the trigger candidate. We also introduce novel mechanisms to achieve the contextual diversity for the gates and the importance score consistency for the graphs and models in ED. The experiments show that the proposed model achieves state-of-the-art performance on two ED datasets.

Submitted 27 October, 2020; originally announced October 2020.

Comments: EMNLP 2020

arXiv:2008.02178 [pdf, other]

doi 10.1051/0004-6361/202038684

An extreme Ultraluminous X-ray source X-1 in NGC 5055

Authors: Samaresh Mondal, Agata Rozanska, Eleonora Veronica Lai, Barbara De Marco

Abstract: Aims. We analyzed multi-epoch X-ray data of the Ultraluminous X-ray source (ULX) NGC 5055 X-1, with luminosity up to $2.32\times10^{40}\ \rm erg\ s^{-1}$, in order to constrain the physical parameters of the source. Methods. We performed timing and spectral analysis of Chandra and XMM-Newton observations. We used spectral models which assume the emission is from an accreting black hole system. We fit the data with a multicolor disk (MCD) combined with a powerlaw (PL) or a thermal Comptonization (NTHCOMP) component, and compared those fits with a slim disk model. Results. The lightcurves of the source do not show significant variability. From the hardness ratios (3-10 keV/0.3-3 keV flux) we infer that the source is not spectrally variable. We found that the photon index is tightly, positively correlated with the unabsorbed 0.3-10 keV flux and the hydrogen column density. Furthermore, the temperature emissivity profile indicates a deviation from the standard sub-Eddington thin disk model. The source shows an inverse correlation between luminosity and inner disk temperature in all fitted models. Conclusions. Our analysis favors the source to be in an ultraluminous soft state. The positive correlations between the photon index and the flux, and between the photon index and the hydrogen column density may suggest the source is accreting at high Eddington ratios and might indicate the presence of a wind.
The inverse luminosity relation with the inner disk temperature for all spectral models may indicate that the emission is geometrically beamed by an optically thick outflow.

Submitted 5 August, 2020; originally announced August 2020.

Comments: 8 pages, 10 figures, Accepted for publication in A&A

Journal ref: A&A 642, A94 (2020)

arXiv:2006.10093 [pdf, ps, other]

Extensively Matching for Few-shot Learning Event Detection

Authors: Viet Dac Lai, Franck Dernoncourt, Thien Huu Nguyen

Abstract: Current event detection models under supervised learning settings fail to transfer to new event types. Few-shot learning has not been explored in event detection even though it allows a model to perform well with high generalization on new event types. In this work, we formulate event detection as a few-shot learning problem to enable to extend event detection to new event types. We propose two novel loss factors that matching examples in the support set to provide more training signals to the model. Moreover, these training signals can be applied in many metric-based few-shot learning models. Our extensive experiments on the ACE-2005 dataset (under a few-shot learning setting) show that the proposed method can improve the performance of few-shot learning.

Submitted 17 June, 2020; originally announced June 2020.

Comments: 1st Joint Workshop on Narrative Understanding, Storylines, and Events (NUSE) @ ACL 2020

arXiv:2003.07370 [pdf, ps, other]

Harnessing Explanations to Bridge AI and Humans

Authors: Vivian Lai, Samuel Carton, Chenhao Tan

Abstract: Machine learning models are increasingly integrated into societally critical applications such as recidivism prediction and medical diagnosis, thanks to their superior predictive power. In these applications, however, full automation is often not desired due to ethical and legal concerns. The research community has thus ventured into developing interpretable methods that explain machine predictions. While these explanations are meant to assist humans in understanding machine predictions and thereby allowing humans to make better decisions, this hypothesis is not supported in many recent studies. To improve human decision-making with AI assistance, we propose future directions for closing the gap between the efficacy of explanations and improvement in human performance.

Submitted 16 March, 2020; originally announced March 2020.

Comments: 4 pages, CHI 2020 Fair & Responsible AI Workshop

arXiv:2002.05295 [pdf, ps, other]

doi 10.1007/978-3-030-47436-2_18

Exploiting the Matching Information in the Support Set for Few Shot Event Classification

Authors: Viet Dac Lai, Franck Dernoncourt, Thien Huu Nguyen

Abstract: The existing event classification (EC) work primarily focuses on the traditional supervised learning setting in which models are unable to extract event mentions of new/unseen event types. Few-shot learning has not been investigated in this area although it enables EC models to extend their operation to unobserved event types. To fill in this gap, in this work, we investigate event classification under the few-shot learning setting. We propose a novel training method for this problem that extensively exploit the support set during the training process of a few-shot learning model. In particular, in addition to matching the query example with those in the support set for training, we seek to further match the examples within the support set themselves. This method provides more training signals for the models and can be applied to every metric-learning-based few-shot learning methods. Our extensive experiments on two benchmark EC datasets show that the proposed method can improve the best reported few-shot learning models by up to 10% on accuracy for event classification.

Submitted 19 June, 2020; v1 submitted 12 February, 2020; originally announced February 2020.

Comments: Pacific-Asia Conference on Knowledge Discovery and Data Mining (PAKDD) 2020

arXiv:2001.05871 [pdf, other]

doi 10.1145/3313831.3376873

"Why is 'Chicago' deceptive?" Towards Building Model-Driven Tutorials for Humans

Authors: Vivian Lai, Han Liu, Chenhao Tan

Abstract: To support human decision making with machine learning models, we often need to elucidate patterns embedded in the models that are unsalient, unknown, or counterintuitive to humans. While existing approaches focus on explaining machine predictions with real-time assistance, we explore model-driven tutorials to help humans understand these patterns in a training phase. We consider both tutorials with guidelines from scientific papers, analogous to current practices of science communication, and automatically selected examples from training data with explanations. We use deceptive review detection as a testbed and conduct large-scale, randomized human-subject experiments to examine the effectiveness of such tutorials. We find that tutorials indeed improve human performance, with and without real-time assistance. In particular, although deep learning provides superior predictive performance than simple models, tutorials and explanations from simple models are more useful to humans. Our work suggests future directions for human-centered tutorials and explanations towards a synergy between humans and AI.

Submitted 14 January, 2020; originally announced January 2020.

Comments: 26 pages, 48 figures, CHI 2020

arXiv:1910.11368 [pdf, ps, other]

Extending Event Detection to New Types with Learning from Keywords

Authors: Viet Dac Lai, Thien Huu Nguyen

Abstract: Traditional event detection classifies a word or a phrase in a given sentence for a set of predefined event types. The limitation of such predefined set is that it prevents the adaptation of the event detection models to new event types. We study a novel formulation of event detection that describes types via several keywords to match the contexts in documents. This facilitates the operation of the models to new types. We introduce a novel feature-based attention mechanism for convolutional neural networks for event detection in the new formulation. Our extensive experiments demonstrate the benefits of the new formulation for new type extension for event detection as well as the proposed attention mechanism for this problem.

Submitted 24 October, 2019; originally announced October 2019.

arXiv:1910.08534 [pdf, other]

Many Faces of Feature Importance: Comparing Built-in and Post-hoc Feature Importance in Text Classification

Authors: Vivian Lai, Jon Z. Cai, Chenhao Tan

Abstract: Feature importance is commonly used to explain machine predictions. While feature importance can be derived from a machine learning model with a variety of methods, the consistency of feature importance via different methods remains understudied. In this work, we systematically compare feature importance from built-in mechanisms in a model such as attention values and post-hoc methods that approximate model behavior such as LIME. Using text classification as a testbed, we find that 1) no matter which method we use, important features from traditional models such as SVM and XGBoost are more similar with each other, than with deep learning models; 2) post-hoc methods tend to generate more similar important features for two models than built-in methods. We further demonstrate how such similarity varies across instances. Notably, important features do not always resemble each other better when two models agree on the predicted label than when they disagree.

Submitted 18 October, 2019; originally announced October 2019.

Comments: 17 pages, 18 figures, EMNLP 2019, the code is available at https://vivlai.github.io/

arXiv:1906.05398 [pdf]

Self-driving laboratory for accelerated discovery of thin-film materials

Authors: Benjamin P. MacLeod, Fraser G. L. Parlane, Thomas D. Morrissey, Florian Häse, Loïc M. Roch, Kevan E. Dettelbach, Raphaell Moreira, Lars P. E. Yunker, Michael B. Rooney, Joseph R. Deeth, Veronica Lai, Gordon J. Ng, Henry Situ, Ray H. Zhang, Michael S. Elliott, Ted H. Haley, David J. Dvorak, Alán Aspuru-Guzik, Jason E. Hein, Curtis P. Berlinguette

Abstract: Discovering and optimizing commercially viable materials for clean energy applications typically takes over a decade. Self-driving laboratories that iteratively design, execute, and learn from material science experiments in a fully autonomous loop present an opportunity to accelerate this research. We report here a modular robotic platform driven by a model-based optimization algorithm capable of autonomously optimizing the optical and electronic properties of thin-film materials by modifying the film composition and processing conditions. We demonstrate this platform by using it to maximize the hole mobility of organic hole transport materials commonly used in perovskite solar cells and consumer electronics. This demonstration highlights the possibilities of using autonomous laboratories to discover organic and inorganic materials relevant to materials sciences and clean energy technologies.

Submitted 10 March, 2020; v1 submitted 12 June, 2019; originally announced June 2019.

Comments: 43 pages, 9 figures

arXiv:1811.07901 [pdf, other]

doi 10.1145/3287560.3287590

On Human Predictions with Explanations and Predictions of Machine Learning Models: A Case Study on Deception Detection

Authors: Vivian Lai, Chenhao Tan

Abstract: Humans are the final decision makers in critical tasks that involve ethical and legal concerns, ranging from recidivism prediction, to medical diagnosis, to fighting against fake news. Although machine learning models can sometimes achieve impressive performance in these tasks, these tasks are not amenable to full automation. To realize the potential of machine learning for improving human decisions, it is important to understand how assistance from machine learning models affects human performance and human agency. In this paper, we use deception detection as a testbed and investigate how we can harness explanations and predictions of machine learning models to improve human performance while retaining human agency. We propose a spectrum between full human agency and full automation, and develop varying levels of machine assistance along the spectrum that gradually increase the influence of machine predictions. We find that without showing predicted labels, explanations alone slightly improve human performance in the end task. In comparison, human performance is greatly improved by showing predicted labels (>20% relative improvement) and can be further improved by explicitly suggesting strong machine performance. Interestingly, when predicted labels are shown, explanations of machine predictions induce a similar level of accuracy as an explicit statement of strong machine performance. Our results demonstrate a tradeoff between human performance and human agency and show that explanations of machine predictions can moderate this tradeoff.

Submitted 8 January, 2019; v1 submitted 19 November, 2018; originally announced November 2018.

Comments: 17 pages, 19 figures, in Proceedings of ACM FAT* 2019, dataset & demo available at https://deception.machineintheloop.com

arXiv:1611.05339 [pdf]

CareerMapper: An Automated Resume Evaluation Tool

Authors: Vivian Lai, Kyong Jin Shim, Richard J. Oentaryo, Philips K. Prasetyo, Casey Vu, Ee-Peng Lim, David Lo

Abstract: The advent of the Web brought about major changes in the way people search for jobs and companies look for suitable candidates. As more employers and recruitment firms turn to the Web for job candidate search, an increasing number of people turn to the Web for uploading and creating their online resumes. Resumes are often the first source of information about candidates and also the first item of evaluation in candidate selection. Thus, it is imperative that resumes are complete, free of errors and well-organized. We present an automated resume evaluation tool called "CareerMapper". Our tool is designed to conduct a thorough review of a user's LinkedIn profile and provide best recommendations for improved online resumes by analyzing a large number of online user profiles.

Submitted 16 November, 2016; originally announced November 2016.

Journal ref: Proceedings of the IEEE International Conference on Big Data (IEEE BigData 2016)

Showing 1–45 of 45 results for author: Lai, V