Showing 1–18 of 18 results for author: Soon, L

  1. arXiv:2407.01374  [pdf, other]

    cs.CL

    Bridging the Gap: Transfer Learning from English PLMs to Malaysian English

    Authors: Mohan Raj Chanthran, Lay-Ki Soon, Huey Fang Ong, Bhawani Selvaretnam

    Abstract: Malaysian English is a low resource creole language, where it carries the elements of Malay, Chinese, and Tamil languages, in addition to Standard English. Named Entity Recognition (NER) models underperform when capturing entities from Malaysian English text due to its distinctive morphosyntactic adaptations, semantic features and code-switching (mixing English and Malay). Considering these gaps,…

    Submitted 1 July, 2024; originally announced July 2024.

    Comments: Accepted in 9th Workshop on Representation Learning for NLP (Rep4NLP) at ACL 2024

  2. arXiv:2406.13217  [pdf, other]

    cs.CL

    Bridging Law and Data: Augmenting Reasoning via a Semi-Structured Dataset with IRAC methodology

    Authors: Xiaoxi Kang, Lizhen Qu, Lay-Ki Soon, Zhuang Li, Adnan Trakic

    Abstract: The effectiveness of Large Language Models (LLMs) in legal reasoning is often limited due to the unique legal terminologies and the necessity for highly specialized knowledge. These limitations highlight the need for high-quality data tailored for complex legal reasoning tasks. This paper introduces LEGALSEMI, a benchmark specifically curated for legal scenario analysis. LEGALSEMI comprises 54 leg…

    Submitted 19 June, 2024; originally announced June 2024.

  3. arXiv:2403.01616  [pdf, other]

    cs.CL

    Towards Comprehensive Vietnamese Retrieval-Augmented Generation and Large Language Models

    Authors: Nguyen Quang Duc, Le Hai Son, Nguyen Duc Nhan, Nguyen Dich Nhat Minh, Le Thanh Huong, Dinh Viet Sang

    Abstract: This paper presents our contributions towards advancing the state of Vietnamese language understanding and generation through the development and dissemination of open datasets and pre-trained models for Vietnamese Retrieval-Augmented Generation (RAG) and Large Language Models (LLMs).

    Submitted 5 March, 2024; v1 submitted 3 March, 2024; originally announced March 2024.

  4. arXiv:2402.14521  [pdf, other]

    cs.CL

    Malaysian English News Decoded: A Linguistic Resource for Named Entity and Relation Extraction

    Authors: Mohan Raj Chanthran, Lay-Ki Soon, Huey Fang Ong, Bhawani Selvaretnam

    Abstract: Standard English and Malaysian English exhibit notable differences, posing challenges for natural language processing (NLP) tasks on Malaysian English. Unfortunately, most of the existing datasets are mainly based on standard English and therefore inadequate for improving NLP tasks in Malaysian English. An experiment using state-of-the-art Named Entity Recognition (NER) solutions on Malaysian Engl…

    Submitted 22 February, 2024; originally announced February 2024.

    Comments: Accepted at LREC-COLING 2024

  5. arXiv:2402.11178  [pdf, other]

    cs.CL

    RENOVI: A Benchmark Towards Remediating Norm Violations in Socio-Cultural Conversations

    Authors: Haolan Zhan, Zhuang Li, Xiaoxi Kang, Tao Feng, Yuncheng Hua, Lizhen Qu, Yi Ying, Mei Rianto Chandra, Kelly Rosalin, Jureynolds Jureynolds, Suraj Sharma, Shilin Qu, Linhao Luo, Lay-Ki Soon, Zhaleh Semnani Azad, Ingrid Zukerman, Gholamreza Haffari

    Abstract: Norm violations occur when individuals fail to conform to culturally accepted behaviors, which may lead to potential conflicts. Remediating norm violations requires social awareness and cultural sensitivity of the nuances at play. To equip interactive AI systems with a remediation ability, we offer ReNoVi - a large-scale corpus of 9,258 multi-turn dialogues annotated with social norms, as well as…

    Submitted 16 February, 2024; originally announced February 2024.

    Comments: work in progress. 15 pages, 7 figures

  6. arXiv:2311.16603  [pdf, other]

    cs.DS cs.IR

    l2Match: Optimization Techniques on Subgraph Matching Algorithm using Label Pair, Neighboring Label Index, and Jump-Redo method

    Authors: C. Q. Cheng, K. S. Wong, L. K. Soon

    Abstract: Graph database is designed to store bidirectional relationships between objects and facilitate the traversal process to extract a subgraph. However, the subgraph matching process is an NP-Complete problem. Existing solutions to this problem usually employ a filter-and-verification framework and a divide-and-conquer method. The filter-and-verification framework minimizes the number of inputs to the…

    Submitted 28 November, 2023; originally announced November 2023.

    Comments: This short version of this article (6 pages) is accepted by ICEIC 2024

    MSC Class: 05C60 (Primary); 05C30 (Secondary); 68R10

    ACM Class: G.4.1; H.3.3

  7. arXiv:2311.11583  [pdf, other]

    cs.CL

    How well ChatGPT understand Malaysian English? An Evaluation on Named Entity Recognition and Relation Extraction

    Authors: Mohan Raj Chanthran, Lay-Ki Soon, Huey Fang Ong, Bhawani Selvaretnam

    Abstract: Recently, ChatGPT has attracted a lot of interest from both researchers and the general public. While the performance of ChatGPT in named entity recognition and relation extraction from Standard English texts is satisfactory, it remains to be seen if it can perform similarly for Malaysian English. Malaysian English is unique as it exhibits morphosyntactic and semantical adaptation from local conte…

    Submitted 28 June, 2024; v1 submitted 20 November, 2023; originally announced November 2023.

    Comments: Accepted in Generation, Evaluation & Metrics (GEM) Workshop at EMNLP 2023

  8. Can ChatGPT Perform Reasoning Using the IRAC Method in Analyzing Legal Scenarios Like a Lawyer?

    Authors: Xiaoxi Kang, Lizhen Qu, Lay-Ki Soon, Adnan Trakic, Terry Yue Zhuo, Patrick Charles Emerton, Genevieve Grant

    Abstract: Large Language Models (LLMs), such as ChatGPT, have drawn a lot of attentions recently in the legal domain due to its emergent ability to tackle a variety of legal tasks. However, it is still unknown if LLMs are able to analyze a legal case and perform reasoning in the same manner as lawyers. Therefore, we constructed a novel corpus consisting of scenarios pertain to Contract Acts Malaysia and Aus…

    Submitted 2 November, 2023; v1 submitted 23 October, 2023; originally announced October 2023.

    Comments: EMNLP 2023 Findings

    Report number: 2023.findings-emnlp.929

    Journal ref: 2023.findings-emnlp.929

  9. arXiv:2304.12026  [pdf, other]

    cs.CL

    SocialDial: A Benchmark for Socially-Aware Dialogue Systems

    Authors: Haolan Zhan, Zhuang Li, Yufei Wang, Linhao Luo, Tao Feng, Xiaoxi Kang, Yuncheng Hua, Lizhen Qu, Lay-Ki Soon, Suraj Sharma, Ingrid Zukerman, Zhaleh Semnani-Azad, Gholamreza Haffari

    Abstract: Dialogue systems have been widely applied in many scenarios and are now more powerful and ubiquitous than ever before. With large neural models and massive available data, current dialogue systems have access to more knowledge than any people in their life. However, current dialogue systems still do not perform at a human level. One major gap between conversational agents and humans lies in their…

    Submitted 24 April, 2023; originally announced April 2023.

    Comments: Accepted by SIGIR 2023

  10. arXiv:2205.00387  [pdf, other]

    cs.CL cs.IR

    Crude Oil-related Events Extraction and Processing: A Transfer Learning Approach

    Authors: Meisin Lee, Lay-Ki Soon, Eu-Gene Siew

    Abstract: One of the challenges in event extraction via traditional supervised learning paradigm is the need for a sizeable annotated dataset to achieve satisfactory model performance. It is even more challenging when it comes to event extraction in the finance and economics domain, a domain with considerably fewer resources. This paper presents a complete framework for extracting and processing crude oil-r…

    Submitted 30 April, 2022; originally announced May 2022.

    MSC Class: 68

    ACM Class: H.3; I.2

  11. arXiv:2204.03871  [pdf, other]

    cs.CL

    CrudeOilNews: An Annotated Crude Oil News Corpus for Event Extraction

    Authors: Meisin Lee, Lay-Ki Soon, Eu-Gene Siew, Ly Fie Sugianto

    Abstract: In this paper, we present CrudeOilNews, a corpus of English Crude Oil news for event extraction. It is the first of its kind for Commodity News and serve to contribute towards resource building for economic and financial text mining. This paper describes the data collection process, the annotation methodology and the event typology used in producing the corpus. Firstly, a seed set of 175 news arti…

    Submitted 8 April, 2022; originally announced April 2022.

    Comments: Accepted at LREC 2022. arXiv admin note: text overlap with arXiv:2105.08214

    MSC Class: 68T99

    ACM Class: I.2.7

  12. arXiv:2109.12781  [pdf, other]

    cs.CL cs.AI

    Effective Use of Graph Convolution Network and Contextual Sub-Tree for Commodity News Event Extraction

    Authors: Meisin Lee, Lay-Ki Soon, Eu-Gene Siew

    Abstract: Event extraction in commodity news is a less researched area as compared to generic event extraction. However, accurate event extraction from commodity news is useful in a broad range of applications such as understanding event chains and learning event-event relations, which can then be used for commodity price prediction. The events found in commodity news exhibit characteristics different from…

    Submitted 26 September, 2021; originally announced September 2021.

    Comments: Accepted in ECONLP workshop at EMNLP 2021

    ACM Class: I.7; H.4; H.5

  13. arXiv:2105.08214  [pdf, other]

    cs.CL

    An Annotated Commodity News Corpus for Event Extraction

    Authors: Meisin Lee, Lay-Ki Soon, Eu-Gene Siew, Ly Fie Sugianto

    Abstract: Commodity News contains a wealth of information such as summary of the recent commodity price movement and notable events that led to the movement. Through event extraction, useful information extracted from commodity news is extremely useful in mining for causal relation between events and commodity price movement, which can be used for commodity price prediction. To facilitate the future research,…

    Submitted 24 August, 2021; v1 submitted 17 May, 2021; originally announced May 2021.

    Comments: Submitted to journal, currently under review

    MSC Class: 68T99

    ACM Class: I.2.7

  14. arXiv:2003.00991  [pdf, other]

    eess.SP cs.SD eess.AS

    Uniform Array with Broadband Beamforming for Arbitrary Beam Patterns

    Authors: Phan Le Son

    Abstract: Broadband beamforming is a technique to obtain the signal with a wide range of frequencies. It maintains the signal integrity and spatial selectivity over frequencies. This is important in several applications such as microphone array, sonar array or radar where the operation range of the signal is several octaves. Using uniform array for broadband beamforming is the old topic but there is no avai…

    Submitted 2 March, 2020; originally announced March 2020.

  15. arXiv:1811.00697  [pdf, other]

    cs.IR

    Noise Contrastive Estimation for Scalable Linear Models for One-Class Collaborative Filtering

    Authors: Ga Wu, Maksims Volkovs, Chee Loong Soon, Scott Sanner, Himanshu Rai

    Abstract: Previous highly scalable one-class collaborative filtering methods such as Projected Linear Recommendation (PLRec) have advocated using fast randomized SVD to embed items into a latent space, followed by linear regression methods to learn personalized recommendation models per user. Unfortunately, naive SVD embedding methods often exhibit a popularity bias that skews the ability to accurately embe…

    Submitted 1 November, 2018; originally announced November 2018.

    Comments: 8 pages

  16. arXiv:1602.08447  [pdf]

    cs.AI

    A Neutrosophic Recommender System for Medical Diagnosis Based on Algebraic Neutrosophic Measures

    Authors: Mumtaz Ali, Nguyen Van Minh, Le Hoang Son

    Abstract: Neutrosophic set has the ability to handle uncertain, incomplete, inconsistent, indeterminate information in a more accurate way. In this paper, we proposed a neutrosophic recommender system to predict the diseases based on neutrosophic set which includes single-criterion neutrosophic recommender system (SC-NRS) and multi-criterion neutrosophic recommender system (MC-NRS). Further, we investigated…

    Submitted 24 February, 2016; originally announced February 2016.

    Comments: Keywords: Medical diagnosis, neutrosophic set, neutrosophic recommender system, non-linear regression model

  17. Prefix-based Labeling Annotation for Effective XML Fragmentation

    Authors: Kok-Leong Koong, Su-Cheng Haw, Lay-Ki Soon, Samini Subramaniam

    Abstract: XML is gradually employed as a standard of data exchange in web environment since its inception in the 90s until present. It serves as a data exchange between systems and other applications. Meanwhile the data volume has grown substantially in the web and thus effective methods of storing and retrieving these data is essential. One recommended way is physically or virtually fragments the large chu…

    Submitted 13 May, 2015; originally announced May 2015.

    Comments: 12 pages, invited extension from conference paper. International Journal of Computer Science & Information Technology (IJCSIT), Vol 7, No 2, April 2015

    ACM Class: H.2.4

  18. Fuzzy approaches to context variable in fuzzy geographically weighted clustering

    Authors: Nguyen Van Minh, Le Hoang Son

    Abstract: Fuzzy Geographically Weighted Clustering (FGWC) is considered as a suitable tool for the analysis of geo-demographic data that assists the provision and planning of products and services to local people. Context variables were attached to FGWC in order to accelerate the computing speed of the algorithm and to focus the results on the domain of interests. Nonetheless, the determination of exact, cr…

    Submitted 13 April, 2015; originally announced April 2015.

    Comments: 11 pages

    MSC Class: 62H30

    ACM Class: I.5.3