Bridging the Gap: Transfer Learning from English PLMs to Malaysian English

Authors: Mohan Raj Chanthran, Lay-Ki Soon, Huey Fang Ong, Bhawani Selvaretnam

Abstract: Malaysian English is a low-resource creole language that carries elements of Malay, Chinese, and Tamil in addition to Standard English. Named Entity Recognition (NER) models underperform when capturing entities from Malaysian English text due to its distinctive morphosyntactic adaptations, semantic features and code-switching (mixing English and Malay). Considering these gaps, we introduce MENmBERT and MENBERT, pre-trained language models with contextual understanding, specifically tailored for Malaysian English. We have fine-tuned MENmBERT and MENBERT using manually annotated entities and relations from the Malaysian English News Article (MEN) Dataset. This fine-tuning process allows the PLM to learn representations that capture the nuances of Malaysian English relevant for NER and RE tasks. MENmBERT achieved a 1.52% and 26.27% improvement on NER and RE tasks respectively compared to the bert-base-multilingual-cased model. Although the overall NER performance does not improve significantly, our further analysis shows that there is a significant improvement when evaluated by the 12 entity labels. These findings suggest that pre-training language models on language-specific and geographically-focused corpora can be a promising approach for improving NER performance in low-resource settings. The dataset and code published in this paper provide valuable resources for NLP research work focusing on Malaysian English.

Submitted 1 July, 2024; originally announced July 2024.

Comments: Accepted in 9th Workshop on Representation Learning for NLP (Rep4NLP) at ACL 2024

arXiv:2406.13217 [pdf, other]

Bridging Law and Data: Augmenting Reasoning via a Semi-Structured Dataset with IRAC methodology

Authors: Xiaoxi Kang, Lizhen Qu, Lay-Ki Soon, Zhuang Li, Adnan Trakic

Abstract: The effectiveness of Large Language Models (LLMs) in legal reasoning is often limited due to the unique legal terminologies and the necessity for highly specialized knowledge. These limitations highlight the need for high-quality data tailored for complex legal reasoning tasks. This paper introduces LEGALSEMI, a benchmark specifically curated for legal scenario analysis. LEGALSEMI comprises 54 legal scenarios, each rigorously annotated by legal experts, based on the comprehensive IRAC (Issue, Rule, Application, Conclusion) framework. In addition, LEGALSEMI is accompanied by a structured knowledge graph (SKG). A series of experiments were conducted to assess the usefulness of LEGALSEMI for IRAC analysis. The experimental results demonstrate the effectiveness of incorporating the SKG for issue identification, rule retrieval, application and conclusion generation using four different LLMs. LEGALSEMI will be publicly available upon acceptance of this paper.

Submitted 19 June, 2024; originally announced June 2024.

arXiv:2403.01616 [pdf, other]

Towards Comprehensive Vietnamese Retrieval-Augmented Generation and Large Language Models

Authors: Nguyen Quang Duc, Le Hai Son, Nguyen Duc Nhan, Nguyen Dich Nhat Minh, Le Thanh Huong, Dinh Viet Sang

Abstract: This paper presents our contributions towards advancing the state of Vietnamese language understanding and generation through the development and dissemination of open datasets and pre-trained models for Vietnamese Retrieval-Augmented Generation (RAG) and Large Language Models (LLMs).

Submitted 5 March, 2024; v1 submitted 3 March, 2024; originally announced March 2024.

arXiv:2402.14521 [pdf, other]

Malaysian English News Decoded: A Linguistic Resource for Named Entity and Relation Extraction

Authors: Mohan Raj Chanthran, Lay-Ki Soon, Huey Fang Ong, Bhawani Selvaretnam

Abstract: Standard English and Malaysian English exhibit notable differences, posing challenges for natural language processing (NLP) tasks on Malaysian English. Unfortunately, most of the existing datasets are mainly based on Standard English and are therefore inadequate for improving NLP tasks in Malaysian English. An experiment using state-of-the-art Named Entity Recognition (NER) solutions on Malaysian English news articles highlights that they cannot handle morphosyntactic variations in Malaysian English. To the best of our knowledge, there is no annotated dataset available to improve the model. To address these issues, we constructed a Malaysian English News (MEN) dataset, which contains 200 news articles that are manually annotated with entities and relations. We then fine-tuned the spaCy NER tool and validated that having a dataset tailor-made for Malaysian English could improve the performance of NER in Malaysian English significantly. This paper presents our effort in the data acquisition, annotation methodology, and thorough analysis of the annotated dataset. To validate the quality of the annotation, inter-annotator agreement was used, followed by adjudication of disagreements by a subject matter expert. Upon completion of these tasks, we managed to develop a dataset with 6,061 entities and 3,268 relation instances. Finally, we discuss the spaCy fine-tuning setup and analyze the NER performance. This unique dataset will contribute significantly to the advancement of NLP research in Malaysian English, allowing researchers to accelerate their progress, particularly in NER and relation extraction. The dataset and annotation guidelines have been published on GitHub.

Submitted 22 February, 2024; originally announced February 2024.

Comments: Accepted at LREC-COLING 2024

arXiv:2402.11178 [pdf, other]

RENOVI: A Benchmark Towards Remediating Norm Violations in Socio-Cultural Conversations

Authors: Haolan Zhan, Zhuang Li, Xiaoxi Kang, Tao Feng, Yuncheng Hua, Lizhen Qu, Yi Ying, Mei Rianto Chandra, Kelly Rosalin, Jureynolds Jureynolds, Suraj Sharma, Shilin Qu, Linhao Luo, Lay-Ki Soon, Zhaleh Semnani Azad, Ingrid Zukerman, Gholamreza Haffari

Abstract: Norm violations occur when individuals fail to conform to culturally accepted behaviors, which may lead to potential conflicts. Remediating norm violations requires social awareness and cultural sensitivity of the nuances at play. To equip interactive AI systems with a remediation ability, we offer ReNoVi - a large-scale corpus of 9,258 multi-turn dialogues annotated with social norms, as well as define a sequence of tasks to help understand and remediate norm violations step by step. ReNoVi consists of two parts: 512 human-authored dialogues (real data), and 8,746 synthetic conversations generated by ChatGPT through prompt learning. While collecting sufficient human-authored data is costly, synthetic conversations provide suitable amounts of data to help mitigate the scarcity of training data, as well as the chance to assess the alignment between LLMs and humans in the awareness of social norms. We thus harness the power of ChatGPT to generate synthetic training data for our task. To ensure the quality of both human-authored and synthetic data, we follow a quality control protocol during data collection. Our experimental results demonstrate the importance of remediating norm violations in socio-cultural conversations, as well as the improvement in performance obtained from synthetic data.

Submitted 16 February, 2024; originally announced February 2024.

Comments: work in progress. 15 pages, 7 figures

arXiv:2311.16603 [pdf, other]

l2Match: Optimization Techniques on Subgraph Matching Algorithm using Label Pair, Neighboring Label Index, and Jump-Redo method

Authors: C. Q. Cheng, K. S. Wong, L. K. Soon

Abstract: A graph database is designed to store bidirectional relationships between objects and facilitate the traversal process to extract a subgraph. However, the subgraph matching process is an NP-Complete problem. Existing solutions to this problem usually employ a filter-and-verification framework and a divide-and-conquer method. The filter-and-verification framework minimizes the number of inputs to the verification stage by filtering and pruning invalid candidates as much as possible. Meanwhile, subgraph matching is performed on the substructure decomposed from the larger graph to yield partial embeddings. Subsequently, the recursive traversal or set intersection technique combines the partial embeddings into a complete subgraph. In this paper, we first present a comprehensive literature review of the state-of-the-art solutions. l2Match, a subgraph isomorphism algorithm for small queries utilizing a Label-Pair Index and filtering method, is then proposed and presented as a proof of concept. Empirical experimentation shows that l2Match outperforms related state-of-the-art solutions, and the proposed methods optimize the existing algorithms.

Submitted 28 November, 2023; originally announced November 2023.

Comments: The short version of this article (6 pages) was accepted by ICEIC 2024

MSC Class: 05C60 (Primary); 05C30 (Secondary); 68R10

ACM Class: G.4.1; H.3.3

arXiv:2311.11583 [pdf, other]

How well ChatGPT understand Malaysian English? An Evaluation on Named Entity Recognition and Relation Extraction

Authors: Mohan Raj Chanthran, Lay-Ki Soon, Huey Fang Ong, Bhawani Selvaretnam

Abstract: Recently, ChatGPT has attracted a lot of interest from both researchers and the general public. While the performance of ChatGPT in named entity recognition and relation extraction from Standard English texts is satisfactory, it remains to be seen if it can perform similarly for Malaysian English. Malaysian English is unique as it exhibits morphosyntactic and semantic adaptation from local contexts. In this study, we assess ChatGPT's capability in extracting entities and relations from the Malaysian English News (MEN) dataset. We propose a three-step methodology referred to as educate-predict-evaluate. The performance of ChatGPT is assessed using F1-Score across 18 unique prompt settings, which were carefully engineered for a comprehensive review. From our evaluation, we found that ChatGPT does not perform well in extracting entities from Malaysian English news articles, with the highest F1-Score of 0.497. Further analysis shows that the morphosyntactic adaptation in Malaysian English caused the limitation. However, interestingly, this morphosyntactic adaptation does not impact the performance of ChatGPT for relation extraction.

Submitted 28 June, 2024; v1 submitted 20 November, 2023; originally announced November 2023.

Comments: Accepted in Generation, Evaluation & Metrics (GEM) Workshop at EMNLP 2023

arXiv:2310.14880 [pdf]

doi 10.18653/v1/2023.findings-emnlp.929

Can ChatGPT Perform Reasoning Using the IRAC Method in Analyzing Legal Scenarios Like a Lawyer?

Authors: Xiaoxi Kang, Lizhen Qu, Lay-Ki Soon, Adnan Trakic, Terry Yue Zhuo, Patrick Charles Emerton, Genevieve Grant

Abstract: Large Language Models (LLMs), such as ChatGPT, have drawn a lot of attention recently in the legal domain due to their emergent ability to tackle a variety of legal tasks. However, it is still unknown if LLMs are able to analyze a legal case and perform reasoning in the same manner as lawyers. Therefore, we constructed a novel corpus consisting of scenarios pertaining to Contract Acts Malaysia and Australian Social Act for Dependent Child. ChatGPT is applied to perform analysis on the corpus using the IRAC method, which is a framework widely used by legal professionals for organizing legal analysis. Each scenario in the corpus is annotated with a complete IRAC analysis in a semi-structured format so that both machines and legal professionals are able to interpret and understand the annotations. In addition, we conducted the first empirical assessment of ChatGPT for IRAC analysis in order to understand how well it aligns with the analysis of legal professionals. Our experimental results shed light on possible future research directions to improve alignment between LLMs and legal experts in terms of legal reasoning.

Submitted 2 November, 2023; v1 submitted 23 October, 2023; originally announced October 2023.

Comments: EMNLP 2023 Findings

Report number: 2023.findings-emnlp.929

Journal ref: 2023.findings-emnlp.929

arXiv:2304.12026 [pdf, other]

SocialDial: A Benchmark for Socially-Aware Dialogue Systems

Authors: Haolan Zhan, Zhuang Li, Yufei Wang, Linhao Luo, Tao Feng, Xiaoxi Kang, Yuncheng Hua, Lizhen Qu, Lay-Ki Soon, Suraj Sharma, Ingrid Zukerman, Zhaleh Semnani-Azad, Gholamreza Haffari

Abstract: Dialogue systems have been widely applied in many scenarios and are now more powerful and ubiquitous than ever before. With large neural models and massive available data, current dialogue systems have access to more knowledge than any person in their lifetime. However, current dialogue systems still do not perform at a human level. One major gap between conversational agents and humans lies in their abilities to be aware of social norms. The development of socially-aware dialogue systems is impeded due to the lack of resources. In this paper, we present the first socially-aware dialogue corpus - SocialDial, based on Chinese social culture. SocialDial consists of two parts: 1,563 multi-turn dialogues between two human speakers with fine-grained labels, and 4,870 synthetic conversations generated by ChatGPT. The human corpus covers five categories of social norms, which have 14 sub-categories in total. Specifically, it contains social factor annotations including social relation, context, social distance, and social norms. However, collecting sufficient socially-aware dialogues is costly. Thus, we harness the power of ChatGPT and devise an ontology-based synthetic data generation framework. This framework is able to generate synthetic data at scale. To ensure the quality of synthetic dialogues, we design several mechanisms for quality control during data collection. Finally, we evaluate our dataset using several pre-trained models, such as BERT and RoBERTa. Comprehensive empirical results based on state-of-the-art neural models demonstrate that modeling of social norms for dialogue systems is a promising research direction. To the best of our knowledge, SocialDial is the first socially-aware dialogue dataset that covers multiple social factors and has fine-grained labels.

Submitted 24 April, 2023; originally announced April 2023.

Comments: Accepted by SIGIR 2023

arXiv:2205.00387 [pdf, other]

Crude Oil-related Events Extraction and Processing: A Transfer Learning Approach

Authors: Meisin Lee, Lay-Ki Soon, Eu-Gene Siew

Abstract: One of the challenges in event extraction via the traditional supervised learning paradigm is the need for a sizeable annotated dataset to achieve satisfactory model performance. It is even more challenging when it comes to event extraction in the finance and economics domain, a domain with considerably fewer resources. This paper presents a complete framework for extracting and processing crude oil-related events found in the CrudeOilNews corpus, addressing the issue of annotation scarcity and class imbalance by leveraging the effectiveness of transfer learning. Apart from event extraction, we place special emphasis on event properties (Polarity, Modality, and Intensity) classification to determine the factual certainty of each event. We build baseline models first by supervised learning and then exploit Transfer Learning methods to boost event extraction model performance despite the limited amount of annotated data and severe class imbalance. This is done via methods within the transfer learning framework such as Domain Adaptive Pre-training, Multi-task Learning and Sequential Transfer Learning. Based on experimental results, we are able to improve all event extraction sub-task models both in F1 and MCC1-score as compared to baseline models trained via the standard supervised learning. Accurate and holistic event extraction from crude oil news is very useful for downstream tasks such as understanding event chains and learning event-event relations, which can be used for other downstream tasks such as commodity price prediction, summarisation, etc. to support a wide range of business decision making.

Submitted 30 April, 2022; originally announced May 2022.

MSC Class: 68

ACM Class: H.3; I.2

arXiv:2204.03871 [pdf, other]

CrudeOilNews: An Annotated Crude Oil News Corpus for Event Extraction

Authors: Meisin Lee, Lay-Ki Soon, Eu-Gene Siew, Ly Fie Sugianto

Abstract: In this paper, we present CrudeOilNews, a corpus of English Crude Oil news for event extraction. It is the first of its kind for Commodity News and serves to contribute towards resource building for economic and financial text mining. This paper describes the data collection process, the annotation methodology and the event typology used in producing the corpus. Firstly, a seed set of 175 news articles was manually annotated, of which a subset of 25 news articles was used as the adjudicated reference test set for inter-annotator and system evaluation. Agreement was generally substantial and annotator performance was adequate, indicating that the annotation scheme produces consistent event annotations of high quality. Subsequently, the dataset is expanded through (1) data augmentation and (2) Human-in-the-loop active learning. The resulting corpus has 425 news articles with approximately 11k events annotated. As part of the active learning process, the corpus was used to train basic event extraction models for machine labeling; the resulting models also serve as a validation or as a pilot study demonstrating the use of the corpus for machine learning purposes. The annotated corpus is made available for academic research purposes at https://github.com/meisin/CrudeOilNews-Corpus.

Submitted 8 April, 2022; originally announced April 2022.

Comments: Accepted at LREC 2022. arXiv admin note: text overlap with arXiv:2105.08214

MSC Class: 68T99

ACM Class: I.2.7

arXiv:2109.12781 [pdf, other]

Effective Use of Graph Convolution Network and Contextual Sub-Tree for Commodity News Event Extraction

Authors: Meisin Lee, Lay-Ki Soon, Eu-Gene Siew

Abstract: Event extraction in commodity news is a less researched area as compared to generic event extraction. However, accurate event extraction from commodity news is useful in a broad range of applications such as understanding event chains and learning event-event relations, which can then be used for commodity price prediction. The events found in commodity news exhibit characteristics different from generic events, hence posing a unique challenge in event extraction using existing methods. This paper proposes an effective use of Graph Convolutional Networks (GCN) with a pruned dependency parse tree, termed contextual sub-tree, for better event extraction in commodity news. The event extraction model is trained using feature embeddings from ComBERT, a BERT-based masked language model that was produced through domain-adaptive pre-training on a commodity news corpus. Experimental results show the efficiency of the proposed solution, which outperforms existing methods with F1 scores as high as 0.90. Furthermore, our pre-trained language model outperforms GloVe by 23%, and BERT and RoBERTa by 7% in terms of argument roles classification. For the goal of reproducibility, the code and trained models are made publicly available.

Submitted 26 September, 2021; originally announced September 2021.

Comments: Accepted in ECONLP workshop at EMNLP 2021

ACM Class: I.7; H.4; H.5

arXiv:2105.08214 [pdf, other]

An Annotated Commodity News Corpus for Event Extraction

Authors: Meisin Lee, Lay-Ki Soon, Eu-Gene Siew, Ly Fie Sugianto

Abstract: Commodity News contains a wealth of information such as summary of the recent commodity price movement and notable events that led to the movement. Through event extraction, useful information extracted from commodity news is extremely useful in mining for causal relation between events and commodity price movement, which can be used for commodity price prediction. To facilitate future research, we introduce a new dataset with the following information identified and annotated: (i) entities (both nominal and named), (ii) events (trigger words and argument roles), (iii) event metadata: modality, polarity and intensity and (iv) event-event relations.

Submitted 24 August, 2021; v1 submitted 17 May, 2021; originally announced May 2021.

Comments: Submitted to journal, currently under review

MSC Class: 68T99

ACM Class: I.2.7

arXiv:2003.00991 [pdf, other]

Uniform Array with Broadband Beamforming for Arbitrary Beam Patterns

Authors: Phan Le Son

Abstract: Broadband beamforming is a technique to obtain a signal with a wide range of frequencies. It maintains the signal integrity and spatial selectivity over frequencies. This is important in several applications such as microphone arrays, sonar arrays or radar, where the operation range of the signal is several octaves. Using a uniform array for broadband beamforming is an old topic, but there is no available design method for arbitrary beam patterns other than optimization methods. In this paper, we present a new method based on geometry translation and coordinate transformation to design a broadband beamformer for arbitrary beam patterns. The new method uses less computation time than the optimization methods, and it could help to find a better configuration of the array, such as one with fewer sensors or a smaller size.

Submitted 2 March, 2020; originally announced March 2020.

arXiv:1811.00697 [pdf, other]

Noise Contrastive Estimation for Scalable Linear Models for One-Class Collaborative Filtering

Authors: Ga Wu, Maksims Volkovs, Chee Loong Soon, Scott Sanner, Himanshu Rai

Abstract: Previous highly scalable one-class collaborative filtering methods such as Projected Linear Recommendation (PLRec) have advocated using fast randomized SVD to embed items into a latent space, followed by linear regression methods to learn personalized recommendation models per user. Unfortunately, naive SVD embedding methods often exhibit a popularity bias that skews the ability to accurately embed niche items. To address this, we leverage insights from Noise Contrastive Estimation (NCE) to derive a closed-form, efficiently computable "depopularized" embedding. While this method is not ideal for direct recommendation using methods like PureSVD since popularity still plays an important role in recommendation, we find that embedding followed by linear regression to learn personalized user models in a novel method we call NCE-PLRec leverages the improved item embedding of NCE while correcting for its popularity unbiasing in final recommendations. An analysis of the recommendation popularity distribution demonstrates that NCE-PLRec uniformly distributes its recommendations over the popularity spectrum while other methods exhibit distinct biases towards specific popularity subranges, thus artificially restricting their recommendations. Empirically, NCE-PLRec outperforms state-of-the-art methods as well as various ablations of itself on a variety of large-scale recommendation datasets.

Submitted 1 November, 2018; originally announced November 2018.

Comments: 8 pages

arXiv:1602.08447 [pdf]

A Neutrosophic Recommender System for Medical Diagnosis Based on Algebraic Neutrosophic Measures

Authors: Mumtaz Ali, Nguyen Van Minh, Le Hoang Son

Abstract: The neutrosophic set has the ability to handle uncertain, incomplete, inconsistent, indeterminate information in a more accurate way. In this paper, we proposed a neutrosophic recommender system to predict diseases based on the neutrosophic set, which includes a single-criterion neutrosophic recommender system (SC-NRS) and a multi-criterion neutrosophic recommender system (MC-NRS). Further, we investigated some algebraic operations of the neutrosophic recommender system such as union, complement, intersection, probabilistic sum, bold sum, bold intersection, bounded difference, symmetric difference, convex linear sum of min and max operators, Cartesian product, associativity, commutativity and distributivity. Based on these operations, we studied algebraic structures such as lattices, Kleene algebra, de Morgan algebra, Brouwerian algebra, BCK algebra, Stone algebra and MV algebra. In addition, we introduced several types of similarity measures based on these algebraic operations and studied some of their theoretic properties. Moreover, we derived a prediction formula using the proposed algebraic similarity measure. We also proposed a new algorithm for medical diagnosis based on the neutrosophic recommender system. Finally, to check the validity of the proposed methodology, we conducted experiments on the datasets Heart, RHC, Breast cancer, Diabetes and DMD. At the end, we presented the MSE and computational time by comparing the proposed algorithm with the relevant ones such as ICSM, DSM, CARE, CFMD, as well as other variants, namely Variant 67, Variant 69, and Variant 71, both in tabular and graphical form to analyze the efficiency and accuracy. Finally, we analyzed the strength of all 8 algorithms using the ANOVA statistical tool.

Submitted 24 February, 2016; originally announced February 2016.

Comments: Keywords: Medical diagnosis, neutrosophic set, neutrosophic recommender system, non-linear regression model

arXiv:1505.03246 [pdf]

doi 10.5121/ijcsit.2015.7209

Prefix-based Labeling Annotation for Effective XML Fragmentation

Authors: Kok-Leong Koong, Su-Cheng Haw, Lay-Ki Soon, Samini Subramaniam

Abstract: XML has gradually been employed as a standard of data exchange in the web environment since its inception in the 90s. It serves as a data exchange format between systems and other applications. Meanwhile, the data volume has grown substantially on the web, and thus effective methods of storing and retrieving these data are essential. One recommended way is to physically or virtually fragment the large chunk of data and distribute the fragments into different nodes. Fragmentation design of an XML document consists of two parts: fragmentation operation and fragmentation method. The three fragmentation operations are Horizontal, Vertical and Hybrid. They determine how the XML should be fragmented. This paper aims to give an overview of the fragmentation design considerations and subsequently propose a fragmentation technique using number addressing.

Submitted 13 May, 2015; originally announced May 2015.

Comments: 12 pages, invited extension from conference paper. International Journal of Computer Science & Information Technology (IJCSIT), Vol 7, No 2, April 2015

ACM Class: H.2.4

arXiv:1504.03558 [pdf]

doi 10.5121/csit.2015.50503

Fuzzy approaches to context variable in fuzzy geographically weighted clustering

Authors: Nguyen Van Minh, Le Hoang Son

Abstract: Fuzzy Geographically Weighted Clustering (FGWC) is considered a suitable tool for the analysis of geo-demographic data that assists the provision and planning of products and services to local people. Context variables were attached to FGWC in order to accelerate the computing speed of the algorithm and to focus the results on the domain of interest. Nonetheless, the determination of exact, crisp values of the context variable is a hard task. In this paper, we propose two novel methods using fuzzy approaches for that determination. A numerical example is given to illustrate the uses of the proposed methods.

Submitted 13 April, 2015; originally announced April 2015.

Comments: 11 pages

MSC Class: 62H30

ACM Class: I.5.3

Showing 1–18 of 18 results for author: Soon, L