
Showing 1–50 of 91 results for author: Lim, K

  1. arXiv:2407.10098  [pdf, other]

    cs.OS cs.AR cs.DC cs.NI cs.PF

    Accelerator-as-a-Service in Public Clouds: An Intra-Host Traffic Management View for Performance Isolation in the Wild

    Authors: Jiechen Zhao, Ran Shu, Katie Lim, Zewen Fan, Thomas Anderson, Mingyu Gao, Natalie Enright Jerger

    Abstract: I/O devices in public clouds have integrated increasing numbers of hardware accelerators, e.g., AWS Nitro, Azure FPGA and Nvidia BlueField. However, such specialized compute (1) is not explicitly accessible to cloud users with performance guarantee, (2) cannot be leveraged simultaneously by both providers and users, unlike general-purpose compute (e.g., CPUs). Through ten observations, we present…

    Submitted 14 July, 2024; originally announced July 2024.

  2. arXiv:2405.06665  [pdf, other]

    cs.CL cs.IR cs.LG

    Enhancing Language Models for Financial Relation Extraction with Named Entities and Part-of-Speech

    Authors: Menglin Li, Kwan Hui Lim

    Abstract: The Financial Relation Extraction (FinRE) task involves identifying the entities and their relation, given a piece of financial statement/text. To solve this FinRE problem, we propose a simple but effective strategy that improves the performance of pre-trained language models by augmenting them with Named Entity Recognition (NER) and Part-Of-Speech (POS), as well as different approaches to combine…

    Submitted 2 May, 2024; originally announced May 2024.

    Comments: Accepted to ICLR 2024 Tiny Paper Track

  3. Towards Precise Observations of Neural Model Robustness in Classification

    Authors: Wenchuan Mu, Kwan Hui Lim

    Abstract: In deep learning applications, robustness measures the ability of neural models that handle slight changes in input data, which could lead to potential safety hazards, especially in safety-critical applications. Pre-deployment assessment of model robustness is essential, but existing methods often suffer from either high costs or imprecise results. To enhance safety in real-world scenarios, metric…

    Submitted 25 April, 2024; originally announced April 2024.

  4. arXiv:2404.16411  [pdf, other]

    cs.AI

    Label-Free Topic-Focused Summarization Using Query Augmentation

    Authors: Wenchuan Mu, Kwan Hui Lim

    Abstract: In today's data and information-rich world, summarization techniques are essential in harnessing vast text to extract key information and enhance decision-making and efficiency. In particular, topic-focused summarization is important due to its ability to tailor content to specific aspects of an extended text. However, this usually requires extensive labelled datasets and considerable computationa…

    Submitted 25 April, 2024; originally announced April 2024.

  5. arXiv:2404.08662  [pdf, other]

    cs.IR cs.LG cs.SI

    FewUser: Few-Shot Social User Geolocation via Contrastive Learning

    Authors: Menglin Li, Kwan Hui Lim

    Abstract: To address the challenges of scarcity in geotagged data for social user geolocation, we propose FewUser, a novel framework for Few-shot social User geolocation. We incorporate a contrastive learning strategy between users and locations to improve geolocation performance with no or limited training data. FewUser features a user representation module that harnesses a pre-trained language model (PLM)…

    Submitted 28 March, 2024; originally announced April 2024.

    Comments: 17 pages, 3 figures, 8 tables, submitted to ECML-PKDD 2024 for review

  6. arXiv:2404.01954  [pdf, other]

    cs.CL cs.AI

    HyperCLOVA X Technical Report

    Authors: Kang Min Yoo, Jaegeun Han, Sookyo In, Heewon Jeon, Jisu Jeong, Jaewook Kang, Hyunwook Kim, Kyung-Min Kim, Munhyong Kim, Sungju Kim, Donghyun Kwak, Hanock Kwak, Se Jung Kwon, Bado Lee, Dongsoo Lee, Gichang Lee, Jooho Lee, Baeseong Park, Seongjin Shin, Joonsang Yu, Seolki Baek, Sumin Byeon, Eungsup Cho, Dooseok Choe, Jeesung Han, et al. (371 additional authors not shown)

    Abstract: We introduce HyperCLOVA X, a family of large language models (LLMs) tailored to the Korean language and culture, along with competitive capabilities in English, math, and coding. HyperCLOVA X was trained on a balanced mix of Korean, English, and code data, followed by instruction-tuning with high-quality human-annotated datasets while abiding by strict safety guidelines reflecting our commitment t…

    Submitted 13 April, 2024; v1 submitted 2 April, 2024; originally announced April 2024.

    Comments: 44 pages; updated authors list and fixed author names

  7. arXiv:2403.14770  [pdf, other]

    cs.AR

    Beehive: A Flexible Network Stack for Direct-Attached Accelerators

    Authors: Katie Lim, Matthew Giordano, Theano Stavrinos, Pratyush Patel, Jacob Nelson, Irene Zhang, Baris Kasikci, Tom Anderson

    Abstract: Direct-attached accelerators, where application accelerators are directly connected to the datacenter network via a hardware network stack, offer substantial benefits in terms of reduced latency, CPU overhead, and energy use. However, a key challenge is that modern datacenter network stacks are complex, with interleaved protocol layers, network management functions, and virtualization support. To…

    Submitted 30 May, 2024; v1 submitted 21 March, 2024; originally announced March 2024.

  8. arXiv:2403.13872  [pdf, other]

    cs.LG cs.SI

    Spatial-Temporal Graph Representation Learning for Tactical Networks Future State Prediction

    Authors: Junhua Liu, Justin Albrethsen, Lincoln Goh, David Yau, Kwan Hui Lim

    Abstract: Resource allocation in tactical ad-hoc networks presents unique challenges due to their dynamic and multi-hop nature. Accurate prediction of future network connectivity is essential for effective resource allocation in such environments. In this paper, we introduce the Spatial-Temporal Graph Encoder-Decoder (STGED) framework for Tactical Communication Networks that leverages both spatial and tempo…

    Submitted 14 July, 2024; v1 submitted 20 March, 2024; originally announced March 2024.

  9. arXiv:2403.11399  [pdf, other]

    cs.CL

    X-LLaVA: Optimizing Bilingual Large Vision-Language Alignment

    Authors: Dongjae Shin, Hyeonseok Lim, Inho Won, Changsu Choi, Minjun Kim, Seungwoo Song, Hangyeol Yoo, Sangmin Kim, Kyungtae Lim

    Abstract: The impressive development of large language models (LLMs) is expanding into the realm of large multimodal models (LMMs), which incorporate multiple types of data beyond text. However, the nature of multimodal models leads to significant expenses in the creation of training data. Furthermore, constructing multilingual data for LMMs presents its own set of challenges due to language diversity and c…

    Submitted 1 April, 2024; v1 submitted 17 March, 2024; originally announced March 2024.

  10. arXiv:2403.10882  [pdf, other]

    cs.CL cs.AI

    Optimizing Language Augmentation for Multilingual Large Language Models: A Case Study on Korean

    Authors: ChangSu Choi, Yongbin Jeong, Seoyoon Park, InHo Won, HyeonSeok Lim, SangMin Kim, Yejee Kang, Chanhyuk Yoon, Jaewan Park, Yiseul Lee, HyeJin Lee, Younggyun Hahm, Hansaem Kim, KyungTae Lim

    Abstract: Large language models (LLMs) use pretraining to predict the subsequent word; however, their expansion requires significant computing resources. Numerous big tech companies and research institutes have developed multilingual LLMs (MLLMs) to meet current demands, overlooking less-resourced languages (LRLs). This study proposed three strategies to enhance the performance of LRLs based on the publicly…

    Submitted 21 March, 2024; v1 submitted 16 March, 2024; originally announced March 2024.

  11. arXiv:2403.00786  [pdf, other]

    cs.IR cs.SI

    Leveraging Contrastive Learning for Few-shot Geolocation of Social Posts

    Authors: Menglin Li, Kwan Hui Lim

    Abstract: Social geolocation is an important problem of predicting the originating locations of social media posts. However, this task is challenging due to the need for a substantial volume of training data, alongside well-annotated labels. These issues are further exacerbated by new or less popular locations with insufficient labels, further leading to an imbalanced dataset. In this paper, we propose \tex…

    Submitted 19 February, 2024; originally announced March 2024.

    Comments: This paper contains 7-page main content and 2-page references and was submitted to IJCAI2024 for review

  12. arXiv:2402.00689  [pdf, other]

    cs.CR cs.AI

    Ocassionally Secure: A Comparative Analysis of Code Generation Assistants

    Authors: Ran Elgedawy, John Sadik, Senjuti Dutta, Anuj Gautam, Konstantinos Georgiou, Farzin Gholamrezae, Fujiao Ji, Kyungchan Lim, Qian Liu, Scott Ruoti

    Abstract: Large Language Models (LLMs) are being increasingly utilized in various applications, with code generations being a notable example. While previous research has shown that LLMs have the capability to generate both secure and insecure code, the literature does not take into account what factors help generate secure and effective code. Therefore in this paper we focus on identifying and understan…

    Submitted 1 February, 2024; originally announced February 2024.

    Comments: 12 pages, 2 figures

  13. arXiv:2401.12472  [pdf, other]

    cs.CL

    Contrastive Learning in Distilled Models

    Authors: Valerie Lim, Kai Wen Ng, Kenneth Lim

    Abstract: Natural Language Processing models like BERT can provide state-of-the-art word embeddings for downstream NLP tasks. However, these models yet to perform well on Semantic Textual Similarity, and may be too large to be deployed as lightweight edge applications. We seek to apply a suitable contrastive learning method based on the SimCSE paper, to a model architecture adapted from a knowledge distilla…

    Submitted 22 January, 2024; originally announced January 2024.

  14. arXiv:2401.06443  [pdf, other]

    cs.CL

    BOK-VQA: Bilingual outside Knowledge-Based Visual Question Answering via Graph Representation Pretraining

    Authors: Minjun Kim, Seungwoo Song, Youhan Lee, Haneol Jang, Kyungtae Lim

    Abstract: The current research direction in generative models, such as the recently developed GPT4, aims to find relevant knowledge information for multimodal and multilingual inputs to provide answers. Under these research circumstances, the demand for multilingual evaluation of visual question answering (VQA) tasks, a representative task of multimodal systems, has increased. Accordingly, we propose a bili…

    Submitted 15 March, 2024; v1 submitted 12 January, 2024; originally announced January 2024.

  15. arXiv:2401.03676  [pdf, other]

    cs.SE cs.AI

    Assessing AI Detectors in Identifying AI-Generated Code: Implications for Education

    Authors: Wei Hung Pan, Ming Jie Chok, Jonathan Leong Shan Wong, Yung Xin Shin, Yeong Shian Poon, Zhou Yang, Chun Yong Chong, David Lo, Mei Kuan Lim

    Abstract: Educators are increasingly concerned about the usage of Large Language Models (LLMs) such as ChatGPT in programming education, particularly regarding the potential exploitation of imperfections in Artificial Intelligence Generated Content (AIGC) Detectors for academic misconduct. In this paper, we present an empirical study where the LLM is examined for its attempts to bypass detection by AIGC Det…

    Submitted 8 January, 2024; originally announced January 2024.

    Comments: 11 pages, paper accepted at 46th International Conference on Software Engineering, Software Engineering Education and Training Track (ICSE-SEET 2024)

  16. arXiv:2312.12678  [pdf, other]

    q-bio.QM cs.AI cs.LG stat.ME stat.ML

    Causal Discovery for fMRI data: Challenges, Solutions, and a Case Study

    Authors: Eric Rawls, Bryan Andrews, Kelvin Lim, Erich Kummerfeld

    Abstract: Designing studies that apply causal discovery requires navigating many researcher degrees of freedom. This complexity is exacerbated when the study involves fMRI data. In this paper we (i) describe nine challenges that occur when applying causal discovery to fMRI data, (ii) discuss the space of decisions that need to be made, (iii) review how a recent case study made those decisions, (iv) and iden…

    Submitted 19 December, 2023; originally announced December 2023.

  17. arXiv:2311.12355  [pdf, other]

    cs.IR cs.CL cs.LG

    Utilizing Language Models for Tour Itinerary Recommendation

    Authors: Ngai Lam Ho, Kwan Hui Lim

    Abstract: Tour itinerary recommendation involves planning a sequence of relevant Point-of-Interest (POIs), which combines challenges from the fields of both Operations Research (OR) and Recommendation Systems (RS). As an OR problem, there is the need to maximize a certain utility (e.g., popularity of POIs in the tour) while adhering to some constraints (e.g., maximum time for the tour). As a RS problem, it…

    Submitted 21 November, 2023; originally announced November 2023.

    Comments: PMAI23 @IJCAI 2023 2nd International Workshop on Process Management in the AI era

  18. arXiv:2311.11071  [pdf, other]

    cs.IR cs.AI cs.LG cs.SI

    SBTRec- A Transformer Framework for Personalized Tour Recommendation Problem with Sentiment Analysis

    Authors: Ngai Lam Ho, Roy Ka-Wei Lee, Kwan Hui Lim

    Abstract: When traveling to an unfamiliar city for holidays, tourists often rely on guidebooks, travel websites, or recommendation systems to plan their daily itineraries and explore popular points of interest (POIs). However, these approaches may lack optimization in terms of time feasibility, localities, and user preferences. In this paper, we propose the SBTRec algorithm: a BERT-based Trajectory Recommen…

    Submitted 18 November, 2023; originally announced November 2023.

    Report number: 01

  19. arXiv:2310.19886  [pdf]

    cs.LG cs.IR cs.SI

    BTRec: BERT-Based Trajectory Recommendation for Personalized Tours

    Authors: Ngai Lam Ho, Roy Ka-Wei Lee, Kwan Hui Lim

    Abstract: An essential task for tourists having a pleasant holiday is to have a well-planned itinerary with relevant recommendations, especially when visiting unfamiliar cities. Many tour recommendation tools only take into account a limited number of factors, such as popular Points of Interest (POIs) and routing constraints. Consequently, the solutions they provide may not always align with the individual…

    Submitted 30 October, 2023; originally announced October 2023.

    Comments: RecSys 2023, Workshop on Recommenders in Tourism

  20. Anti-noise window: Subjective perception of active noise reduction and effect of informational masking

    Authors: Bhan Lam, Kelvin Chee Quan Lim, Kenneth Ooi, Zhen-Ting Ong, Dongyuan Shi, Woon-Seng Gan

    Abstract: Reviving natural ventilation (NV) for urban sustainability presents challenges for indoor acoustic comfort. Active control and interference-based noise mitigation strategies, such as the use of loudspeakers, offer potential solutions to achieve acoustic comfort while maintaining NV. However, these approaches are not commonly integrated or evaluated from a perceptual standpoint. This study examines…

    Submitted 8 July, 2023; originally announced July 2023.

    Comments: Accepted manuscript submitted to Sustainable Cities and Society

    Journal ref: Sustain. Cities Soc., 104763, 2023

  21. arXiv:2306.09626  [pdf, other]

    cs.CV

    PAtt-Lite: Lightweight Patch and Attention MobileNet for Challenging Facial Expression Recognition

    Authors: Jia Le Ngwe, Kian Ming Lim, Chin Poo Lee, Thian Song Ong

    Abstract: Facial Expression Recognition (FER) is a machine learning problem that deals with recognizing human facial expressions. While existing work has achieved performance improvements in recent years, FER in the wild and under challenging conditions remains a challenge. In this paper, a lightweight patch and attention network based on MobileNetV1, referred to as PAtt-Lite, is proposed to improve FER per…

    Submitted 16 June, 2023; originally announced June 2023.

    Comments: This work has been submitted to the IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible

  22. arXiv:2305.17445  [pdf, other]

    cs.SE

    Synthesizing Speech Test Cases with Text-to-Speech? An Empirical Study on the False Alarms in Automated Speech Recognition Testing

    Authors: Julia Kaiwen Lau, Kelvin Kai Wen Kong, Julian Hao Yong, Per Hoong Tan, Zhou Yang, Zi Qian Yong, Joshua Chern Wey Low, Chun Yong Chong, Mei Kuan Lim, David Lo

    Abstract: Recent studies have proposed the use of Text-To-Speech (TTS) systems to automatically synthesise speech test cases on a scale and uncover a large number of failures in ASR systems. However, the failures uncovered by synthetic test cases may not reflect the actual performance of an ASR system when it transcribes human audio, which we refer to as false alarms. Given a failed test case synthesised fr…

    Submitted 18 July, 2023; v1 submitted 27 May, 2023; originally announced May 2023.

    Comments: 13 pages, Accepted at ISSTA2023

  23. arXiv:2305.06335  [pdf, other]

    cs.CL

    K-UniMorph: Korean Universal Morphology and its Feature Schema

    Authors: Eunkyul Leah Jo, Kyuwon Kim, Xihan Wu, KyungTae Lim, Jungyeul Park, Chulwoo Park

    Abstract: We present in this work a new Universal Morphology dataset for Korean. Previously, the Korean language has been underrepresented in the field of morphological paradigms amongst hundreds of diverse world languages. Hence, we propose this Universal Morphological paradigms for the Korean language that preserve its distinct characteristics. For our K-UniMorph dataset, we outline each grammatical crite…

    Submitted 17 May, 2023; v1 submitted 10 May, 2023; originally announced May 2023.

    Comments: Findings of the Association for Computational Linguistics: ACL 2023 (Camera-ready)

  24. Korean Named Entity Recognition Based on Language-Specific Features

    Authors: Yige Chen, KyungTae Lim, Jungyeul Park

    Abstract: In the paper, we propose a novel way of improving named entity recognition in the Korean language using its language-specific features. While the field of named entity recognition has been studied extensively in recent years, the mechanism of efficiently recognizing named entities in Korean has hardly been explored. This is because the Korean language has distinct linguistic properties that preven…

    Submitted 10 May, 2023; originally announced May 2023.

    Comments: 44 pages

    Journal ref: Nat. Lang. Eng. 30 (2024) 625-649

  25. arXiv:2304.08495   

    cs.AI cs.MA

    Optimizing Group Utility in Itinerary Planning: A Strategic and Crowd-Aware Approach

    Authors: Junhua Liu, Kwan Hui Lim, Kristin L. Wood, Menglin Li

    Abstract: Itinerary recommendation is a complex sequence prediction problem with numerous real-world applications. This task becomes even more challenging when considering the optimization of multiple user queuing times and crowd levels, as well as numerous involved parameters, such as attraction popularity, queuing time, walking time, and operating hours. Existing solutions typically focus on single-person…

    Submitted 10 September, 2023; v1 submitted 4 April, 2023; originally announced April 2023.

    Comments: Will be going through major revision

  26. arXiv:2304.04488  [pdf, other]

    cs.DC

    Hybrid Computing for Interactive Datacenter Applications

    Authors: Pratyush Patel, Katie Lim, Kushal Jhunjhunwalla, Ashlie Martinez, Max Demoulin, Jacob Nelson, Irene Zhang, Thomas Anderson

    Abstract: Field-Programmable Gate Arrays (FPGAs) are more energy efficient and cost effective than CPUs for a wide variety of datacenter applications. Yet, for latency-sensitive and bursty workloads, this advantage can be difficult to harness due to high FPGA spin-up costs. We propose that a hybrid FPGA and CPU computing framework can harness the energy efficiency benefits of FPGAs for such workloads at rea…

    Submitted 10 April, 2023; originally announced April 2023.

    Comments: 13 pages

  27. arXiv:2303.04566  [pdf, other]

    cs.CV cs.SE

    Robustness Evaluation in Hand Pose Estimation Models using Metamorphic Testing

    Authors: Muxin Pu, Chun Yong Chong, Mei Kuan Lim

    Abstract: Hand pose estimation (HPE) is a task that predicts and describes the hand poses from images or video frames. When HPE models estimate hand poses captured in a laboratory or under controlled environments, they normally deliver good performance. However, the real-world environment is complex, and various uncertainties may happen, which could degrade the performance of HPE models. For example, the ha…

    Submitted 8 March, 2023; originally announced March 2023.

    Comments: Accepted at 2023 8th International Workshop on Metamorphic Testing, 8 pages

  28. arXiv:2302.09938  [pdf, other]

    cs.AI cs.LG cs.SI

    SkillRec: A Data-Driven Approach to Job Skill Recommendation for Career Insights

    Authors: Xiang Qian Ong, Kwan Hui Lim

    Abstract: Understanding the skill sets and knowledge required for any career is of utmost importance, but it is increasingly challenging in today's dynamic world with rapid changes in terms of the tools and techniques used. Thus, it is especially important to be able to accurately identify the required skill sets for any job for better career insights and development. In this paper, we propose and develop t…

    Submitted 20 February, 2023; originally announced February 2023.

    Comments: Accepted to the 15th International Conference on Computer and Automation Engineering (ICCAE 2023)

  29. arXiv:2302.05582  [pdf, other]

    eess.AS cs.CL cs.SD cs.SE

    ASDF: A Differential Testing Framework for Automatic Speech Recognition Systems

    Authors: Daniel Hao Xian Yuen, Andrew Yong Chen Pang, Zhou Yang, Chun Yong Chong, Mei Kuan Lim, David Lo

    Abstract: Recent years have witnessed wider adoption of Automated Speech Recognition (ASR) techniques in various domains. Consequently, evaluating and enhancing the quality of ASR systems is of great importance. This paper proposes ASDF, an Automated Speech Recognition Differential Testing Framework for testing ASR systems. ASDF extends an existing ASR testing tool, the CrossASR++, which synthesizes test ca…

    Submitted 10 February, 2023; originally announced February 2023.

    Comments: Accepted by ICST 2023 Tool Demo Track

  30. arXiv:2212.13900  [pdf, other]

    cs.IR cs.AI cs.LG

    POIBERT: A Transformer-based Model for the Tour Recommendation Problem

    Authors: Ngai Lam Ho, Kwan Hui Lim

    Abstract: Tour itinerary planning and recommendation are challenging problems for tourists visiting unfamiliar cities. Many tour recommendation algorithms only consider factors such as the location and popularity of Points of Interest (POIs) but their solutions may not align well with the user's own preferences and other location constraints. Additionally, these solutions do not take into consideration of t…

    Submitted 16 December, 2022; originally announced December 2022.

    Comments: Accepted to the 2022 IEEE International Conference on Big Data (BigData2022)

  31. arXiv:2211.13611  [pdf, ps, other]

    astro-ph.IM cs.SE

    Software Architecture and System Design of Rubin Observatory

    Authors: William O'Mullane, Frossie Economou, Kian-Tat Lim, Fritz Mueller, Tim Jenness, Gregory P. Dubois-Felsmann, Leanne P. Guy, Ian S. Sullivan, Yusra AlSayyad, John D. Swinbank, K. Simon Krughoff

    Abstract: Starting from a description of the Rubin Observatory Data Management System Architecture, and drawing on our experience with and involvement in a range of other projects including Gaia, SDSS, UKIRT, and JCMT, we derive a series of generic design patterns and lessons learned.

    Submitted 24 November, 2022; originally announced November 2022.

    Comments: 10 pages ADASS XXXII submission

    Report number: DMTN-240

  32. arXiv:2211.01336  [pdf, other]

    cs.IR cs.AI

    A Transformer-based Framework for POI-level Social Post Geolocation

    Authors: Menglin Li, Kwan Hui Lim, Teng Guo, Junhua Liu

    Abstract: POI-level geo-information of social posts is critical to many location-based applications and services. However, the multi-modality, complexity and diverse nature of social media data and their platforms limit the performance of inferring such fine-grained locations and their subsequent applications. To address this issue, we present a transformer-based general framework, which builds upon pre-tra…

    Submitted 26 October, 2022; originally announced November 2022.

    Comments: Full papers are 12 pages in length plus additional 4 pages for references (turns to 18 pages in total after submitting to arxiv). One figure and 5 tables are contained. This paper was submitted to ECIR 2023 for review

  33. arXiv:2210.14260  [pdf, other]

    cs.CL

    Universal Evasion Attacks on Summarization Scoring

    Authors: Wenchuan Mu, Kwan Hui Lim

    Abstract: The automatic scoring of summaries is important as it guides the development of summarizers. Scoring is also complex, as it involves multiple aspects such as fluency, grammar, and even textual entailment with the source text. However, summary scoring has not been considered a machine learning task to study its accuracy and robustness. In this study, we place automatic scoring in the context of reg…

    Submitted 25 October, 2022; originally announced October 2022.

  34. arXiv:2210.14257  [pdf, other]

    cs.CL

    Revision for Concision: A Constrained Paraphrase Generation Task

    Authors: Wenchuan Mu, Kwan Hui Lim

    Abstract: Academic writing should be concise as concise sentences better keep the readers' attention and convey meaning clearly. Writing concisely is challenging, for writers often struggle to revise their drafts. We introduce and formulate revising for concision as a natural language processing task at the sentence level. Revising for concision requires algorithms to use only necessary words to rewrite a s…

    Submitted 25 October, 2022; originally announced October 2022.

  35. arXiv:2209.09742  [pdf, other]

    cs.CL

    Yet Another Format of Universal Dependencies for Korean

    Authors: Yige Chen, Eunkyul Leah Jo, Yundong Yao, KyungTae Lim, Miikka Silfverberg, Francis M. Tyers, Jungyeul Park

    Abstract: In this study, we propose a morpheme-based scheme for Korean dependency parsing and adopt the proposed scheme to Universal Dependencies. We present the linguistic rationale that illustrates the motivation and the necessity of adopting the morpheme-based format, and develop scripts that convert between the original format used by Universal Dependencies and the proposed morpheme-based format automat…

    Submitted 20 September, 2022; originally announced September 2022.

    Comments: COLING2022, Poster

  36. SERCNN: Stacked Embedding Recurrent Convolutional Neural Network in Detecting Depression on Twitter

    Authors: Heng Ee Tay, Mei Kuan Lim, Chun Yong Chong

    Abstract: Conventional approaches to identify depression are not scalable, and the public has limited awareness of mental health, especially in developing countries. As evident by recent studies, social media has the potential to complement mental health screening on a greater scale. The vast amount of first-person narrative posts in chronological order can provide insights into one's thoughts, feelings, be…

    Submitted 5 August, 2022; v1 submitted 29 July, 2022; originally announced July 2022.

    Comments: This paper has been accepted at the AIHA 2022 workshop of the ICPR 2022 conference

  37. arXiv:2207.12021  [pdf, other]

    cs.CL

    Neural Generation Meets Real People: Building a Social, Informative Open-Domain Dialogue Agent

    Authors: Ethan A. Chi, Ashwin Paranjape, Abigail See, Caleb Chiam, Trenton Chang, Kathleen Kenealy, Swee Kiat Lim, Amelia Hardy, Chetanya Rastogi, Haojun Li, Alexander Iyabor, Yutong He, Hari Sowrirajan, Peng Qi, Kaushik Ram Sadagopan, Nguyet Minh Phu, Dilara Soylu, Jillian Tang, Avanika Narayan, Giovanni Campagna, Christopher D. Manning

    Abstract: We present Chirpy Cardinal, an open-domain social chatbot. Aiming to be both informative and conversational, our bot chats with users in an authentic, emotionally intelligent way. By integrating controlled neural generation with scaffolded, hand-written dialogue, we let both the user and bot take turns driving the conversation, producing an engaging and socially fluent experience. Deployed in the…

    Submitted 16 January, 2023; v1 submitted 25 July, 2022; originally announced July 2022.

    Comments: SIGDIAL '22

  38. arXiv:2207.08911  [pdf, other]

    stat.ML cs.LG stat.CO stat.ME

    Deeply-Learned Generalized Linear Models with Missing Data

    Authors: David K Lim, Naim U Rashid, Junier B Oliva, Joseph G Ibrahim

    Abstract: Deep Learning (DL) methods have dramatically increased in popularity in recent years, with significant growth in their application to supervised learning problems in the biomedical sciences. However, the greater prevalence and complexity of missing data in modern biomedical datasets present significant challenges for DL methods. Here, we provide a formal treatment of missing data in the context of…

    Submitted 26 October, 2023; v1 submitted 18 July, 2022; originally announced July 2022.

    Journal ref: Journal of Computational and Graphical Statistics, 2023

  39. arXiv:2206.03277  [pdf]

    cs.CY stat.AP

    Driving and charging an EV in Australia: A real-world analysis

    Authors: Thara Philip, Kai Li Lim, Jake Whitehead

    Abstract: As outlined by the Intergovernmental Panel on Climate Change, electric vehicles offer the greatest decarbonisation potential for land transport, in addition to other benefits, including reduced fuel and maintenance costs, improved air quality, reduced noise pollution, and improved national fuel security. Owing to these benefits, governments worldwide are planning and rolling out EV-favourable poli…

    Submitted 25 October, 2022; v1 submitted 3 June, 2022; originally announced June 2022.

    Comments: This work has been published in Australasian Transport Research Forum (ATRF), proceedings (2022)

  40. arXiv:2205.07335  [pdf, ps, other]

    cs.AI cs.LO

    Automating Defeasible Reasoning in Law

    Authors: How Khang Lim, Avishkar Mahajan, Martin Strecker, Meng Weng Wong

    Abstract: The paper studies defeasible reasoning in rule-based systems, in particular about legal norms and contracts. We identify rule modifiers that specify how rules interact and how they can be overridden. We then define rule transformations that eliminate these modifiers, leading in the end to a translation of rules to formulas. For reasoning with and about rules, we contrast two approaches, one in a c…

    Submitted 15 May, 2022; originally announced May 2022.

    MSC Class: F.4.1

  41. arXiv:2204.08612  [pdf, other]

    cs.CV

    Metamorphic Testing-based Adversarial Attack to Fool Deepfake Detectors

    Authors: Nyee Thoang Lim, Meng Yi Kuan, Muxin Pu, Mei Kuan Lim, Chun Yong Chong

    Abstract: Deepfakes utilise Artificial Intelligence (AI) techniques to create synthetic media where the likeness of one person is replaced with another. There are growing concerns that deepfakes can be maliciously used to create misleading and harmful digital contents. As deepfakes become more common, there is a dire need for deepfake detection technology to help spot deepfake media. Present deepfake detect…

    Submitted 31 May, 2022; v1 submitted 18 April, 2022; originally announced April 2022.

    Comments: Paper accepted at the 26th International Conference on Pattern Recognition (ICPR2022)

  42. arXiv:2203.06825  [pdf, other]

    cs.CV cs.SE

    Fairness Evaluation in Deepfake Detection Models using Metamorphic Testing

    Authors: Muxin Pu, Meng Yi Kuan, Nyee Thoang Lim, Chun Yong Chong, Mei Kuan Lim

    Abstract: Fairness of deepfake detectors in the presence of anomalies is not well investigated, especially if those anomalies are more prominent in either male or female subjects. The primary motivation for this work is to evaluate how deepfake detection models behave under such anomalies. However, due to the black-box nature of deep learning (DL) and artificial intelligence (AI) systems, it is hard to pre…

    Submitted 13 March, 2022; originally announced March 2022.

    Comments: 8 pages, accepted at the 7th International Workshop on Metamorphic Testing (MET22)

  43. arXiv:2201.13226  [pdf, other]

    cs.CY cs.AI cs.LG

    Online Assessment Misconduct Detection using Internet Protocol and Behavioural Classification

    Authors: Leslie Ching Ow Tiong, HeeJeong Jasmine Lee, Kai Li Lim

    Abstract: With the recent prevalence of remote education, academic assessments are often conducted online, leading to further concerns surrounding assessment misconduct. This paper investigates the potential for online assessment misconduct (e-cheating) and proposes practical countermeasures against it. The mechanism for detecting the practices of online cheating is presented in the form of an e-cheating…

    Submitted 23 January, 2022; originally announced January 2022.

  44. arXiv:2112.12940  [pdf, other]

    cs.CL cs.AI

    Analyzing Scientific Publications using Domain-Specific Word Embedding and Topic Modelling

    Authors: Trisha Singhal, Junhua Liu, Lucienne T. M. Blessing, Kwan Hui Lim

    Abstract: The scientific world is changing at a rapid pace, with new technology being developed and new trends being set at an increasing frequency. This paper presents a framework for conducting scientific analyses of academic publications, which is crucial to monitor research trends and identify potential innovations. This framework adopts and combines various techniques of Natural Language Processing, su…

    Submitted 23 December, 2021; originally announced December 2021.

    Comments: Accepted at the 2021 IEEE International Conference on Big Data (BigData2021)

  45. MemPool-3D: Boosting Performance and Efficiency of Shared-L1 Memory Many-Core Clusters with 3D Integration

    Authors: Matheus Cavalcante, Anthony Agnesina, Samuel Riedel, Moritz Brunion, Alberto Garcia-Ortiz, Dragomir Milojevic, Francky Catthoor, Sung Kyu Lim, Luca Benini

    Abstract: Three-dimensional integrated circuits promise power, performance, and footprint gains compared to their 2D counterparts, thanks to drastic reductions in the interconnects' length through their smaller form factor. We can leverage the potential of 3D integration by enhancing MemPool, an open-source many-core design with 256 cores and a shared pool of L1 scratchpad memory connected with a low-latenc…

    Submitted 2 December, 2021; originally announced December 2021.

    Comments: Accepted for publication in DATE 2022 -- Design, Automation and Test in Europe Conference

  46. arXiv:2108.12471  [pdf, other]

    q-bio.QM cs.LG

    Machine learning on DNA-encoded library count data using an uncertainty-aware probabilistic loss function

    Authors: Katherine S. Lim, Andrew G. Reidenbach, Bruce K. Hua, Jeremy W. Mason, Christopher J. Gerry, Paul A. Clemons, Connor W. Coley

    Abstract: DNA-encoded library (DEL) screening and quantitative structure-activity relationship (QSAR) modeling are two techniques used in drug discovery to find small molecules that bind a protein target. Applying QSAR modeling to DEL data can facilitate the selection of compounds for off-DNA synthesis and evaluation. Such a combined approach has been shown recently by training binary classifiers to learn D…

    Submitted 27 April, 2022; v1 submitted 27 August, 2021; originally announced August 2021.

  47. arXiv:2106.13121  [pdf, other]

    cs.SI cs.AI cs.LG

    Real-time Spatio-temporal Event Detection on Geotagged Social Media

    Authors: Yasmeen George, Shanika Karunasekera, Aaron Harwood, Kwan Hui Lim

    Abstract: A key challenge in mining social media data streams is to identify events which are actively discussed by a group of people in a specific local or global area. Such events are useful for early warning of accidents, protests, elections or breaking news. However, neither the list of events nor the resolution of both event time and space is fixed or known beforehand. In this work, we propose an online…

    Submitted 23 June, 2021; originally announced June 2021.

    Comments: Accepted to Journal of Big Data

  48. arXiv:2106.12747  [pdf]

    cs.LG

    Automated Agriculture Commodity Price Prediction System with Machine Learning Techniques

    Authors: Zhiyuan Chen, Howe Seng Goh, Kai Ling Sin, Kelly Lim, Nicole Ka Hei Chung, Xin Yu Liew

    Abstract: The intention of this research is to study and design an automated agriculture commodity price prediction system with novel machine learning techniques. Due to the increasingly large amounts of historical data on agricultural commodity prices and the need for accurate prediction of price fluctuations, the solution has largely shifted from statistical methods to machine learning. However,…

    Submitted 23 June, 2021; originally announced June 2021.

    Comments: This paper has been submitted to Advances in Science, Technology and Engineering Systems Journal

  49. arXiv:2106.11815  [pdf, other]

    cs.LG cs.CY cs.SI

    User Identification across Social Networking Sites using User Profiles and Posting Patterns

    Authors: Prashant Solanki, Kwan Hui Lim, Aaron Harwood

    Abstract: With the prevalence of online social networking sites (OSNs) and mobile devices, people are increasingly reliant on a variety of OSNs for keeping in touch with family and friends, and as a source of information. For example, a user might utilise multiple OSNs for different purposes, such as using Flickr to share holiday pictures with family and friends, and Twitter to post short messages…

    Submitted 22 June, 2021; originally announced June 2021.

    Comments: Accepted at the 2021 International Joint Conference on Neural Networks (IJCNN'21)

  50. arXiv:2106.11359  [pdf, other]

    cs.CV cs.AI cs.LG

    Photozilla: A Large-Scale Photography Dataset and Visual Embedding for 20 Photography Styles

    Authors: Trisha Singhal, Junhua Liu, Lucienne T. M. Blessing, Kwan Hui Lim

    Abstract: The advent of social media platforms has been a catalyst for the development of digital photography that engendered a boom in vision applications. With this motivation, we introduce a large-scale dataset termed 'Photozilla', which includes over 990k images belonging to 10 different photographic styles. The dataset is then used to train 3 classification models to automatically classify the images i…

    Submitted 21 June, 2021; originally announced June 2021.

    Comments: In the Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, 2021. (Poster)