subscribe to arXiv mailings

Banishing LLM Hallucinations Requires Rethinking Generalization

Authors: Johnny Li, Saksham Consul, Eda Zhou, James Wong, Naila Farooqui, Yuxin Ye, Nithyashree Manohar, Zhuxiaona Wei, Tian Wu, Ben Echols, Sharon Zhou, Gregory Diamos

Abstract: Despite their powerful chat, coding, and reasoning abilities, Large Language Models (LLMs) frequently hallucinate. Conventional wisdom suggests that hallucinations are a consequence of a balance between creativity and factuality, which can be mitigated, but not eliminated, by grounding the LLM in external knowledge sources. Through extensive systematic experiments, we show that these traditional a… ▽ More Despite their powerful chat, coding, and reasoning abilities, Large Language Models (LLMs) frequently hallucinate. Conventional wisdom suggests that hallucinations are a consequence of a balance between creativity and factuality, which can be mitigated, but not eliminated, by grounding the LLM in external knowledge sources. Through extensive systematic experiments, we show that these traditional approaches fail to explain why LLMs hallucinate in practice. Specifically, we show that LLMs augmented with a massive Mixture of Memory Experts (MoME) can easily memorize large datasets of random numbers. We corroborate these experimental findings with a theoretical construction showing that simple neural networks trained to predict the next token hallucinate when the training loss is above a threshold as it usually does in practice when training on internet scale data. We interpret our findings by comparing against traditional retrieval methods for mitigating hallucinations. We use our findings to design a first generation model for removing hallucinations -- Lamini-1 -- that stores facts in a massive mixture of millions of memory experts that are retrieved dynamically. △ Less

Submitted 25 June, 2024; originally announced June 2024.

arXiv:2406.10729 [pdf, other]

A Comprehensive Survey of Foundation Models in Medicine

Authors: Wasif Khan, Seowung Leem, Kyle B. See, Joshua K. Wong, Shaoting Zhang, Ruogu Fang

Abstract: Foundation models (FMs) are large-scale deep-learning models trained on extensive datasets using self-supervised techniques. These models serve as a base for various downstream tasks, including healthcare. FMs have been adopted with great success across various domains within healthcare, including natural language processing (NLP), computer vision, graph learning, biology, and omics. Existing heal… ▽ More Foundation models (FMs) are large-scale deep-learning models trained on extensive datasets using self-supervised techniques. These models serve as a base for various downstream tasks, including healthcare. FMs have been adopted with great success across various domains within healthcare, including natural language processing (NLP), computer vision, graph learning, biology, and omics. Existing healthcare-based surveys have not yet included all of these domains. Therefore, this survey provides a comprehensive overview of FMs in healthcare. We focus on the history, learning strategies, flagship models, applications, and challenges of FMs. We explore how FMs such as the BERT and GPT families are reshaping various healthcare domains, including clinical large language models, medical image analysis, and omics data. Furthermore, we provide a detailed taxonomy of healthcare applications facilitated by FMs, such as clinical NLP, medical computer vision, graph learning, and other biology-related tasks. Despite the promising opportunities FMs provide, they also have several associated challenges, which are explained in detail. We also outline potential future directions to provide researchers and practitioners with insights into the potential and limitations of FMs in healthcare to advance their deployment and mitigate associated risks. △ Less

Submitted 15 June, 2024; originally announced June 2024.

Comments: 44 pages, and a more compact version is under review

arXiv:2406.03636 [pdf, other]

Synthetic Programming Elicitation and Repair for Text-to-Code in Very Low-Resource Programming Languages

Authors: Federico Mora, Justin Wong, Haley Lepe, Sahil Bhatia, Karim Elmaaroufi, George Varghese, Joseph E. Gonzalez, Elizabeth Polgreen, Sanjit A. Seshia

Abstract: Recent advances in large language models (LLMs) for code applications have demonstrated remarkable zero-shot fluency and instruction following on challenging code related tasks ranging from test case generation to self-repair. Unsurprisingly, however, models struggle to compose syntactically valid programs in programming languages unrepresented in pre-training, referred to as very low-resource Pro… ▽ More Recent advances in large language models (LLMs) for code applications have demonstrated remarkable zero-shot fluency and instruction following on challenging code related tasks ranging from test case generation to self-repair. Unsurprisingly, however, models struggle to compose syntactically valid programs in programming languages unrepresented in pre-training, referred to as very low-resource Programming Languages (VLPLs). VLPLs appear in crucial settings, including domain-specific languages for internal tools and tool-chains for legacy languages. Inspired by an HCI technique called natural program elicitation, we propose designing an intermediate language that LLMs ``naturally'' know how to use and which can be automatically compiled to a target VLPL. When LLMs generate code that lies outside of this intermediate language, we use compiler techniques to repair the code into programs in the intermediate language. Overall, we introduce \emph{synthetic programming elicitation and compilation} (SPEAC), an approach that enables LLMs to generate syntactically valid code even for VLPLs. We empirically evaluate the performance of SPEAC in a case study and find that, compared to existing retrieval and fine-tuning baselines, SPEAC produces syntactically correct programs significantly more frequently without sacrificing semantic correctness. △ Less

Submitted 29 June, 2024; v1 submitted 5 June, 2024; originally announced June 2024.

Comments: 15 pages, 6 figures, 1 table

arXiv:2406.02963 [pdf, other]

Dataset-Distillation Generative Model for Speech Emotion Recognition

Authors: Fabian Ritter-Gutierrez, Kuan-Po Huang, Jeremy H. M Wong, Dianwen Ng, Hung-yi Lee, Nancy F. Chen, Eng Siong Chng

Abstract: Deep learning models for speech rely on large datasets, presenting computational challenges. Yet, performance hinges on training data size. Dataset Distillation (DD) aims to learn a smaller dataset without much performance degradation when training with it. DD has been investigated in computer vision but not yet in speech. This paper presents the first approach for DD to speech targeting Speech Em… ▽ More Deep learning models for speech rely on large datasets, presenting computational challenges. Yet, performance hinges on training data size. Dataset Distillation (DD) aims to learn a smaller dataset without much performance degradation when training with it. DD has been investigated in computer vision but not yet in speech. This paper presents the first approach for DD to speech targeting Speech Emotion Recognition on IEMOCAP. We employ Generative Adversarial Networks (GANs) not to mimic real data but to distil key discriminative information of IEMOCAP that is useful for downstream training. The GAN then replaces the original dataset and can sample custom synthetic dataset sizes. It performs comparably when following the original class imbalance but improves performance by 0.3% absolute UAR with balanced classes. It also reduces dataset storage and accelerates downstream training by 95% in both cases and reduces speaker information which could help for a privacy application. △ Less

Submitted 5 June, 2024; originally announced June 2024.

Comments: Accepted at Interspeech 2024

arXiv:2405.09546 [pdf, other]

BEHAVIOR Vision Suite: Customizable Dataset Generation via Simulation

Authors: Yunhao Ge, Yihe Tang, Jiashu Xu, Cem Gokmen, Chengshu Li, Wensi Ai, Benjamin Jose Martinez, Arman Aydin, Mona Anvari, Ayush K Chakravarthy, Hong-Xing Yu, Josiah Wong, Sanjana Srivastava, Sharon Lee, Shengxin Zha, Laurent Itti, Yunzhu Li, Roberto Martín-Martín, Miao Liu, Pengchuan Zhang, Ruohan Zhang, Li Fei-Fei, Jiajun Wu

Abstract: The systematic evaluation and understanding of computer vision models under varying conditions require large amounts of data with comprehensive and customized labels, which real-world vision datasets rarely satisfy. While current synthetic data generators offer a promising alternative, particularly for embodied AI tasks, they often fall short for computer vision tasks due to low asset and renderin… ▽ More The systematic evaluation and understanding of computer vision models under varying conditions require large amounts of data with comprehensive and customized labels, which real-world vision datasets rarely satisfy. While current synthetic data generators offer a promising alternative, particularly for embodied AI tasks, they often fall short for computer vision tasks due to low asset and rendering quality, limited diversity, and unrealistic physical properties. We introduce the BEHAVIOR Vision Suite (BVS), a set of tools and assets to generate fully customized synthetic data for systematic evaluation of computer vision models, based on the newly developed embodied AI benchmark, BEHAVIOR-1K. BVS supports a large number of adjustable parameters at the scene level (e.g., lighting, object placement), the object level (e.g., joint configuration, attributes such as "filled" and "folded"), and the camera level (e.g., field of view, focal length). Researchers can arbitrarily vary these parameters during data generation to perform controlled experiments. We showcase three example application scenarios: systematically evaluating the robustness of models across different continuous axes of domain shift, evaluating scene understanding models on the same set of images, and training and evaluating simulation-to-real transfer for a novel vision task: unary and binary state prediction. Project website: https://behavior-vision-suite.github.io/ △ Less

Submitted 15 May, 2024; originally announced May 2024.

Comments: CVPR 2024 (Highlight). Project website: https://behavior-vision-suite.github.io/

arXiv:2404.18928 [pdf, other]

Stylus: Automatic Adapter Selection for Diffusion Models

Authors: Michael Luo, Justin Wong, Brandon Trabucco, Yanping Huang, Joseph E. Gonzalez, Zhifeng Chen, Ruslan Salakhutdinov, Ion Stoica

Abstract: Beyond scaling base models with more data or parameters, fine-tuned adapters provide an alternative way to generate high fidelity, custom images at reduced costs. As such, adapters have been widely adopted by open-source communities, accumulating a database of over 100K adapters-most of which are highly customized with insufficient descriptions. This paper explores the problem of matching the prom… ▽ More Beyond scaling base models with more data or parameters, fine-tuned adapters provide an alternative way to generate high fidelity, custom images at reduced costs. As such, adapters have been widely adopted by open-source communities, accumulating a database of over 100K adapters-most of which are highly customized with insufficient descriptions. This paper explores the problem of matching the prompt to a set of relevant adapters, built on recent work that highlight the performance gains of composing adapters. We introduce Stylus, which efficiently selects and automatically composes task-specific adapters based on a prompt's keywords. Stylus outlines a three-stage approach that first summarizes adapters with improved descriptions and embeddings, retrieves relevant adapters, and then further assembles adapters based on prompts' keywords by checking how well they fit the prompt. To evaluate Stylus, we developed StylusDocs, a curated dataset featuring 75K adapters with pre-computed adapter embeddings. In our evaluation on popular Stable Diffusion checkpoints, Stylus achieves greater CLIP-FID Pareto efficiency and is twice as preferred, with humans and multimodal models as evaluators, over the base model. See stylus-diffusion.github.io for more. △ Less

Submitted 29 April, 2024; originally announced April 2024.

Comments: Project Website: https://stylus-diffusion.github.io

arXiv:2404.13165 [pdf, other]

Holding the Line: A Study of Writers' Attitudes on Co-creativity with AI

Authors: Morteza Behrooz, Yuandong Tian, William Ngan, Yael Yungster, Justin Wong, David Zax

Abstract: Generative AI has put many professional writers on the defensive; a major negotiation point of the recent Writers Guild of America's strike concerned use of AI. However, must AI threaten writers, their livelihoods or their creativity? And under what conditions, if any, might AI assistance be invited by different types of writers (from the amateur to the professional, from the screenwriter to the n… ▽ More Generative AI has put many professional writers on the defensive; a major negotiation point of the recent Writers Guild of America's strike concerned use of AI. However, must AI threaten writers, their livelihoods or their creativity? And under what conditions, if any, might AI assistance be invited by different types of writers (from the amateur to the professional, from the screenwriter to the novelist)? To explore these questions, we conducted a qualitative study with 37 writers. We found that most writing occurs across five stages and within one of three modes; we additionally map openness to AI assistance to each intersecting stage-mode. We found that most writers were interested in AI assistance to some degree, but some writers felt drawing firm boundaries with an AI was key to their comfort using such systems. Designers can leverage these insights to build agency-respecting AI products for writers. △ Less

Submitted 19 April, 2024; originally announced April 2024.

arXiv:2404.11816 [pdf, other]

Tailoring Generative Adversarial Networks for Smooth Airfoil Design

Authors: Joyjit Chattoraj, Jian Cheng Wong, Zhang Zexuan, Manna Dai, Xia Yingzhi, Li Jichao, Xu Xinxing, Ooi Chin Chun, Yang Feng, Dao My Ha, Liu Yong

Abstract: In the realm of aerospace design, achieving smooth curves is paramount, particularly when crafting objects such as airfoils. Generative Adversarial Network (GAN), a widely employed generative AI technique, has proven instrumental in synthesizing airfoil designs. However, a common limitation of GAN is the inherent lack of smoothness in the generated airfoil surfaces. To address this issue, we prese… ▽ More In the realm of aerospace design, achieving smooth curves is paramount, particularly when crafting objects such as airfoils. Generative Adversarial Network (GAN), a widely employed generative AI technique, has proven instrumental in synthesizing airfoil designs. However, a common limitation of GAN is the inherent lack of smoothness in the generated airfoil surfaces. To address this issue, we present a GAN model featuring a customized loss function built to produce seamlessly contoured airfoil designs. Additionally, our model demonstrates a substantial increase in design diversity compared to a conventional GAN augmented with a post-processing smoothing filter. △ Less

Submitted 17 April, 2024; originally announced April 2024.

arXiv:2403.18639 [pdf, other]

Dependency Aware Incident Linking in Large Cloud Systems

Authors: Supriyo Ghosh, Karish Grover, Jimmy Wong, Chetan Bansal, Rakesh Namineni, Mohit Verma, Saravan Rajmohan

Abstract: Despite significant reliability efforts, large-scale cloud services inevitably experience production incidents that can significantly impact service availability and customer's satisfaction. Worse, in many cases one incident can lead to multiple downstream failures due to cascading effects that creates several related incidents across different dependent services. Often time On-call Engineers (OCE… ▽ More Despite significant reliability efforts, large-scale cloud services inevitably experience production incidents that can significantly impact service availability and customer's satisfaction. Worse, in many cases one incident can lead to multiple downstream failures due to cascading effects that creates several related incidents across different dependent services. Often time On-call Engineers (OCEs) examine these incidents in silos that lead to significant amount of manual toil and increase the overall time-to-mitigate incidents. Therefore, developing efficient incident linking models is of paramount importance for grouping related incidents into clusters so as to quickly resolve major outages and reduce on-call fatigue. Existing incident linking methods mostly leverages textual and contextual information of incidents (e.g., title, description, severity, impacted components), thus failing to leverage the inter-dependencies between services. In this paper, we propose the dependency-aware incident linking (DiLink) framework which leverages both textual and service dependency graph information to improve the accuracy and coverage of incident links not only coming from same service, but also from different services and workloads. Furthermore, we propose a novel method to align the embeddings of multi-modal (i.e., textual and graphical) data using Orthogonal Procrustes. Extensive experimental results on real-world incidents from 5 workloads of Microsoft demonstrate that our alignment method has an F1-score of 0.96 (14% gain over current state-of-the-art methods). We are also in the process of deploying this solution across 610 services from these 5 workloads for continuously supporting OCEs improving incident management and reducing manual toil. △ Less

Submitted 5 February, 2024; originally announced March 2024.

arXiv:2403.15404 [pdf]

doi 10.5281/zenodo.10680345

AI Sustainability in Practice Part Two: Sustainability Throughout the AI Workflow

Authors: David Leslie, Cami Rincon, Morgan Briggs, Antonella Perini, Smera Jayadeva, Ann Borda, SJ Bennett, Christopher Burr, Mhairi Aitken, Michael Katell, Claudia Fischer, Janis Wong, Ismael Kherroubi Garcia

Abstract: The sustainability of AI systems depends on the capacity of project teams to proceed with a continuous sensitivity to their potential real-world impacts and transformative effects. Stakeholder Impact Assessments (SIAs) are governance mechanisms that enable this kind of responsiveness. They are tools that create a procedure for, and a means of documenting, the collaborative evaluation and reflectiv… ▽ More The sustainability of AI systems depends on the capacity of project teams to proceed with a continuous sensitivity to their potential real-world impacts and transformative effects. Stakeholder Impact Assessments (SIAs) are governance mechanisms that enable this kind of responsiveness. They are tools that create a procedure for, and a means of documenting, the collaborative evaluation and reflective anticipation of the possible harms and benefits of AI innovation projects. SIAs are not one-off governance actions. They require project teams to pay continuous attention to the dynamic and changing character of AI production and use and to the shifting conditions of the real-world environments in which AI technologies are embedded. This workbook is part two of two workbooks on AI Sustainability. It provides a template of the SIA and activities that allow a deeper dive into crucial parts of it. It discusses methods for weighing values and considering trade-offs during the SIA. And, it highlights the need to treat the SIA as an end-to-end process of responsive evaluation and re-assessment. △ Less

Submitted 19 February, 2024; originally announced March 2024.

arXiv:2403.14636 [pdf]

doi 10.5281/zenodo.10680527

AI Fairness in Practice

Authors: David Leslie, Cami Rincon, Morgan Briggs, Antonella Perini, Smera Jayadeva, Ann Borda, SJ Bennett, Christopher Burr, Mhairi Aitken, Michael Katell, Claudia Fischer, Janis Wong, Ismael Kherroubi Garcia

Abstract: Reaching consensus on a commonly accepted definition of AI Fairness has long been a central challenge in AI ethics and governance. There is a broad spectrum of views across society on what the concept of fairness means and how it should best be put to practice. In this workbook, we tackle this challenge by exploring how a context-based and society-centred approach to understanding AI Fairness can… ▽ More Reaching consensus on a commonly accepted definition of AI Fairness has long been a central challenge in AI ethics and governance. There is a broad spectrum of views across society on what the concept of fairness means and how it should best be put to practice. In this workbook, we tackle this challenge by exploring how a context-based and society-centred approach to understanding AI Fairness can help project teams better identify, mitigate, and manage the many ways that unfair bias and discrimination can crop up across the AI project workflow. We begin by exploring how, despite the plurality of understandings about the meaning of fairness, priorities of equality and non-discrimination have come to constitute the broadly accepted core of its application as a practical principle. We focus on how these priorities manifest in the form of equal protection from direct and indirect discrimination and from discriminatory harassment. These elements form ethical and legal criteria based upon which instances of unfair bias and discrimination can be identified and mitigated across the AI project workflow. We then take a deeper dive into how the different contexts of the AI project lifecycle give rise to different fairness concerns. This allows us to identify several types of AI Fairness (Data Fairness, Application Fairness, Model Design and Development Fairness, Metric-Based Fairness, System Implementation Fairness, and Ecosystem Fairness) that form the basis of a multi-lens approach to bias identification, mitigation, and management. Building on this, we discuss how to put the principle of AI Fairness into practice across the AI project workflow through Bias Self-Assessment and Bias Risk Management as well as through the documentation of metric-based fairness criteria in a Fairness Position Statement. △ Less

Submitted 19 February, 2024; originally announced March 2024.

arXiv:2403.14635 [pdf]

doi 10.5281/zenodo.10680113

AI Sustainability in Practice Part One: Foundations for Sustainable AI Projects

Authors: David Leslie, Cami Rincon, Morgan Briggs, Antonella Perini, Smera Jayadeva, Ann Borda, SJ Bennett, Christopher Burr, Mhairi Aitken, Michael Katell, Claudia Fischer, Janis Wong, Ismael Kherroubi Garcia

Abstract: Sustainable AI projects are continuously responsive to the transformative effects as well as short-, medium-, and long-term impacts on individuals and society that the design, development, and deployment of AI technologies may have. Projects, which centre AI Sustainability, ensure that values-led, collaborative, and anticipatory reflection both guides the assessment of potential social and ethical… ▽ More Sustainable AI projects are continuously responsive to the transformative effects as well as short-, medium-, and long-term impacts on individuals and society that the design, development, and deployment of AI technologies may have. Projects, which centre AI Sustainability, ensure that values-led, collaborative, and anticipatory reflection both guides the assessment of potential social and ethical impacts and steers responsible innovation practices. This workbook is the first part of a pair that provides the concepts and tools needed to put AI Sustainability into practice. It introduces the SUM Values, which help AI project teams to assess the potential societal impacts and ethical permissibility of their projects. It then presents a Stakeholder Engagement Process (SEP), which provides tools to facilitate proportionate engagement of and input from stakeholders with an emphasis on equitable and meaningful participation and positionality awareness. △ Less

Submitted 19 February, 2024; originally announced March 2024.

arXiv:2403.09227 [pdf, other]

BEHAVIOR-1K: A Human-Centered, Embodied AI Benchmark with 1,000 Everyday Activities and Realistic Simulation

Authors: Chengshu Li, Ruohan Zhang, Josiah Wong, Cem Gokmen, Sanjana Srivastava, Roberto Martín-Martín, Chen Wang, Gabrael Levine, Wensi Ai, Benjamin Martinez, Hang Yin, Michael Lingelbach, Minjune Hwang, Ayano Hiranaka, Sujay Garlanka, Arman Aydin, Sharon Lee, Jiankai Sun, Mona Anvari, Manasi Sharma, Dhruva Bansal, Samuel Hunter, Kyu-Young Kim, Alan Lou, Caleb R Matthews , et al. (10 additional authors not shown)

Abstract: We present BEHAVIOR-1K, a comprehensive simulation benchmark for human-centered robotics. BEHAVIOR-1K includes two components, guided and motivated by the results of an extensive survey on "what do you want robots to do for you?". The first is the definition of 1,000 everyday activities, grounded in 50 scenes (houses, gardens, restaurants, offices, etc.) with more than 9,000 objects annotated with… ▽ More We present BEHAVIOR-1K, a comprehensive simulation benchmark for human-centered robotics. BEHAVIOR-1K includes two components, guided and motivated by the results of an extensive survey on "what do you want robots to do for you?". The first is the definition of 1,000 everyday activities, grounded in 50 scenes (houses, gardens, restaurants, offices, etc.) with more than 9,000 objects annotated with rich physical and semantic properties. The second is OMNIGIBSON, a novel simulation environment that supports these activities via realistic physics simulation and rendering of rigid bodies, deformable bodies, and liquids. Our experiments indicate that the activities in BEHAVIOR-1K are long-horizon and dependent on complex manipulation skills, both of which remain a challenge for even state-of-the-art robot learning solutions. To calibrate the simulation-to-reality gap of BEHAVIOR-1K, we provide an initial study on transferring solutions learned with a mobile manipulator in a simulated apartment to its real-world counterpart. We hope that BEHAVIOR-1K's human-grounded nature, diversity, and realism make it valuable for embodied AI and robot learning research. Project website: https://behavior.stanford.edu. △ Less

Submitted 14 March, 2024; originally announced March 2024.

Comments: A preliminary version was published at 6th Conference on Robot Learning (CoRL 2022)

arXiv:2401.03676 [pdf, other]

Assessing AI Detectors in Identifying AI-Generated Code: Implications for Education

Authors: Wei Hung Pan, Ming Jie Chok, Jonathan Leong Shan Wong, Yung Xin Shin, Yeong Shian Poon, Zhou Yang, Chun Yong Chong, David Lo, Mei Kuan Lim

Abstract: Educators are increasingly concerned about the usage of Large Language Models (LLMs) such as ChatGPT in programming education, particularly regarding the potential exploitation of imperfections in Artificial Intelligence Generated Content (AIGC) Detectors for academic misconduct. In this paper, we present an empirical study where the LLM is examined for its attempts to bypass detection by AIGC Det… ▽ More Educators are increasingly concerned about the usage of Large Language Models (LLMs) such as ChatGPT in programming education, particularly regarding the potential exploitation of imperfections in Artificial Intelligence Generated Content (AIGC) Detectors for academic misconduct. In this paper, we present an empirical study where the LLM is examined for its attempts to bypass detection by AIGC Detectors. This is achieved by generating code in response to a given question using different variants. We collected a dataset comprising 5,069 samples, with each sample consisting of a textual description of a coding problem and its corresponding human-written Python solution codes. These samples were obtained from various sources, including 80 from Quescol, 3,264 from Kaggle, and 1,725 from LeetCode. From the dataset, we created 13 sets of code problem variant prompts, which were used to instruct ChatGPT to generate the outputs. Subsequently, we assessed the performance of five AIGC detectors. Our results demonstrate that existing AIGC Detectors perform poorly in distinguishing between human-written code and AI-generated code. △ Less

Submitted 8 January, 2024; originally announced January 2024.

Comments: 11 pages, paper accepted at 46th International Conference on Software Engineering, Software Engineering Education and Training Track (ICSE-SEET 2024)

arXiv:2401.02450 [pdf, other]

Locally Differentially Private Embedding Models in Distributed Fraud Prevention Systems

Authors: Iker Perez, Jason Wong, Piotr Skalski, Stuart Burrell, Richard Mortier, Derek McAuley, David Sutton

Abstract: Global financial crime activity is driving demand for machine learning solutions in fraud prevention. However, prevention systems are commonly serviced to financial institutions in isolation, and few provisions exist for data sharing due to fears of unintentional leaks and adversarial attacks. Collaborative learning advances in finance are rare, and it is hard to find real-world insights derived f… ▽ More Global financial crime activity is driving demand for machine learning solutions in fraud prevention. However, prevention systems are commonly serviced to financial institutions in isolation, and few provisions exist for data sharing due to fears of unintentional leaks and adversarial attacks. Collaborative learning advances in finance are rare, and it is hard to find real-world insights derived from privacy-preserving data processing systems. In this paper, we present a collaborative deep learning framework for fraud prevention, designed from a privacy standpoint, and awarded at the recent PETs Prize Challenges. We leverage latent embedded representations of varied-length transaction sequences, along with local differential privacy, in order to construct a data release mechanism which can securely inform externally hosted fraud and anomaly detection models. We assess our contribution on two distributed data sets donated by large payment networks, and demonstrate robustness to popular inference-time attacks, along with utility-privacy trade-offs analogous to published work in alternative application domains. △ Less

Submitted 3 January, 2024; originally announced January 2024.

arXiv:2401.01641 [pdf, other]

doi 10.1145/3604237.3626850

Towards a Foundation Purchasing Model: Pretrained Generative Autoregression on Transaction Sequences

Authors: Piotr Skalski, David Sutton, Stuart Burrell, Iker Perez, Jason Wong

Abstract: Machine learning models underpin many modern financial systems for use cases such as fraud detection and churn prediction. Most are based on supervised learning with hand-engineered features, which relies heavily on the availability of labelled data. Large self-supervised generative models have shown tremendous success in natural language processing and computer vision, yet so far they haven't bee… ▽ More Machine learning models underpin many modern financial systems for use cases such as fraud detection and churn prediction. Most are based on supervised learning with hand-engineered features, which relies heavily on the availability of labelled data. Large self-supervised generative models have shown tremendous success in natural language processing and computer vision, yet so far they haven't been adapted to multivariate time series of financial transactions. In this paper, we present a generative pretraining method that can be used to obtain contextualised embeddings of financial transactions. Benchmarks on public datasets demonstrate that it outperforms state-of-the-art self-supervised methods on a range of downstream tasks. We additionally perform large-scale pretraining of an embedding model using a corpus of data from 180 issuing banks containing 5.1 billion transactions and apply it to the card fraud detection problem on hold-out datasets. The embedding model significantly improves value detection rate at high precision thresholds and transfers well to out-of-domain distributions. △ Less

Submitted 4 January, 2024; v1 submitted 3 January, 2024; originally announced January 2024.

Journal ref: 4th ACM International Conference on AI in Finance (ICAIF '23), November 27-29, 2023, Brooklyn, NY, USA

arXiv:2312.12153 [pdf, other]

Noise robust distillation of self-supervised speech models via correlation metrics

Authors: Fabian Ritter-Gutierrez, Kuan-Po Huang, Dianwen Ng, Jeremy H. M. Wong, Hung-yi Lee, Eng Siong Chng, Nancy F. Chen

Abstract: Compared to large speech foundation models, small distilled models exhibit degraded noise robustness. The student's robustness can be improved by introducing noise at the inputs during pre-training. Despite this, using the standard distillation loss still yields a student with degraded performance. Thus, this paper proposes improving student robustness via distillation with correlation metrics. Te… ▽ More Compared to large speech foundation models, small distilled models exhibit degraded noise robustness. The student's robustness can be improved by introducing noise at the inputs during pre-training. Despite this, using the standard distillation loss still yields a student with degraded performance. Thus, this paper proposes improving student robustness via distillation with correlation metrics. Teacher behavior is learned by maximizing the teacher and student cross-correlation matrix between their representations towards identity. Noise robustness is encouraged via the student's self-correlation minimization. The proposed method is agnostic of the teacher model and consistently outperforms the previous approach. This work also proposes an heuristic to weigh the importance of the two correlation terms automatically. Experiments show consistently better clean and noise generalization on Intent Classification, Keyword Spotting, and Automatic Speech Recognition tasks on SUPERB Challenge. △ Less

Submitted 19 December, 2023; originally announced December 2023.

Comments: 6 pages

arXiv:2312.03243 [pdf, other]

Generalizable Neural Physics Solvers by Baldwinian Evolution

Authors: Jian Cheng Wong, Chin Chun Ooi, Abhishek Gupta, Pao-Hsiung Chiu, Joshua Shao Zheng Low, My Ha Dao, Yew-Soon Ong

Abstract: Physics-informed neural networks (PINNs) are at the forefront of scientific machine learning, making possible the creation of machine intelligence that is cognizant of physical laws and able to accurately simulate them. In this paper, the potential of discovering PINNs that generalize over an entire family of physics tasks is studied, for the first time, through a biological lens of the Baldwin ef… ▽ More Physics-informed neural networks (PINNs) are at the forefront of scientific machine learning, making possible the creation of machine intelligence that is cognizant of physical laws and able to accurately simulate them. In this paper, the potential of discovering PINNs that generalize over an entire family of physics tasks is studied, for the first time, through a biological lens of the Baldwin effect. Drawing inspiration from the neurodevelopment of precocial species that have evolved to learn, predict and react quickly to their environment, we envision PINNs that are pre-wired with connection strengths inducing strong biases towards efficient learning of physics. To this end, evolutionary selection pressure (guided by proficiency over a family of tasks) is coupled with lifetime learning (to specialize on a smaller subset of those tasks) to produce PINNs that demonstrate fast and physics-compliant prediction capabilities across a range of empirically challenging problem instances. The Baldwinian approach achieves an order of magnitude improvement in prediction accuracy at a fraction of the computation cost compared to state-of-the-art results with PINNs meta-learned by gradient descent. This paper marks a leap forward in the meta-learning of PINNs as generalizable physics solvers. △ Less

Submitted 5 December, 2023; originally announced December 2023.

arXiv:2312.01109 [pdf, other]

Kattis vs. ChatGPT: Assessment and Evaluation of Programming Tasks in the Age of Artificial Intelligence

Authors: Nora Dunder, Saga Lundborg, Olga Viberg, Jacqueline Wong

Abstract: AI-powered education technologies can support students and teachers in computer science education. However, with the recent developments in generative AI, and especially the increasingly emerging popularity of ChatGPT, the effectiveness of using large language models for solving programming tasks has been underexplored. The present study examines ChatGPT's ability to generate code solutions at dif… ▽ More AI-powered education technologies can support students and teachers in computer science education. However, with the recent developments in generative AI, and especially the increasingly emerging popularity of ChatGPT, the effectiveness of using large language models for solving programming tasks has been underexplored. The present study examines ChatGPT's ability to generate code solutions at different difficulty levels for introductory programming courses. We conducted an experiment where ChatGPT was tested on 127 randomly selected programming problems provided by Kattis, an automatic software grading tool for computer science programs, often used in higher education. The results showed that ChatGPT independently could solve 19 out of 127 programming tasks generated and assessed by Kattis. Further, ChatGPT was found to be able to generate accurate code solutions for simple problems but encountered difficulties with more complex programming tasks. The results contribute to the ongoing debate on the utility of AI-powered tools in programming education. △ Less

Submitted 2 December, 2023; originally announced December 2023.

Comments: 10 pages, 2 figures, 3 tables. (Pre-print). Final version to be submitted to ACM Journals. LAK2024, March,18-22, 2024, Kyoto, Japan

ACM Class: I.2.0

arXiv:2310.16684 [pdf, other]

Local Statistics for Generative Image Detection

Authors: Yung Jer Wong, Teck Khim Ng

Abstract: Diffusion models (DMs) are generative models that learn to synthesize images from Gaussian noise. DMs can be trained to do a variety of tasks such as image generation and image super-resolution. Researchers have made significant improvement in the capability of synthesizing photorealistic images in the past few years. These successes also hasten the need to address the potential misuse of synthesi… ▽ More Diffusion models (DMs) are generative models that learn to synthesize images from Gaussian noise. DMs can be trained to do a variety of tasks such as image generation and image super-resolution. Researchers have made significant improvement in the capability of synthesizing photorealistic images in the past few years. These successes also hasten the need to address the potential misuse of synthesized images. In this paper, we highlight the effectiveness of computing local statistics, as opposed to global statistics, in distinguishing digital camera images from DM-generated images. We hypothesized that local statistics should be used to address the spatial non-stationarity problem in images. We show that our approach produced promising results and it is also robust to various perturbations such as image resizing and JPEG compression. △ Less

Submitted 25 October, 2023; originally announced October 2023.

arXiv:2308.07931 [pdf, other]

Distilled Feature Fields Enable Few-Shot Language-Guided Manipulation

Authors: William Shen, Ge Yang, Alan Yu, Jansen Wong, Leslie Pack Kaelbling, Phillip Isola

Abstract: Self-supervised and language-supervised image models contain rich knowledge of the world that is important for generalization. Many robotic tasks, however, require a detailed understanding of 3D geometry, which is often lacking in 2D image features. This work bridges this 2D-to-3D gap for robotic manipulation by leveraging distilled feature fields to combine accurate 3D geometry with rich semantic… ▽ More Self-supervised and language-supervised image models contain rich knowledge of the world that is important for generalization. Many robotic tasks, however, require a detailed understanding of 3D geometry, which is often lacking in 2D image features. This work bridges this 2D-to-3D gap for robotic manipulation by leveraging distilled feature fields to combine accurate 3D geometry with rich semantics from 2D foundation models. We present a few-shot learning method for 6-DOF grasping and placing that harnesses these strong spatial and semantic priors to achieve in-the-wild generalization to unseen objects. Using features distilled from a vision-language model, CLIP, we present a way to designate novel objects for manipulation via free-text natural language, and demonstrate its ability to generalize to unseen expressions and novel categories of objects. △ Less

Submitted 29 December, 2023; v1 submitted 27 July, 2023; originally announced August 2023.

Comments: Project website at https://f3rm.csail.mit.edu, Accepted at the 7th Annual Conference on Robot Learning (CoRL), 2023 in Atlanta, US

arXiv:2308.01999 [pdf, other]

cuQuantum SDK: A High-Performance Library for Accelerating Quantum Science

Authors: Harun Bayraktar, Ali Charara, David Clark, Saul Cohen, Timothy Costa, Yao-Lung L. Fang, Yang Gao, Jack Guan, John Gunnels, Azzam Haidar, Andreas Hehn, Markus Hohnerbach, Matthew Jones, Tom Lubowe, Dmitry Lyakh, Shinya Morino, Paul Springer, Sam Stanwyck, Igor Terentyev, Satya Varadhan, Jonathan Wong, Takuma Yamaguchi

Abstract: We present the NVIDIA cuQuantum SDK, a state-of-the-art library of composable primitives for GPU-accelerated quantum circuit simulations. As the size of quantum devices continues to increase, making their classical simulation progressively more difficult, the availability of fast and scalable quantum circuit simulators becomes vital for quantum algorithm developers, as well as quantum hardware eng… ▽ More We present the NVIDIA cuQuantum SDK, a state-of-the-art library of composable primitives for GPU-accelerated quantum circuit simulations. As the size of quantum devices continues to increase, making their classical simulation progressively more difficult, the availability of fast and scalable quantum circuit simulators becomes vital for quantum algorithm developers, as well as quantum hardware engineers focused on the validation and optimization of quantum devices. The cuQuantum SDK was created to accelerate and scale up quantum circuit simulators developed by the quantum information science community by enabling them to utilize efficient scalable software building blocks optimized for NVIDIA GPU platforms. The functional building blocks provided cover the needs of both state vector- and tensor network- based simulators, including approximate tensor network simulation methods based on matrix product state, projected entangled pair state, and other factorized tensor representations. By leveraging the enormous computing power of the latest NVIDIA GPU architectures, quantum circuit simulators that have adopted the cuQuantum SDK demonstrate significant acceleration, compared to CPU-only execution, for both the state vector and tensor network simulation methods. Furthermore, by utilizing the parallel primitives available in the cuQuantum SDK, one can easily transition to distributed GPU-accelerated platforms, including those furnished by cloud service providers and high-performance computing systems deployed by supercomputing centers, extending the scale of possible quantum circuit simulations. The rich capabilities provided by the SDK are conveniently made available via both Python and C application programming interfaces, where the former is directly targeting a broad Python quantum community and the latter allows tight integration with simulators written in any programming language. △ Less

Submitted 3 August, 2023; originally announced August 2023.

Comments: paper accepted at QCE 2023, journal reference will be updated whenever available

MSC Class: 68Q12; 68Q09; 81P68;

arXiv:2306.02719 [pdf, ps, other]

Multiple output samples per input in a single-output Gaussian process

Authors: Jeremy H. M. Wong, Huayun Zhang, Nancy F. Chen

Abstract: The standard Gaussian Process (GP) only considers a single output sample per input in the training set. Datasets for subjective tasks, such as spoken language assessment, may be annotated with output labels from multiple human raters per input. This paper proposes to generalise the GP to allow for these multiple output samples in the training set, and thus make use of available output uncertainty… ▽ More The standard Gaussian Process (GP) only considers a single output sample per input in the training set. Datasets for subjective tasks, such as spoken language assessment, may be annotated with output labels from multiple human raters per input. This paper proposes to generalise the GP to allow for these multiple output samples in the training set, and thus make use of available output uncertainty information. This differs from a multi-output GP, as all output samples are from the same task here. The output density function is formulated to be the joint likelihood of observing all output samples, and latent variables are not repeated to reduce computation cost. The test set predictions are inferred similarly to a standard GP, with a difference being in the optimised hyper-parameters. This is evaluated on speechocean762, showing that it allows the GP to compute a test set output distribution that is more similar to the collection of reference outputs from the multiple human raters. △ Less

Submitted 25 January, 2024; v1 submitted 5 June, 2023; originally announced June 2023.

Comments: This paper is presented in the "Symposium for Celebrating 40 Years of Bayesian Learning in Speech and Language Processing and Beyond", which is a satellite event of the ASRU workshop, on 20 December 2023. https://bayesian40.github.io/

arXiv:2305.15365 [pdf, other]

Boundary Attention Mapping (BAM): Fine-grained saliency maps for segmentation of Burn Injuries

Authors: Mahla Abdolahnejad, Justin Lee, Hannah Chan, Alex Morzycki, Olivier Ethier, Anthea Mo, Peter X. Liu, Joshua N. Wong, Colin Hong, Rakesh Joshi

Abstract: Burn injuries can result from mechanisms such as thermal, chemical, and electrical insults. A prompt and accurate assessment of burns is essential for deciding definitive clinical treatments. Currently, the primary approach for burn assessments, via visual and tactile observations, is approximately 60%-80% accurate. The gold standard is biopsy and a close second would be non-invasive methods like… ▽ More Burn injuries can result from mechanisms such as thermal, chemical, and electrical insults. A prompt and accurate assessment of burns is essential for deciding definitive clinical treatments. Currently, the primary approach for burn assessments, via visual and tactile observations, is approximately 60%-80% accurate. The gold standard is biopsy and a close second would be non-invasive methods like Laser Doppler Imaging (LDI) assessments, which have up to 97% accuracy in predicting burn severity and the required healing time. In this paper, we introduce a machine learning pipeline for assessing burn severities and segmenting the regions of skin that are affected by burn. Segmenting 2D colour images of burns allows for the injured versus non-injured skin to be delineated, clearly marking the extent and boundaries of the localized burn/region-of-interest, even during remote monitoring of a burn patient. We trained a convolutional neural network (CNN) to classify four severities of burns. We built a saliency mapping method, Boundary Attention Mapping (BAM), that utilises this trained CNN for the purpose of accurately localizing and segmenting the burn regions from skin burn images. We demonstrated the effectiveness of our proposed pipeline through extensive experiments and evaluations using two datasets; 1) A larger skin burn image dataset consisting of 1684 skin burn images of four burn severities, 2) An LDI dataset that consists of a total of 184 skin burn images with their associated LDI scans. The CNN trained using the first dataset achieved an average F1-Score of 78% and micro/macro- average ROC of 85% in classifying the four burn severities. Moreover, a comparison between the BAM results and LDI results for measuring injury boundary showed that the segmentations generated by our method achieved 91.60% accuracy, 78.17% sensitivity, and 93.37% specificity. △ Less

Submitted 24 May, 2023; originally announced May 2023.

arXiv:2305.12564 [pdf]

ChatGPT Is More Likely to Be Perceived as Male Than Female

Authors: Jared Wong, Jin Kim

Abstract: We investigate how people perceive ChatGPT, and, in particular, how they assign human-like attributes such as gender to the chatbot. Across five pre-registered studies (N = 1,552), we find that people are more likely to perceive ChatGPT to be male than female. Specifically, people perceive male gender identity (1) following demonstrations of ChatGPT's core abilities (e.g., providing information or… ▽ More We investigate how people perceive ChatGPT, and, in particular, how they assign human-like attributes such as gender to the chatbot. Across five pre-registered studies (N = 1,552), we find that people are more likely to perceive ChatGPT to be male than female. Specifically, people perceive male gender identity (1) following demonstrations of ChatGPT's core abilities (e.g., providing information or summarizing text), (2) in the absence of such demonstrations, and (3) across different methods of eliciting perceived gender (using various scales and asking to name ChatGPT). Moreover, we find that this seemingly default perception of ChatGPT as male can reverse when ChatGPT's feminine-coded abilities are highlighted (e.g., providing emotional support for a user). △ Less

Submitted 21 May, 2023; originally announced May 2023.

arXiv:2304.09099 [pdf]

MATURE-HEALTH: HEALTH Recommender System for MAndatory FeaTURE choices

Authors: Ritu Shandilya, Sugam Sharma, Johnny Wong

Abstract: Balancing electrolytes is utmost important and essential for appropriate functioning of organs in human body as electrolytes imbalance can be an indication of the development of underlying pathophysiology. Efficient monitoring of electrolytes imbalance not only can increase the chances of early detection of disease, but also prevents the further deterioration of the health by strictly following nu… ▽ More Balancing electrolytes is utmost important and essential for appropriate functioning of organs in human body as electrolytes imbalance can be an indication of the development of underlying pathophysiology. Efficient monitoring of electrolytes imbalance not only can increase the chances of early detection of disease, but also prevents the further deterioration of the health by strictly following nutrient controlled diet for balancing the electrolytes post disease detection. In this research, a recommender system MATURE Health is proposed and implemented, which predicts the imbalance of mandatory electrolytes and other substances presented in blood and recommends the food items with the balanced nutrients to avoid occurrence of the electrolytes imbalance. The proposed model takes user most recent laboratory results and daily food intake into account to predict the electrolytes imbalance. MATURE Health relies on MATURE Food algorithm to recommend food items as latter recommends only those food items that satisfy all mandatory nutrient requirements while also considering user past food preferences. To validate the proposed method, particularly sodium, potassium, and BUN levels have been predicted with prediction algorithm, Random Forest, for dialysis patients using their laboratory reports history and daily food intake. And, the proposed model demonstrates 99.53 percent, 96.94 percent and 95.35 percent accuracy for Sodium, Potassium, and BUN respectively. MATURE Health is a novel health recommender system that implements machine learning models to predict the imbalance of mandatory electrolytes and other substances in the blood and recommends the food items which contain the required amount of the nutrients that prevent or at least reduce the risk of the electrolytes imbalance. △ Less

Submitted 20 April, 2023; v1 submitted 1 April, 2023; originally announced April 2023.

Comments: Author version of the paper

arXiv:2304.01305 [pdf, other]

PyFlyt -- UAV Simulation Environments for Reinforcement Learning Research

Authors: Jun Jet Tai, Jim Wong, Mauro Innocente, Nadjim Horri, James Brusey, Swee King Phang

Abstract: Unmanned aerial vehicles (UAVs) have numerous applications, but their efficient and optimal flight can be a challenge. Reinforcement Learning (RL) has emerged as a promising approach to address this challenge, yet there is no standardized library for testing and benchmarking RL algorithms on UAVs. In this paper, we introduce PyFlyt, a platform built on the Bullet physics engine with native Gymnasi… ▽ More Unmanned aerial vehicles (UAVs) have numerous applications, but their efficient and optimal flight can be a challenge. Reinforcement Learning (RL) has emerged as a promising approach to address this challenge, yet there is no standardized library for testing and benchmarking RL algorithms on UAVs. In this paper, we introduce PyFlyt, a platform built on the Bullet physics engine with native Gymnasium API support. PyFlyt provides modular implementations of simple components, such as motors and lifting surfaces, allowing for the implementation of UAVs of arbitrary configurations. Additionally, PyFlyt includes various task definitions and multiple reward function settings for each vehicle type. We demonstrate the effectiveness of PyFlyt by training various RL agents for two UAV models: quadrotor and fixed-wing. Our findings highlight the effectiveness of RL in UAV control and planning, and further show that it is possible to train agents in sparse reward settings for UAVs. PyFlyt fills a gap in existing literature by providing a flexible and standardised platform for testing RL algorithms on UAVs. We believe that this will inspire more standardised research in this direction. △ Less

Submitted 3 April, 2023; originally announced April 2023.

Comments: Under Review for Transactions on Robotics

arXiv:2302.05206 [pdf, other]

The Wisdom of Hindsight Makes Language Models Better Instruction Followers

Authors: Tianjun Zhang, Fangchen Liu, Justin Wong, Pieter Abbeel, Joseph E. Gonzalez

Abstract: Reinforcement learning has seen wide success in finetuning large language models to better align with instructions via human feedback. The so-called algorithm, Reinforcement Learning with Human Feedback (RLHF) demonstrates impressive performance on the GPT series models. However, the underlying Reinforcement Learning (RL) algorithm is complex and requires an additional training pipeline for reward… ▽ More Reinforcement learning has seen wide success in finetuning large language models to better align with instructions via human feedback. The so-called algorithm, Reinforcement Learning with Human Feedback (RLHF) demonstrates impressive performance on the GPT series models. However, the underlying Reinforcement Learning (RL) algorithm is complex and requires an additional training pipeline for reward and value networks. In this paper, we consider an alternative approach: converting feedback to instruction by relabeling the original one and training the model for better alignment in a supervised manner. Such an algorithm doesn't require any additional parameters except for the original language model and maximally reuses the pretraining pipeline. To achieve this, we formulate instruction alignment problem for language models as a goal-reaching problem in decision making. We propose Hindsight Instruction Relabeling (HIR), a novel algorithm for aligning language models with instructions. The resulting two-stage algorithm shed light to a family of reward-free approaches that utilize the hindsightly relabeled instructions based on feedback. We evaluate the performance of HIR extensively on 12 challenging BigBench reasoning tasks and show that HIR outperforms the baseline algorithms and is comparable to or even surpasses supervised finetuning. △ Less

Submitted 10 February, 2023; originally announced February 2023.

arXiv:2302.01518 [pdf, other]

doi 10.1109/IJCNN54540.2023.10191236

LSA-PINN: Linear Boundary Connectivity Loss for Solving PDEs on Complex Geometry

Authors: Jian Cheng Wong, Pao-Hsiung Chiu, Chinchun Ooi, My Ha Dao, Yew-Soon Ong

Abstract: We present a novel loss formulation for efficient learning of complex dynamics from governing physics, typically described by partial differential equations (PDEs), using physics-informed neural networks (PINNs). In our experiments, existing versions of PINNs are seen to learn poorly in many problems, especially for complex geometries, as it becomes increasingly difficult to establish appropriate… ▽ More We present a novel loss formulation for efficient learning of complex dynamics from governing physics, typically described by partial differential equations (PDEs), using physics-informed neural networks (PINNs). In our experiments, existing versions of PINNs are seen to learn poorly in many problems, especially for complex geometries, as it becomes increasingly difficult to establish appropriate sampling strategy at the near boundary region. Overly dense sampling can adversely impede training convergence if the local gradient behaviors are too complex to be adequately modelled by PINNs. On the other hand, if the samples are too sparse, existing PINNs tend to overfit the near boundary region, leading to incorrect solution. To prevent such issues, we propose a new Boundary Connectivity (BCXN) loss function which provides linear local structure approximation (LSA) to the gradient behaviors at the boundary for PINN. Our BCXN-loss implicitly imposes local structure during training, thus facilitating fast physics-informed learning across entire problem domains with order of magnitude sparser training samples. This LSA-PINN method shows a few orders of magnitude smaller errors than existing methods in terms of the standard L2-norm metric, while using dramatically fewer training samples and iterations. Our proposed LSA-PINN does not pose any requirement on the differentiable property of the networks, and we demonstrate its benefits and ease of implementation on both multi-layer perceptron and convolutional neural network versions as commonly used in current PINN literature. △ Less

Submitted 2 March, 2023; v1 submitted 2 February, 2023; originally announced February 2023.

Comments: 11 pages, 7 figures

Journal ref: 2023 International Joint Conference on Neural Networks (IJCNN)

arXiv:2302.00557 [pdf]

doi 10.1109/SSCI51031.2022.10022022

Graph Neural Network Based Surrogate Model of Physics Simulations for Geometry Design

Authors: Jian Cheng Wong, Chin Chun Ooi, Joyjit Chattoraj, Lucas Lestandi, Guoying Dong, Umesh Kizhakkinan, David William Rosen, Mark Hyunpong Jhon, My Ha Dao

Abstract: Computational Intelligence (CI) techniques have shown great potential as a surrogate model of expensive physics simulation, with demonstrated ability to make fast predictions, albeit at the expense of accuracy in some cases. For many scientific and engineering problems involving geometrical design, it is desirable for the surrogate models to precisely describe the change in geometry and predict th… ▽ More Computational Intelligence (CI) techniques have shown great potential as a surrogate model of expensive physics simulation, with demonstrated ability to make fast predictions, albeit at the expense of accuracy in some cases. For many scientific and engineering problems involving geometrical design, it is desirable for the surrogate models to precisely describe the change in geometry and predict the consequences. In that context, we develop graph neural networks (GNNs) as fast surrogate models for physics simulation, which allow us to directly train the models on 2/3D geometry designs that are represented by an unstructured mesh or point cloud, without the need for any explicit or hand-crafted parameterization. We utilize an encoder-processor-decoder-type architecture which can flexibly make prediction at both node level and graph level. The performance of our proposed GNN-based surrogate model is demonstrated on 2 example applications: feature designs in the domain of additive engineering and airfoil design in the domain of aerodynamics. The models show good accuracy in their predictions on a separate set of test geometries after training, with almost instant prediction speeds, as compared to O(hour) for the high-fidelity simulations required otherwise. △ Less

Submitted 1 February, 2023; originally announced February 2023.

Comments: 7 pages, 5 figures, 2022 IEEE Symposium Series on Computational Intelligence

arXiv:2212.07624 [pdf, other]

doi 10.1145/3583133.3596397

Neuroevolution of Physics-Informed Neural Nets: Benchmark Problems and Comparative Results

Authors: Nicholas Sung Wei Yong, Jian Cheng Wong, Pao-Hsiung Chiu, Abhishek Gupta, Chinchun Ooi, Yew-Soon Ong

Abstract: The potential of learned models for fundamental scientific research and discovery is drawing increasing attention worldwide. Physics-informed neural networks (PINNs), where the loss function directly embeds governing equations of scientific phenomena, is one of the key techniques at the forefront of recent advances. PINNs are typically trained using stochastic gradient descent methods, akin to the… ▽ More The potential of learned models for fundamental scientific research and discovery is drawing increasing attention worldwide. Physics-informed neural networks (PINNs), where the loss function directly embeds governing equations of scientific phenomena, is one of the key techniques at the forefront of recent advances. PINNs are typically trained using stochastic gradient descent methods, akin to their deep learning counterparts. However, analysis in this paper shows that PINNs' unique loss formulations lead to a high degree of complexity and ruggedness that may not be conducive for gradient descent. Unlike in standard deep learning, PINN training requires globally optimum parameter values that satisfy physical laws as closely as possible. Spurious local optimum, indicative of erroneous physics, must be avoided. Hence, neuroevolution algorithms, with their superior global search capacity, may be a better choice for PINNs relative to gradient descent methods. Here, we propose a set of five benchmark problems, with open-source codes, spanning diverse physical phenomena for novel neuroevolution algorithm development. Using this, we compare two neuroevolution algorithms against the commonly used stochastic gradient descent, and our baseline results support the claim that neuroevolution can surpass gradient descent, ensuring better physics compliance in the predicted outputs. %Furthermore, implementing neuroevolution with JAX leads to orders of magnitude speedup relative to standard implementations. △ Less

Submitted 6 December, 2023; v1 submitted 15 December, 2022; originally announced December 2022.

Comments: 11 pages, 6 figures, 4 tables

Journal ref: Proceedings of the Companion Conference on Genetic and Evolutionary Computation July 2023

arXiv:2211.13464 [pdf, other]

Design of Turing Systems with Physics-Informed Neural Networks

Authors: Jordon Kho, Winston Koh, Jian Cheng Wong, Pao-Hsiung Chiu, Chin Chun Ooi

Abstract: Reaction-diffusion (Turing) systems are fundamental to the formation of spatial patterns in nature and engineering. These systems are governed by a set of non-linear partial differential equations containing parameters that determine the rate of constituent diffusion and reaction. Critically, these parameters, such as diffusion coefficient, heavily influence the mode and type of the final pattern,… ▽ More Reaction-diffusion (Turing) systems are fundamental to the formation of spatial patterns in nature and engineering. These systems are governed by a set of non-linear partial differential equations containing parameters that determine the rate of constituent diffusion and reaction. Critically, these parameters, such as diffusion coefficient, heavily influence the mode and type of the final pattern, and quantitative characterization and knowledge of these parameters can aid in bio-mimetic design or understanding of real-world systems. However, the use of numerical methods to infer these parameters can be difficult and computationally expensive. Typically, adjoint solvers may be used, but they are frequently unstable for very non-linear systems. Alternatively, massive amounts of iterative forward simulations are used to find the best match, but this is extremely effortful. Recently, physics-informed neural networks have been proposed as a means for data-driven discovery of partial differential equations, and have seen success in various applications. Thus, we investigate the use of physics-informed neural networks as a tool to infer key parameters in reaction-diffusion systems in the steady-state for scientific discovery or design. Our proof-of-concept results show that the method is able to infer parameters for different pattern modes and types with errors of less than 10\%. In addition, the stochastic nature of this method can be exploited to provide multiple parameter alternatives to the desired pattern, highlighting the versatility of this method for bio-mimetic design. This work thus demonstrates the utility of physics-informed neural networks for inverse parameter inference of reaction-diffusion systems to enhance scientific discovery and design. △ Less

Submitted 24 November, 2022; originally announced November 2022.

arXiv:2211.12042 [pdf, other]

Robustness of Physics-Informed Neural Networks to Noise in Sensor Data

Authors: Jian Cheng Wong, Pao-Hsiung Chiu, Chin Chun Ooi, My Ha Da

Abstract: Physics-Informed Neural Networks (PINNs) have been shown to be an effective way of incorporating physics-based domain knowledge into neural network models for many important real-world systems. They have been particularly effective as a means of inferring system information based on data, even in cases where data is scarce. Most of the current work however assumes the availability of high-quality… ▽ More Physics-Informed Neural Networks (PINNs) have been shown to be an effective way of incorporating physics-based domain knowledge into neural network models for many important real-world systems. They have been particularly effective as a means of inferring system information based on data, even in cases where data is scarce. Most of the current work however assumes the availability of high-quality data. In this work, we further conduct a preliminary investigation of the robustness of physics-informed neural networks to the magnitude of noise in the data. Interestingly, our experiments reveal that the inclusion of physics in the neural network is sufficient to negate the impact of noise in data originating from hypothetical low quality sensors with high signal-to-noise ratios of up to 1. The resultant predictions for this test case are seen to still match the predictive value obtained for equivalent data obtained from high-quality sensors with potentially 10x less noise. This further implies the utility of physics-informed neural network modeling for making sense of data from sensor networks in the future, especially with the advent of Industry 4.0 and the increasing trend towards ubiquitous deployment of low-cost sensors which are typically noisier. △ Less

Submitted 22 November, 2022; originally announced November 2022.

arXiv:2211.12035 [pdf, other]

FastFlow: AI for Fast Urban Wind Velocity Prediction

Authors: Shi Jer Low, Venugopalan, S. G. Raghavan, Harish Gopalan, Jian Cheng Wong, Justin Yeoh, Chin Chun Ooi

Abstract: Data-driven approaches, including deep learning, have shown great promise as surrogate models across many domains. These extend to various areas in sustainability. An interesting direction for which data-driven methods have not been applied much yet is in the quick quantitative evaluation of urban layouts for planning and design. In particular, urban designs typically involve complex trade-offs be… ▽ More Data-driven approaches, including deep learning, have shown great promise as surrogate models across many domains. These extend to various areas in sustainability. An interesting direction for which data-driven methods have not been applied much yet is in the quick quantitative evaluation of urban layouts for planning and design. In particular, urban designs typically involve complex trade-offs between multiple objectives, including limits on urban build-up and/or consideration of urban heat island effect. Hence, it can be beneficial to urban planners to have a fast surrogate model to predict urban characteristics of a hypothetical layout, e.g. pedestrian-level wind velocity, without having to run computationally expensive and time-consuming high-fidelity numerical simulations. This fast surrogate can then be potentially integrated into other design optimization frameworks, including generative models or other gradient-based methods. Here we present the use of CNNs for urban layout characterization that is typically done via high-fidelity numerical simulation. We further apply this model towards a first demonstration of its utility for data-driven pedestrian-level wind velocity prediction. The data set in this work comprises results from high-fidelity numerical simulations of wind velocities for a diverse set of realistic urban layouts, based on randomized samples from a real-world, highly built-up urban city. We then provide prediction results obtained from the trained CNN, demonstrating test errors of under 0.1 m/s for previously unseen urban layouts. We further illustrate how this can be useful for purposes such as rapid evaluation of pedestrian wind velocity for a potential new layout. It is hoped that this data set will further accelerate research in data-driven urban AI, even as our baseline model facilitates quantitative comparison to future methods. △ Less

Submitted 22 November, 2022; originally announced November 2022.

arXiv:2210.11923 [pdf, other]

doi 10.1145/3627827

RollBack: A New Time-Agnostic Replay Attack Against the Automotive Remote Keyless Entry Systems

Authors: Levente Csikor, Hoon Wei Lim, Jun Wen Wong, Soundarya Ramesh, Rohini Poolat Parameswarath, Mun Choon Chan

Abstract: Today's RKE systems implement disposable rolling codes, making every key fob button press unique, effectively preventing simple replay attacks. However, a prior attack called RollJam was proven to break all rolling code-based systems in general. By a careful sequence of signal jamming, capturing, and replaying, an attacker can become aware of the subsequent valid unlock signal that has not been us… ▽ More Today's RKE systems implement disposable rolling codes, making every key fob button press unique, effectively preventing simple replay attacks. However, a prior attack called RollJam was proven to break all rolling code-based systems in general. By a careful sequence of signal jamming, capturing, and replaying, an attacker can become aware of the subsequent valid unlock signal that has not been used yet. RollJam, however, requires continuous deployment indefinitely until it is exploited. Otherwise, the captured signals become invalid if the key fob is used again without RollJam in place. We introduce RollBack, a new replay-and-resynchronize attack against most of today's RKE systems. In particular, we show that even though the one-time code becomes invalid in rolling code systems, replaying a few previously captured signals consecutively can trigger a rollback-like mechanism in the RKE system. Put differently, the rolling codes become resynchronized back to a previous code used in the past from where all subsequent yet already used signals work again. Moreover, the victim can still use the key fob without noticing any difference before and after the attack. Unlike RollJam, RollBack does not necessitate jamming at all. Furthermore, it requires signal capturing only once and can be exploited at any time in the future as many times as desired. This time-agnostic property is particularly attractive to attackers, especially in car-sharing/renting scenarios where accessing the key fob is straightforward. However, while RollJam defeats virtually any rolling code-based system, vehicles might have additional anti-theft measures against malfunctioning key fobs, hence against RollBack. Our ongoing analysis (covering Asian vehicle manufacturers for the time being) against different vehicle makes and models has revealed that ~70% of them are vulnerable to RollBack. △ Less

Submitted 14 September, 2022; originally announced October 2022.

Comments: 24 pages, 5 figures Under submission to a journal

Journal ref: ACM Transactions on Cyber-Physical Systems, 2024

arXiv:2209.00649 [pdf, other]

Addressing Hidden Imperfections in Online Experimentation

Authors: Jeffrey Wong, Jasmine Nettiksimmons, Jiannan Lu, Katherine Livins

Abstract: Technology companies are increasingly using randomized controlled trials (RCTs) as part of their development process. Despite having fine control over engineering systems and data instrumentation, these RCTs can still be imperfectly executed. In fact, online experimentation suffers from many of the same biases seen in biomedical RCTs including opt-in and user activity bias, selection bias, non-com… ▽ More Technology companies are increasingly using randomized controlled trials (RCTs) as part of their development process. Despite having fine control over engineering systems and data instrumentation, these RCTs can still be imperfectly executed. In fact, online experimentation suffers from many of the same biases seen in biomedical RCTs including opt-in and user activity bias, selection bias, non-compliance with the treatment, and more generally, challenges in the ability to test the question of interest. The result of these imperfections can lead to a bias in the estimated causal effect, a loss in statistical power, an attenuation of the effect, or even a need to reframe the question that can be answered. This paper aims to make practitioners of experimentation more aware of imperfections in technology-industry RCTs, which can be hidden throughout the engineering stack or in the design process. △ Less

Submitted 25 August, 2022; originally announced September 2022.

Comments: Presented at CODE@MIT 2021

arXiv:2208.13418 [pdf, other]

doi 10.1109/TVCG.2022.3209391

DPVisCreator: Incorporating Pattern Constraints to Privacy-preserving Visualizations via Differential Privacy

Authors: Jiehui Zhou, Xumeng Wang, Jason K. Wong, Huanliang Wang, Zhongwei Wang, Xiaoyu Yang, Xiaoran Yan, Haozhe Feng, Huamin Qu, Haochao Ying, Wei Chen

Abstract: Data privacy is an essential issue in publishing data visualizations. However, it is challenging to represent multiple data patterns in privacy-preserving visualizations. The prior approaches target specific chart types or perform an anonymization model uniformly without considering the importance of data patterns in visualizations. In this paper, we propose a visual analytics approach that facili… ▽ More Data privacy is an essential issue in publishing data visualizations. However, it is challenging to represent multiple data patterns in privacy-preserving visualizations. The prior approaches target specific chart types or perform an anonymization model uniformly without considering the importance of data patterns in visualizations. In this paper, we propose a visual analytics approach that facilitates data custodians to generate multiple private charts while maintaining user-preferred patterns. To this end, we introduce pattern constraints to model users' preferences over data patterns in the dataset and incorporate them into the proposed Bayesian network-based Differential Privacy (DP) model PriVis. A prototype system, DPVisCreator, is developed to assist data custodians in implementing our approach. The effectiveness of our approach is demonstrated with quantitative evaluation of pattern utility under the different levels of privacy protection, case studies, and semi-structured expert interviews. △ Less

Submitted 29 August, 2022; originally announced August 2022.

Comments: 9 pages, 5 figures

Journal ref: IEEE Transactions on Visualization and Computer Graphics, vol. 29, no. 1, pp. 809-819, Jan. 2023

arXiv:2208.12809 [pdf]

Incrementality Bidding and Attribution

Authors: Randall Lewis, Jeffrey Wong

Abstract: The causal effect of showing an ad to a potential customer versus not, commonly referred to as "incrementality", is the fundamental question of advertising effectiveness. In digital advertising three major puzzle pieces are central to rigorously quantifying advertising incrementality: ad buying/bidding/pricing, attribution, and experimentation. Building on the foundations of machine learning and c… ▽ More The causal effect of showing an ad to a potential customer versus not, commonly referred to as "incrementality", is the fundamental question of advertising effectiveness. In digital advertising three major puzzle pieces are central to rigorously quantifying advertising incrementality: ad buying/bidding/pricing, attribution, and experimentation. Building on the foundations of machine learning and causal econometrics, we propose a methodology that unifies these three concepts into a computationally viable model of both bidding and attribution which spans the randomization, training, cross validation, scoring, and conversion attribution of advertising's causal effects. Implementation of this approach is likely to secure a significant improvement in the return on investment of advertising. △ Less

Submitted 25 August, 2022; originally announced August 2022.

Comments: 40 pages

arXiv:2208.09237 [pdf, other]

CohortVA: A Visual Analytic System for Interactive Exploration of Cohorts based on Historical Data

Authors: Wei Zhang, Jason K. Wong, Xumeng Wang, Youcheng Gong, Rongchen Zhu, Kai Liu, Zihan Yan, Siwei Tan, Huamin Qu, Siming Chen, Wei Chen

Abstract: In history research, cohort analysis seeks to identify social structures and figure mobilities by studying the group-based behavior of historical figures. Prior works mainly employ automatic data mining approaches, lacking effective visual explanation. In this paper, we present CohortVA, an interactive visual analytic approach that enables historians to incorporate expertise and insight into the i… ▽ More In history research, cohort analysis seeks to identify social structures and figure mobilities by studying the group-based behavior of historical figures. Prior works mainly employ automatic data mining approaches, lacking effective visual explanation. In this paper, we present CohortVA, an interactive visual analytic approach that enables historians to incorporate expertise and insight into the iterative exploration process. The kernel of CohortVA is a novel identification model that generates candidate cohorts and constructs cohort features by means of pre-built knowledge graphs constructed from large-scale history databases. We propose a set of coordinated views to illustrate identified cohorts and features coupled with historical events and figure profiles. Two case studies and interviews with historians demonstrate that CohortVA can greatly enhance the capabilities of cohort identifications, figure authentications, and hypothesis generation. △ Less

Submitted 19 August, 2022; originally announced August 2022.

arXiv:2208.07479 [pdf, other]

Context-Aware Streaming Perception in Dynamic Environments

Authors: Gur-Eyal Sela, Ionel Gog, Justin Wong, Kumar Krishna Agrawal, Xiangxi Mo, Sukrit Kalra, Peter Schafhalter, Eric Leong, Xin Wang, Bharathan Balaji, Joseph Gonzalez, Ion Stoica

Abstract: Efficient vision works maximize accuracy under a latency budget. These works evaluate accuracy offline, one image at a time. However, real-time vision applications like autonomous driving operate in streaming settings, where ground truth changes between inference start and finish. This results in a significant accuracy drop. Therefore, a recent work proposed to maximize accuracy in streaming setti… ▽ More Efficient vision works maximize accuracy under a latency budget. These works evaluate accuracy offline, one image at a time. However, real-time vision applications like autonomous driving operate in streaming settings, where ground truth changes between inference start and finish. This results in a significant accuracy drop. Therefore, a recent work proposed to maximize accuracy in streaming settings on average. In this paper, we propose to maximize streaming accuracy for every environment context. We posit that scenario difficulty influences the initial (offline) accuracy difference, while obstacle displacement in the scene affects the subsequent accuracy degradation. Our method, Octopus, uses these scenario properties to select configurations that maximize streaming accuracy at test time. Our method improves tracking performance (S-MOTA) by 7.4% over the conventional static approach. Further, performance improvement using our method comes in addition to, and not instead of, advances in offline accuracy. △ Less

Submitted 15 August, 2022; originally announced August 2022.

Comments: 26 pages, 10 figures, to be published in ECCV 2022

arXiv:2203.11903 [pdf]

Enabling faster and more reliable sonographic assessment of gestational age through machine learning

Authors: Chace Lee, Angelica Willis, Christina Chen, Marcin Sieniek, Akib Uddin, Jonny Wong, Rory Pilgrim, Katherine Chou, Daniel Tse, Shravya Shetty, Ryan G. Gomes

Abstract: Fetal ultrasounds are an essential part of prenatal care and can be used to estimate gestational age (GA). Accurate GA assessment is important for providing appropriate prenatal care throughout pregnancy and identifying complications such as fetal growth disorders. Since derivation of GA from manual fetal biometry measurements (head, abdomen, femur) are operator-dependent and time-consuming, there… ▽ More Fetal ultrasounds are an essential part of prenatal care and can be used to estimate gestational age (GA). Accurate GA assessment is important for providing appropriate prenatal care throughout pregnancy and identifying complications such as fetal growth disorders. Since derivation of GA from manual fetal biometry measurements (head, abdomen, femur) are operator-dependent and time-consuming, there have been a number of research efforts focused on using artificial intelligence (AI) models to estimate GA using standard biometry images, but there is still room to improve the accuracy and reliability of these AI systems for widescale adoption. To improve GA estimates, without significant change to provider workflows, we leverage AI to interpret standard plane ultrasound images as well as 'fly-to' ultrasound videos, which are 5-10s videos automatically recorded as part of the standard of care before the still image is captured. We developed and validated three AI models: an image model using standard plane images, a video model using fly-to videos, and an ensemble model (combining both image and video). All three were statistically superior to standard fetal biometry-based GA estimates derived by expert sonographers, the ensemble model has the lowest mean absolute error (MAE) compared to the clinical standard fetal biometry (mean difference: -1.51 $\pm$ 3.96 days, 95% CI [-1.9, -1.1]) on a test set that consisted of 404 participants. We showed that our models outperform standard biometry by a more substantial margin on fetuses that were small for GA. Our AI models have the potential to empower trained operators to estimate GA with higher accuracy while reducing the amount of time required and user variability in measurement acquisition. △ Less

Submitted 22 March, 2022; originally announced March 2022.

arXiv:2203.10139 [pdf]

AI system for fetal ultrasound in low-resource settings

Authors: Ryan G. Gomes, Bellington Vwalika, Chace Lee, Angelica Willis, Marcin Sieniek, Joan T. Price, Christina Chen, Margaret P. Kasaro, James A. Taylor, Elizabeth M. Stringer, Scott Mayer McKinney, Ntazana Sindano, George E. Dahl, William Goodnight III, Justin Gilmer, Benjamin H. Chi, Charles Lau, Terry Spitz, T Saensuksopa, Kris Liu, Jonny Wong, Rory Pilgrim, Akib Uddin, Greg Corrado, Lily Peng , et al. (4 additional authors not shown)

Abstract: Despite considerable progress in maternal healthcare, maternal and perinatal deaths remain high in low-to-middle income countries. Fetal ultrasound is an important component of antenatal care, but shortage of adequately trained healthcare workers has limited its adoption. We developed and validated an artificial intelligence (AI) system that uses novice-acquired "blind sweep" ultrasound videos to… ▽ More Despite considerable progress in maternal healthcare, maternal and perinatal deaths remain high in low-to-middle income countries. Fetal ultrasound is an important component of antenatal care, but shortage of adequately trained healthcare workers has limited its adoption. We developed and validated an artificial intelligence (AI) system that uses novice-acquired "blind sweep" ultrasound videos to estimate gestational age (GA) and fetal malpresentation. We further addressed obstacles that may be encountered in low-resourced settings. Using a simplified sweep protocol with real-time AI feedback on sweep quality, we have demonstrated the generalization of model performance to minimally trained novice ultrasound operators using low cost ultrasound devices with on-device AI integration. The GA model was non-inferior to standard fetal biometry estimates with as few as two sweeps, and the fetal malpresentation model had high AUC-ROCs across operators and devices. Our AI models have the potential to assist in upleveling the capabilities of lightly trained ultrasound operators in low resource settings. △ Less

Submitted 18 March, 2022; originally announced March 2022.

arXiv:2202.06941 [pdf, other]

doi 10.1088/1742-6596/2438/1/012121

Semi-Equivariant GNN Architectures for Jet Tagging

Authors: Daniel Murnane, Savannah Thais, Jason Wong

Abstract: Composing Graph Neural Networks (GNNs) of operations that respect physical symmetries has been suggested to give better model performance with a smaller number of learnable parameters. However, real-world applications, such as in high energy physics have not born this out. We present the novel architecture VecNet that combines both symmetry-respecting and unconstrained operations to study and tune… ▽ More Composing Graph Neural Networks (GNNs) of operations that respect physical symmetries has been suggested to give better model performance with a smaller number of learnable parameters. However, real-world applications, such as in high energy physics have not born this out. We present the novel architecture VecNet that combines both symmetry-respecting and unconstrained operations to study and tune the degree of physics-informed GNNs. We introduce a novel metric, the \textit{ant factor}, to quantify the resource-efficiency of each configuration in the search-space. We find that a generalized architecture such as ours can deliver optimal performance in resource-constrained applications. △ Less

Submitted 14 February, 2022; originally announced February 2022.

Comments: Proceedings submission to ACAT2021 Conference. 9 pages

arXiv:2112.12364 [pdf, other]

doi 10.1109/TVCG.2021.3128157

Explaining with Examples: Lessons Learned from Crowdsourced Introductory Description of Information Visualizations

Authors: Leni Yang, Cindy Xiong, Jason K. Wong, Aoyu Wu, Huamin Qu

Abstract: Data visualizations have been increasingly used in oral presentations to communicate data patterns to the general public. Clear verbal introductions of visualizations to explain how to interpret the visually encoded information are essential to convey the takeaways and avoid misunderstandings. We contribute a series of studies to investigate how to effectively introduce visualizations to the audie… ▽ More Data visualizations have been increasingly used in oral presentations to communicate data patterns to the general public. Clear verbal introductions of visualizations to explain how to interpret the visually encoded information are essential to convey the takeaways and avoid misunderstandings. We contribute a series of studies to investigate how to effectively introduce visualizations to the audience with varying degrees of visualization literacy. We begin with understanding how people are introducing visualizations. We crowdsource 110 introductions of visualizations and categorize them based on their content and structures. From these crowdsourced introductions, we identify different introduction strategies and generate a set of introductions for evaluation. We conduct experiments to systematically compare the effectiveness of different introduction strategies across four visualizations with 1,080 participants. We find that introductions explaining visual encodings with concrete examples are the most effective. Our study provides both qualitative and quantitative insights into how to construct effective verbal introductions of visualizations in presentations, inspiring further research in data storytelling. △ Less

Submitted 23 December, 2021; originally announced December 2021.

Comments: 12 pages, 5 figures, accepted to IEEE Transaction on Visualization and Graphics

arXiv:2112.05251 [pdf, other]

Error-Aware Imitation Learning from Teleoperation Data for Mobile Manipulation

Authors: Josiah Wong, Albert Tung, Andrey Kurenkov, Ajay Mandlekar, Li Fei-Fei, Silvio Savarese, Roberto Martín-Martín

Abstract: In mobile manipulation (MM), robots can both navigate within and interact with their environment and are thus able to complete many more tasks than robots only capable of navigation or manipulation. In this work, we explore how to apply imitation learning (IL) to learn continuous visuo-motor policies for MM tasks. Much prior work has shown that IL can train visuo-motor policies for either manipula… ▽ More In mobile manipulation (MM), robots can both navigate within and interact with their environment and are thus able to complete many more tasks than robots only capable of navigation or manipulation. In this work, we explore how to apply imitation learning (IL) to learn continuous visuo-motor policies for MM tasks. Much prior work has shown that IL can train visuo-motor policies for either manipulation or navigation domains, but few works have applied IL to the MM domain. Doing this is challenging for two reasons: on the data side, current interfaces make collecting high-quality human demonstrations difficult, and on the learning side, policies trained on limited data can suffer from covariate shift when deployed. To address these problems, we first propose Mobile Manipulation RoboTurk (MoMaRT), a novel teleoperation framework allowing simultaneous navigation and manipulation of mobile manipulators, and collect a first-of-its-kind large scale dataset in a realistic simulated kitchen setting. We then propose a learned error detection system to address the covariate shift by detecting when an agent is in a potential failure state. We train performant IL policies and error detectors from this data, and achieve over 45% task success rate and 85% error detection success rate across multiple multi-stage tasks when trained on expert data. Codebase, datasets, visualization, and more available at https://sites.google.com/view/il-for-mm/home. △ Less

Submitted 9 December, 2021; originally announced December 2021.

Comments: CoRL 2021

arXiv:2110.15832 [pdf]

doi 10.1016/j.cma.2022.114909

CAN-PINN: A Fast Physics-Informed Neural Network Based on Coupled-Automatic-Numerical Differentiation Method

Authors: Pao-Hsiung Chiu, Jian Cheng Wong, Chinchun Ooi, My Ha Dao, Yew-Soon Ong

Abstract: In this study, novel physics-informed neural network (PINN) methods for coupling neighboring support points and their derivative terms which are obtained by automatic differentiation (AD), are proposed to allow efficient training with improved accuracy. The computation of differential operators required for PINNs loss evaluation at collocation points are conventionally obtained via AD. Although AD… ▽ More In this study, novel physics-informed neural network (PINN) methods for coupling neighboring support points and their derivative terms which are obtained by automatic differentiation (AD), are proposed to allow efficient training with improved accuracy. The computation of differential operators required for PINNs loss evaluation at collocation points are conventionally obtained via AD. Although AD has the advantage of being able to compute the exact gradients at any point, such PINNs can only achieve high accuracies with large numbers of collocation points, otherwise they are prone to optimizing towards unphysical solution. To make PINN training fast, the dual ideas of using numerical differentiation (ND)-inspired method and coupling it with AD are employed to define the loss function. The ND-based formulation for training loss can strongly link neighboring collocation points to enable efficient training in sparse sample regimes, but its accuracy is restricted by the interpolation scheme. The proposed coupled-automatic-numerical differentiation framework, labeled as can-PINN, unifies the advantages of AD and ND, providing more robust and efficient training than AD-based PINNs, while further improving accuracy by up to 1-2 orders of magnitude relative to ND-based PINNs. For a proof-of-concept demonstration of this can-scheme to fluid dynamic problems, two numerical-inspired instantiations of can-PINN schemes for the convection and pressure gradient terms were derived to solve the incompressible Navier-Stokes (N-S) equations. The superior performance of can-PINNs is demonstrated on several challenging problems, including the flow mixing phenomena, lid driven flow in a cavity, and channel flow over a backward facing step. The results reveal that for challenging problems like these, can-PINNs can consistently achieve very good accuracy whereas conventional AD-based PINNs fail. △ Less

Submitted 27 March, 2022; v1 submitted 29 October, 2021; originally announced October 2021.

Comments: 25 pages, 20 figures

Journal ref: Computer Methods in Applied Mechanics and Engineering, Volume 395, 15 May 2022, 114909

arXiv:2110.00704 [pdf, other]

OSCAR: Data-Driven Operational Space Control for Adaptive and Robust Robot Manipulation

Authors: Josiah Wong, Viktor Makoviychuk, Anima Anandkumar, Yuke Zhu

Abstract: Learning performant robot manipulation policies can be challenging due to high-dimensional continuous actions and complex physics-based dynamics. This can be alleviated through intelligent choice of action space. Operational Space Control (OSC) has been used as an effective task-space controller for manipulation. Nonetheless, its strength depends on the underlying modeling fidelity, and is prone t… ▽ More Learning performant robot manipulation policies can be challenging due to high-dimensional continuous actions and complex physics-based dynamics. This can be alleviated through intelligent choice of action space. Operational Space Control (OSC) has been used as an effective task-space controller for manipulation. Nonetheless, its strength depends on the underlying modeling fidelity, and is prone to failure when there are modeling errors. In this work, we propose OSC for Adaptation and Robustness (OSCAR), a data-driven variant of OSC that compensates for modeling errors by inferring relevant dynamics parameters from online trajectories. OSCAR decomposes dynamics learning into task-agnostic and task-specific phases, decoupling the dynamics dependencies of the robot and the extrinsics due to its environment. This structure enables robust zero-shot performance under out-of-distribution and rapid adaptation to significant domain shifts through additional finetuning. We evaluate our method on a variety of simulated manipulation problems, and find substantial improvements over an array of controller baselines. For more results and information, please visit https://cremebrule.github.io/oscar-web/. △ Less

Submitted 1 October, 2021; originally announced October 2021.

arXiv:2109.11140 [pdf, other]

Joint speaker diarisation and tracking in switching state-space model

Authors: Jeremy H. M. Wong, Yifan Gong

Abstract: Speakers may move around while diarisation is being performed. When a microphone array is used, the instantaneous locations of where the sounds originated from can be estimated, and previous investigations have shown that such information can be complementary to speaker embeddings in the diarisation task. However, these approaches often assume that speakers are fairly stationary throughout a meeti… ▽ More Speakers may move around while diarisation is being performed. When a microphone array is used, the instantaneous locations of where the sounds originated from can be estimated, and previous investigations have shown that such information can be complementary to speaker embeddings in the diarisation task. However, these approaches often assume that speakers are fairly stationary throughout a meeting. This paper relaxes this assumption, by proposing to explicitly track the movements of speakers while jointly performing diarisation within a unified model. A state-space model is proposed, where the hidden state expresses the identity of the current active speaker and the predicted locations of all speakers. The model is implemented as a particle filter. Experiments on a Microsoft rich meeting transcription task show that the proposed joint location tracking and diarisation approach is able to perform comparably with other methods that use location information. △ Less

Submitted 23 September, 2021; originally announced September 2021.

arXiv:2109.10598 [pdf, other]

Diarisation using location tracking with agglomerative clustering

Authors: Jeremy H. M. Wong, Igor Abramovski, Xiong Xiao, Yifan Gong

Abstract: Previous works have shown that spatial location information can be complementary to speaker embeddings for a speaker diarisation task. However, the models used often assume that speakers are fairly stationary throughout a meeting. This paper proposes to relax this assumption, by explicitly modelling the movements of speakers within an Agglomerative Hierarchical Clustering (AHC) diarisation framewo… ▽ More Previous works have shown that spatial location information can be complementary to speaker embeddings for a speaker diarisation task. However, the models used often assume that speakers are fairly stationary throughout a meeting. This paper proposes to relax this assumption, by explicitly modelling the movements of speakers within an Agglomerative Hierarchical Clustering (AHC) diarisation framework. Kalman filters, which track the locations of speakers, are used to compute log-likelihood ratios that contribute to the cluster affinity computations for the AHC merging and stopping decisions. Experiments show that the proposed approach is able to yield improvements on a Microsoft rich meeting transcription task, compared to methods that do not use location information or that make stationarity assumptions. △ Less

Submitted 23 September, 2021; v1 submitted 22 September, 2021; originally announced September 2021.

arXiv:2109.09338 [pdf]

doi 10.1109/TAI.2022.3192362

Learning in Sinusoidal Spaces with Physics-Informed Neural Networks

Authors: Jian Cheng Wong, Chinchun Ooi, Abhishek Gupta, Yew-Soon Ong

Abstract: A physics-informed neural network (PINN) uses physics-augmented loss functions, e.g., incorporating the residual term from governing partial differential equations (PDEs), to ensure its output is consistent with fundamental physics laws. However, it turns out to be difficult to train an accurate PINN model for many problems in practice. In this paper, we present a novel perspective of the merits o… ▽ More A physics-informed neural network (PINN) uses physics-augmented loss functions, e.g., incorporating the residual term from governing partial differential equations (PDEs), to ensure its output is consistent with fundamental physics laws. However, it turns out to be difficult to train an accurate PINN model for many problems in practice. In this paper, we present a novel perspective of the merits of learning in sinusoidal spaces with PINNs. By analyzing behavior at model initialization, we first show that a PINN of increasing expressiveness induces an initial bias around flat output functions. Notably, this initial solution can be very close to satisfying many physics PDEs, i.e., falling into a local minimum of the PINN loss that only minimizes PDE residuals, while still being far from the true solution that jointly minimizes PDE residuals and the initial and/or boundary conditions. It is difficult for gradient descent optimization to escape from such a local minimum trap, often causing the training to stall. We then prove that the sinusoidal mapping of inputs, in an architecture we label as sf-PINN, is effective to increase input gradient variability, thus avoiding being trapped in such deceptive local minimum. The level of variability can be effectively modulated to match high-frequency patterns in the problem at hand. A key facet of this paper is the comprehensive empirical study that demonstrates the efficacy of learning in sinusoidal spaces with PINNs for a wide range of forward and inverse modelling problems spanning multiple physics domains. △ Less

Submitted 14 March, 2022; v1 submitted 20 September, 2021; originally announced September 2021.

Comments: 16 pages, 13 figures

Journal ref: IEEE Transactions on Artificial Intelligence, 2022

Showing 1–50 of 83 results for author: Wong, J