SDPT: Synchronous Dual Prompt Tuning for Fusion-based Visual-Language Pre-trained Models

Authors: Yang Zhou, Yongjian Wu, Jiya Saiyin, Bingzheng Wei, Maode Lai, Eric Chang, Yan Xu

Abstract: Prompt tuning methods have achieved remarkable success in parameter-efficient fine-tuning on large pre-trained models. However, their application to dual-modal fusion-based visual-language pre-trained models (VLPMs), such as GLIP, has encountered issues. Existing prompt tuning methods have not effectively addressed the modal mapping and aligning problem for tokens in different modalities, leading to poor transfer generalization. To address this issue, we propose Synchronous Dual Prompt Tuning (SDPT). SDPT initializes a single set of learnable unified prototype tokens in the established modal aligning space to represent the aligned semantics of text and image modalities for downstream tasks. Furthermore, SDPT establishes inverse linear projections that require no training to embed the information of unified prototype tokens into the input space of different modalities. The inverse linear projections allow the unified prototype token to synchronously represent the two modalities and enable SDPT to share the unified semantics of text and image for downstream tasks across different modal prompts. Experimental results demonstrate that SDPT assists fusion-based VLPMs to achieve superior outcomes with only 0.04% of model parameters for training across various scenarios, outperforming other single- or dual-modal methods. The code will be released at https://github.com/wuyongjianCODE/SDPT.

Submitted 16 July, 2024; originally announced July 2024.

Comments: Accepted by ECCV 2024

arXiv:2407.10021 [pdf, other]

Document-level Clinical Entity and Relation Extraction via Knowledge Base-Guided Generation

Authors: Kriti Bhattarai, Inez Y. Oh, Zachary B. Abrams, Albert M. Lai

Abstract: Generative pre-trained transformer (GPT) models have shown promise in clinical entity and relation extraction tasks because of their precise extraction and contextual understanding capability. In this work, we further leverage the Unified Medical Language System (UMLS) knowledge base to accurately identify medical concepts and improve clinical entity and relation extraction at the document level. Our framework selects UMLS concepts relevant to the text and combines them with prompts to guide language models in extracting entities. Our experiments demonstrate that this initial concept mapping and the inclusion of these mapped concepts in the prompts improve extraction results compared to few-shot extraction tasks on generic language models that do not leverage UMLS. Further, our results show that this approach is more effective than the standard Retrieval Augmented Generation (RAG) technique, where retrieved data is compared with prompt embeddings to generate results. Overall, we find that integrating UMLS concepts with GPT models significantly improves entity and relation identification, outperforming the baseline and RAG models. By combining the precise concept mapping capability of knowledge-based approaches like UMLS with the contextual understanding capability of GPT, our method highlights the potential of these approaches in specialized domains like healthcare.

Submitted 13 July, 2024; originally announced July 2024.

Comments: Accepted at Association for Computational Linguistics BioNLP 2024

arXiv:2407.08800 [pdf, other]

Local Clustering for Lung Cancer Image Classification via Sparse Solution Technique

Authors: Jackson Hamel, Ming-Jun Lai, Zhaiming Shen, Ye Tian

Abstract: In this work, we propose to use a local clustering approach based on the sparse solution technique to study medical image classification, in particular the lung cancer image classification task. We view images as the vertices in a weighted graph and the similarity between a pair of images as the edges in the graph. The vertices within the same cluster can be assumed to share similar features and properties, thus making the applications of graph clustering techniques very useful for image classification. Recently, the approach based on the sparse solutions of linear systems for graph clustering has been found to identify clusters more efficiently than traditional clustering methods such as spectral clustering. We propose to use two newly developed local clustering methods based on sparse solutions of linear systems for image classification. In addition, we employ a box spline-based tight-wavelet-framelet method to clean these images and help build a better adjacency matrix before clustering. Our methods are shown to be very effective in classifying images. Our approach is significantly more efficient than other state-of-the-art approaches and is either favorable or equally effective in comparison. Finally, we point out two image deformation methods for building additional artificial image data to increase the number of labeled images.

Submitted 11 July, 2024; originally announced July 2024.

arXiv:2406.18556 [pdf]

Renal digital pathology visual knowledge search platform based on language large model and book knowledge

Authors: Xiaomin Lv, Chong Lai, Liya Ding, Maode Lai, Qingrong Sun

Abstract: Large models have become mainstream, yet their applications in digital pathology still require exploration. Meanwhile, renal pathology images play an important role in the diagnosis of renal diseases. We conducted image segmentation and paired corresponding text descriptions based on 60 books for renal pathology, performed clustering analysis for all image and text description features based on large models, and ultimately built a retrieval system based on the semantic features of large models. Based on the above analysis, we established a knowledge base of 10,317 renal pathology images and paired corresponding text descriptions, and then we evaluated the semantic feature capabilities of 4 large models, including GPT2, gemma, LLma and Qwen, and the image-based feature capabilities of the dinov2 large model. Furthermore, we built a semantic retrieval system to retrieve pathological images based on text descriptions, named RppD (aidp.zjsru.edu.cn).

Submitted 26 May, 2024; originally announced June 2024.

Comments: 9 pages, 6 figures

arXiv:2405.03060 [pdf, other]

Tree-based Ensemble Learning for Out-of-distribution Detection

Authors: Zhaiming Shen, Menglun Wang, Guang Cheng, Ming-Jun Lai, Lin Mu, Ruihao Huang, Qi Liu, Hao Zhu

Abstract: Being able to successfully determine whether the testing samples have a similar distribution to the training samples is a fundamental question to address before we can safely deploy most machine learning models into practice. In this paper, we propose TOOD detection, a simple yet effective tree-based out-of-distribution (TOOD) detection mechanism to determine if a set of unseen samples will have a distribution similar to that of the training samples. The TOOD detection mechanism is based on computing the pairwise Hamming distance of testing samples' tree embeddings, which are obtained by fitting a tree-based ensemble model through in-distribution training samples. Our approach is interpretable and robust owing to its tree-based nature. Furthermore, our approach is efficient, flexible to various machine learning tasks, and can be easily generalized to the unsupervised setting. Extensive experiments are conducted to show that the proposed method outperforms other state-of-the-art out-of-distribution detection methods in distinguishing in-distribution from out-of-distribution samples on various tabular, image, and text data.

Submitted 5 May, 2024; originally announced May 2024.
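
The distance computation named in the TOOD abstract above can be illustrated with a small sketch. It assumes that a sample's tree embedding is the tuple of leaf indices it reaches in each tree of a fitted ensemble; the abstract does not spell out the encoding, so this is an assumption. The embeddings are hand-written rather than produced by a real model, the function names are hypothetical, and the way the paper turns these distances into a detection decision is not shown.

```python
from itertools import combinations


def hamming(a, b):
    """Count the trees in which two samples land in different leaves."""
    if len(a) != len(b):
        raise ValueError("embeddings must have one entry per tree")
    return sum(x != y for x, y in zip(a, b))


def mean_pairwise_hamming(embeddings):
    """Average Hamming distance over all unordered pairs of embeddings."""
    pairs = list(combinations(embeddings, 2))
    if not pairs:
        raise ValueError("need at least two embeddings")
    return sum(hamming(a, b) for a, b in pairs) / len(pairs)


# Hypothetical leaf-index embeddings from a 4-tree ensemble (assumed data).
clustered = [(3, 1, 7, 2), (3, 1, 7, 2), (3, 1, 6, 2)]
scattered = [(0, 4, 7, 5), (3, 1, 2, 2), (6, 0, 6, 1)]

print(mean_pairwise_hamming(clustered))  # samples agree on most leaves
print(mean_pairwise_hamming(scattered))  # samples disagree on every tree
```

The first set shares almost all of its leaves, so its average distance is small; the second set differs in every tree, so its average distance equals the number of trees.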

arXiv:2403.11211 [pdf]

RCdpia: A Renal Carcinoma Digital Pathology Image Annotation dataset based on pathologists

Authors: Qingrong Sun, Weixiang Zhong, Jie Zhou, Chong Lai, Xiaodong Teng, Maode Lai

Abstract: The annotation of digital pathological slide data for renal cell carcinoma is of paramount importance for correct diagnosis of artificial intelligence models due to the heterogeneous nature of the tumor. This process not only facilitates a deeper understanding of renal cell cancer heterogeneity but also aims to minimize noise in the data for more accurate studies. To enhance the applicability of the data, two pathologists were enlisted to meticulously curate, screen, and label a kidney cancer pathology image dataset from The Cancer Genome Atlas Program (TCGA) database. Subsequently, a ResNet model was developed to validate the annotated dataset against an additional dataset from the First Affiliated Hospital of Zhejiang University. Based on these results, we have meticulously compiled the TCGA digital pathological dataset with independent labeling of tumor regions and adjacent areas (RCdpia), which includes 109 cases of kidney chromophobe cell carcinoma, 486 cases of kidney clear cell carcinoma, and 292 cases of kidney papillary cell carcinoma. This dataset is now publicly accessible at http://39.171.241.18:8888/RCdpia/. Furthermore, model analysis has revealed significant discrepancies in predictive outcomes when applying the same model to datasets from different centers. Leveraging the RCdpia, we can now develop more precise digital pathology artificial intelligence models for tasks such as normalization, classification, and segmentation. These advancements underscore the potential for more nuanced and accurate AI applications in the field of digital pathology.

Submitted 17 March, 2024; originally announced March 2024.

Comments: 8 pages, 3 figures, 1 table

arXiv:2403.00473 [pdf, other]

Computer-Controlled 3D Freeform Surface Weaving

Authors: Xiangjia Chen, Lip M. Lai, Zishun Liu, Chengkai Dai, Isaac C. W. Leung, Charlie C. L. Wang, Yeung Yam

Abstract: In this paper, we present a new computer-controlled weaving technology that enables the fabrication of woven structures in the shape of given 3D surfaces by using threads in non-traditional materials with high bending-stiffness, allowing for multiple applications with the resultant woven fabrics. A new weaving machine and a new manufacturing process are developed to realize the function of 3D surface weaving by the principle of short-row shaping. A computational solution is investigated to convert input 3D freeform surfaces into the corresponding weaving operations (indicated as W-code) to guide the operation of this system. A variety of examples using cotton threads, conductive threads and optical fibres are fabricated by our prototype system to demonstrate its functionality.

Submitted 8 May, 2024; v1 submitted 1 March, 2024; originally announced March 2024.

arXiv:2402.15391 [pdf, other]

Genie: Generative Interactive Environments

Authors: Jake Bruce, Michael Dennis, Ashley Edwards, Jack Parker-Holder, Yuge Shi, Edward Hughes, Matthew Lai, Aditi Mavalankar, Richie Steigerwald, Chris Apps, Yusuf Aytar, Sarah Bechtle, Feryal Behbahani, Stephanie Chan, Nicolas Heess, Lucy Gonzalez, Simon Osindero, Sherjil Ozair, Scott Reed, Jingwei Zhang, Konrad Zolna, Jeff Clune, Nando de Freitas, Satinder Singh, Tim Rocktäschel

Abstract: We introduce Genie, the first generative interactive environment trained in an unsupervised manner from unlabelled Internet videos. The model can be prompted to generate an endless variety of action-controllable virtual worlds described through text, synthetic images, photographs, and even sketches. At 11B parameters, Genie can be considered a foundation world model. It is comprised of a spatiotemporal video tokenizer, an autoregressive dynamics model, and a simple and scalable latent action model. Genie enables users to act in the generated environments on a frame-by-frame basis despite training without any ground-truth action labels or other domain-specific requirements typically found in the world model literature. Further the resulting learned latent action space facilitates training agents to imitate behaviors from unseen videos, opening the path for training generalist agents of the future.

Submitted 23 February, 2024; originally announced February 2024.

Comments: https://sites.google.com/corp/view/genie-2024/

arXiv:2402.05554 [pdf]

doi 10.1016/j.ultrasmedbio.2023.10.009

One-Stop Automated Diagnostic System for Carpal Tunnel Syndrome in Ultrasound Images Using Deep Learning

Authors: Jiayu Peng, Jiajun Zeng, Manlin Lai, Ruobing Huang, Dong Ni, Zhenzhou Li

Abstract: Objective: Ultrasound (US) examination has unique advantages in diagnosing carpal tunnel syndrome (CTS), but identifying the median nerve (MN) and diagnosing CTS depend heavily on the expertise of examiners. To alleviate this problem, we aimed to develop a one-stop automated CTS diagnosis system (OSA-CTSD) and evaluate its effectiveness as a computer-aided diagnostic tool. Methods: We combined real-time MN delineation, accurate biometric measurements, and explainable CTS diagnosis into a unified framework, called OSA-CTSD. We collected a total of 32,301 static images from US videos of 90 normal wrists and 40 CTS wrists for evaluation using a simplified scanning protocol. Results: The proposed model showed better segmentation and measurement performance than competing methods, with an HD95 score of 7.21 px, an ASSD score of 2.64 px, a Dice score of 85.78%, and an IoU score of 76.00%. In the reader study, it demonstrated performance comparable to the average performance of the experienced radiologists in classifying CTS, while outperforming the inexperienced radiologists in terms of classification metrics (e.g., an accuracy score 3.59% higher and an F1 score 5.85% higher). Conclusion: The OSA-CTSD demonstrated promising diagnostic performance with the advantages of real-time operation, automation, and clinical interpretability. The application of such a tool can not only reduce reliance on the expertise of examiners, but also help to promote the future standardization of the CTS diagnosis process, benefiting both patients and radiologists.

Submitted 8 February, 2024; originally announced February 2024.

Comments: Accepted by Ultrasound in Medicine & Biology

Journal ref: Ultrasound in Medicine & Biology, Volume 50, Issue 2, February 2024, Pages 304-314

arXiv:2309.17403 [pdf, other]

Maximal Volume Matrix Cross Approximation for Image Compression and Least Squares Solution

Authors: Kenneth Allen, Ming-Jun Lai, Zhaiming Shen

Abstract: We study the classic cross approximation of matrices based on the maximal volume submatrices. Our main results consist of an improvement of a classic estimate for matrix cross approximation and a greedy approach for finding the maximal volume submatrices. Indeed, we present a new proof of a classic estimate of the inequality with an improved constant. Also, we present a family of greedy maximal volume algorithms which improve the error bound of cross approximation of a matrix in the Chebyshev norm and also improve the computational efficiency of the classic maximal volume algorithm. The proposed algorithms are shown to have theoretical guarantees of convergence. Finally, we present two applications: one is image compression and the other is least squares approximation of continuous functions. Our numerical results at the end of the paper demonstrate the effectiveness of our approach.

Submitted 29 September, 2023; originally announced September 2023.
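
For readers unfamiliar with cross approximation, the following is a minimal sketch of a textbook greedy scheme, not the algorithms proposed in the paper above. At each step it picks the largest-magnitude entry of the residual (the maximal-volume 1×1 submatrix of the residual, i.e. full pivoting) and subtracts the rank-one cross through that entry. The function name and the example matrix are assumptions made for illustration.

```python
def cross_approximation(matrix, rank, tol=1e-12):
    """Greedy cross approximation with full pivoting (textbook sketch).

    Returns (approx, pivots): approx is the sum of at most `rank` rank-one
    crosses, and pivots lists the chosen (row, column) positions.
    """
    rows, cols = len(matrix), len(matrix[0])
    residual = [[float(v) for v in row] for row in matrix]
    approx = [[0.0] * cols for _ in range(rows)]
    pivots = []
    for _ in range(rank):
        # Pick the residual entry of largest magnitude as the pivot.
        i, j = max(
            ((r, c) for r in range(rows) for c in range(cols)),
            key=lambda rc: abs(residual[rc[0]][rc[1]]),
        )
        pivot = residual[i][j]
        if abs(pivot) <= tol:
            break  # residual is numerically zero; nothing left to capture
        column = [residual[r][j] for r in range(rows)]
        row = list(residual[i])
        # Subtract the rank-one cross built from the pivot row and column.
        for r in range(rows):
            for c in range(cols):
                update = column[r] * row[c] / pivot
                approx[r][c] += update
                residual[r][c] -= update
        pivots.append((i, j))
    return approx, pivots


# A 3x4 matrix of rank 2, built as the sum of two outer products (assumed data).
A = [
    [1, 0, 2, 1],
    [4, 1, 4, 3],
    [5, 1, 6, 4],
]
approx, pivots = cross_approximation(A, rank=2)
max_error = max(abs(A[r][c] - approx[r][c]) for r in range(3) for c in range(4))
print(pivots, max_error)
```

Each step lowers the rank of the residual by one, so two crosses reproduce this rank-2 matrix up to rounding error. The error in the largest entry, as measured by `max_error`, is the Chebyshev-norm error the abstract refers to.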

arXiv:2309.07394 [pdf, other]

doi 10.1109/TMI.2023.3309971

Nucleus-aware Self-supervised Pretraining Using Unpaired Image-to-image Translation for Histopathology Images

Authors: Zhiyun Song, Penghui Du, Junpeng Yan, Kailu Li, Jianzhong Shou, Maode Lai, Yubo Fan, Yan Xu

Abstract: Self-supervised pretraining attempts to enhance model performance by obtaining effective features from unlabeled data, and has demonstrated its effectiveness in the field of histopathology images. Despite its success, few works concentrate on the extraction of nucleus-level information, which is essential for pathologic analysis. In this work, we propose a novel nucleus-aware self-supervised pretraining framework for histopathology images. The framework aims to capture the nuclear morphology and distribution information through unpaired image-to-image translation between histopathology images and pseudo mask images. The generation process is modulated by both conditional and stochastic style representations, ensuring the realism and diversity of the generated histopathology images for pretraining. Further, an instance segmentation guided strategy is employed to capture instance-level information. The experiments on 7 datasets show that the proposed pretraining method outperforms supervised ones on Kather classification, multiple instance learning, and 5 dense-prediction tasks with the transfer learning protocol, and yields better results than other self-supervised approaches on 8 semi-supervised tasks. Our project is publicly available at https://github.com/zhiyuns/UNITPathSSL.

Submitted 13 September, 2023; originally announced September 2023.

arXiv:2308.09175 [pdf, other]

Diversifying AI: Towards Creative Chess with AlphaZero

Authors: Tom Zahavy, Vivek Veeriah, Shaobo Hou, Kevin Waugh, Matthew Lai, Edouard Leurent, Nenad Tomasev, Lisa Schut, Demis Hassabis, Satinder Singh

Abstract: In recent years, Artificial Intelligence (AI) systems have surpassed human intelligence in a variety of computational tasks. However, AI systems, like humans, make mistakes, have blind spots, hallucinate, and struggle to generalize to new situations. This work explores whether AI can benefit from creative decision-making mechanisms when pushed to the limits of its computational rationality. In particular, we investigate whether a team of diverse AI systems can outperform a single AI in challenging tasks by generating more ideas as a group and then selecting the best ones. We study this question in the game of chess, the so-called drosophila of AI. We build on AlphaZero (AZ) and extend it to represent a league of agents via a latent-conditioned architecture, which we call AZ_db. We train AZ_db to generate a wider range of ideas using behavioral diversity techniques and select the most promising ones with sub-additive planning. Our experiments suggest that AZ_db plays chess in diverse ways, solves more puzzles as a group and outperforms a more homogeneous team. Notably, AZ_db solves twice as many challenging puzzles as AZ, including the challenging Penrose positions. When playing chess from different openings, we notice that players in AZ_db specialize in different openings, and that selecting a player for each opening using sub-additive planning results in a 50 Elo improvement over AZ. Our findings suggest that diversity bonuses emerge in teams of AI agents, just as they do in teams of humans and that diversity is a valuable asset in solving computationally hard problems.

Submitted 29 August, 2023; v1 submitted 17 August, 2023; originally announced August 2023.
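
To put the 50 Elo figure quoted in the abstract above in perspective, the sketch below applies the standard Elo logistic formula, which maps a rating gap to an expected score per game (win = 1, draw = 0.5, loss = 0). The abstract does not say how the rating difference was measured, so this is only the conventional reading of such a gap; the function name is hypothetical.

```python
def expected_score(elo_diff):
    """Expected score per game for the side that is `elo_diff` points stronger,
    under the standard Elo logistic model with a 400-point scale."""
    return 1.0 / (1.0 + 10.0 ** (-elo_diff / 400.0))


print(round(expected_score(0), 3))   # 0.5, evenly matched
print(round(expected_score(50), 3))  # 0.571
```

Under this model a 50-point advantage corresponds to scoring roughly 57% of the available points against the weaker side.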

arXiv:2308.06709 [pdf, other]

The Hard-Constraint PINNs for Interface Optimal Control Problems

Authors: Ming-Chih Lai, Yongcun Song, Xiaoming Yuan, Hangrui Yue, Tianyou Zeng

Abstract: We show that the physics-informed neural networks (PINNs), in combination with some recently developed discontinuity capturing neural networks, can be applied to solve optimal control problems subject to partial differential equations (PDEs) with interfaces and some control constraints. The resulting algorithm is mesh-free and scalable to different PDEs, and it rigorously enforces the control constraints. Since the boundary and interface conditions, as well as the PDEs, are all treated as soft constraints by lumping them into a weighted loss function, it is necessary to learn them simultaneously and there is no guarantee that the boundary and interface conditions can be satisfied exactly. This immediately causes difficulties in tuning the weights in the corresponding loss function and training the neural networks. To tackle these difficulties and guarantee the numerical accuracy, we propose to impose the boundary and interface conditions as hard constraints in PINNs by developing a novel neural network architecture. The resulting hard-constraint PINNs approach guarantees that both the boundary and interface conditions can be satisfied exactly and that they are decoupled from the learning of the PDEs. Its efficiency is validated on some elliptic and parabolic interface optimal control problems.

Submitted 13 August, 2023; originally announced August 2023.
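
The difference between soft and hard constraints mentioned in the abstract above can be illustrated with a well-known one-dimensional construction. It is not the architecture proposed in the paper and does not cover interface conditions. On the interval [0, 1] with Dirichlet values `left` and `right`, the ansatz u(x) = g(x) + x(1 - x) N(x) satisfies the boundary conditions for any network output N(x), because the factor x(1 - x) vanishes at both endpoints. All names and values below are assumptions, and `network` is a fixed stand-in for a trainable model.

```python
import math


def network(x):
    """Stand-in for a trainable neural network N(x) (assumed, untrained)."""
    return math.sin(3.0 * x) + 0.5 * x


def boundary_lift(x, left=1.0, right=-2.0):
    """Linear function g(x) that matches the Dirichlet data at x = 0 and x = 1."""
    return left * (1.0 - x) + right * x


def hard_constrained_u(x, net=network):
    """Ansatz whose boundary values do not depend on the network output."""
    return boundary_lift(x) + x * (1.0 - x) * net(x)


print(hard_constrained_u(0.0))  # 1.0, the left boundary value
print(hard_constrained_u(1.0))  # -2.0, the right boundary value
print(hard_constrained_u(0.5))  # interior value, shaped by the network
```

With this form the loss function no longer needs a boundary term, so there is no boundary weight to tune; only the PDE residual would remain to be learned.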

arXiv:2306.17659 [pdf, other]

Zero-shot Nuclei Detection via Visual-Language Pre-trained Models

Authors: Yongjian Wu, Yang Zhou, Jiya Saiyin, Bingzheng Wei, Maode Lai, Jianzhong Shou, Yubo Fan, Yan Xu

Abstract: Large-scale visual-language pre-trained models (VLPM) have proven their excellent performance in downstream object detection for natural scenes. However, zero-shot nuclei detection on H&E images via VLPMs remains underexplored. The large gap between medical images and the web-originated text-image pairs used for pre-training makes it a challenging task. In this paper, we attempt to explore the potential of the object-level VLPM, Grounded Language-Image Pre-training (GLIP) model, for zero-shot nuclei detection. Concretely, an automatic prompt design pipeline is devised based on the association binding trait of VLPM and the image-to-text VLPM BLIP, avoiding empirical manual prompt engineering. We further establish a self-training framework, using the automatically designed prompts to generate the preliminary results as pseudo labels from GLIP and refine the predicted boxes in an iterative manner. Our method achieves a remarkable performance for label-free nuclei detection, surpassing other comparison methods. Foremost, our work demonstrates that the VLPM pre-trained on natural image-text pairs exhibits astonishing potential for downstream tasks in the medical field as well. Code will be released at https://github.com/wuyongjianCODE/VLPMNuD.

Submitted 30 June, 2023; originally announced June 2023.

Comments: This article has been accepted by MICCAI 2023, but has not been fully edited. Content may change prior to final publication

arXiv:2306.05537 [pdf, other]

AaKOS: Aspect-adaptive Knowledge-based Opinion Summarization

Authors: Guan Wang, Weihua Li, Edmund M-K. Lai, Quan Bai

Abstract: The rapid growth of information on the Internet has led to an overwhelming amount of opinions and comments on various activities, products, and services. This makes it difficult and time-consuming for users to process all the available information when making decisions. Text summarization, a Natural Language Processing (NLP) task, has been widely explored to help users quickly retrieve relevant information by generating short and salient content from long or multiple documents. Recent advances in pre-trained language models, such as ChatGPT, have demonstrated the potential of Large Language Models (LLMs) in text generation. However, LLMs require massive amounts of data and resources and are challenging to implement as offline applications. Furthermore, existing text summarization approaches often lack the "adaptive" nature required to capture diverse aspects in opinion summarization, which is particularly detrimental to users with specific requirements or preferences. In this paper, we propose an Aspect-adaptive Knowledge-based Opinion Summarization model for product reviews, which effectively captures the adaptive nature required for opinion summarization. The model generates aspect-oriented summaries given a set of reviews for a particular product, efficiently providing users with useful information on specific aspects they are interested in, ensuring the generated summaries are more personalized and informative. Extensive experiments have been conducted using real-world datasets to evaluate the proposed model. The results demonstrate that our model outperforms state-of-the-art approaches and is adaptive and efficient in generating summaries that focus on particular aspects, enabling users to make well-informed decisions and catering to their diverse interests and preferences.

Submitted 25 May, 2023; originally announced June 2023.

Comments: 21 pages, 4 figures, 7 tables

arXiv:2306.02691 [pdf, other]

doi 10.1109/TMI.2023.3275609

Cyclic Learning: Bridging Image-level Labels and Nuclei Instance Segmentation

Authors: Yang Zhou, Yongjian Wu, Zihua Wang, Bingzheng Wei, Maode Lai, Jianzhong Shou, Yubo Fan, Yan Xu

Abstract: Nuclei instance segmentation on histopathology images is of great clinical value for disease analysis. Generally, fully-supervised algorithms for this task require pixel-wise manual annotations, which is especially time-consuming and laborious for the high nuclei density. To alleviate the annotation burden, we seek to solve the problem through image-level weakly supervised learning, which is underexplored for nuclei instance segmentation. Compared with most existing methods using other weak annotations (scribble, point, etc.) for nuclei instance segmentation, our method is more labor-saving. The obstacle to using image-level annotations in nuclei instance segmentation is the lack of adequate location information, leading to severe nuclei omission or overlaps. In this paper, we propose a novel image-level weakly supervised method, called cyclic learning, to solve this problem. Cyclic learning comprises a front-end classification task and a back-end semi-supervised instance segmentation task to benefit from multi-task learning (MTL). We utilize a deep learning classifier with interpretability as the front-end to convert image-level labels to sets of high-confidence pseudo masks and establish a semi-supervised architecture as the back-end to conduct nuclei instance segmentation under the supervision of these pseudo masks. Most importantly, cyclic learning is designed to circularly share knowledge between the front-end classifier and the back-end semi-supervised part, which allows the whole system to fully extract the underlying information from image-level labels and converge to a better optimum. Experiments on three datasets demonstrate the good generality of our method, which outperforms other image-level weakly supervised methods for nuclei instance segmentation, and achieves comparable performance to fully-supervised methods.

Submitted 5 June, 2023; originally announced June 2023.

Comments: This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication. Citation information: DOI https://doi.org/10.1109/TMI.2023.3275609, IEEE Transactions on Medical Imaging. Code: https://github.com/wuyongjianCODE/Cyclic

arXiv:2304.14339 [pdf, other]

MarsEclipse at SemEval-2023 Task 3: Multi-Lingual and Multi-Label Framing Detection with Contrastive Learning

Authors: Qisheng Liao, Meiting Lai, Preslav Nakov

Abstract: This paper describes our system for SemEval-2023 Task 3 Subtask 2 on Framing Detection. We used a multi-label contrastive loss for fine-tuning large pre-trained language models in a multi-lingual setting, achieving very competitive results: our system was ranked first on the official test set and on the official shared task leaderboard for five of the six languages for which we had training data and for which we could perform fine-tuning. Here, we describe our experimental setup, as well as various ablation studies. The code of our system is available at https://github.com/QishengL/SemEval2023

Submitted 20 April, 2023; originally announced April 2023.

Comments: framing, contrastive learning, SemEval-2023 task 3

MSC Class: 68T50 ACM Class: F.2.2; I.2.7

Journal ref: SemEval-2023

arXiv:2301.09175 [pdf, other]

Ensemble Transfer Learning for Multilingual Coreference Resolution

Authors: Tuan Manh Lai, Heng Ji

Abstract: Entity coreference resolution is an important research problem with many applications, including information extraction and question answering. Coreference resolution for English has been studied extensively. However, there is relatively little work for other languages. A problem that frequently occurs when working with a non-English language is the scarcity of annotated training data. To overcome this challenge, we design a simple but effective ensemble-based framework that combines various transfer learning (TL) techniques. We first train several models using different TL methods. Then, during inference, we compute the unweighted average scores of the models' predictions to extract the final set of predicted clusters. Furthermore, we also propose a low-cost TL method that bootstraps coreference resolution models by utilizing Wikipedia anchor texts. Leveraging the idea that the coreferential links naturally exist between anchor texts pointing to the same article, our method builds a sizeable distantly-supervised dataset for the target language that consists of tens of thousands of documents. We can pre-train a model on the pseudo-labeled dataset before finetuning it on the final target dataset. Experimental results on two benchmark datasets, OntoNotes and SemEval, confirm the effectiveness of our methods. Our best ensembles consistently outperform the baseline approach of simple training by up to 7.68% in the F1 score. These ensembles also achieve new state-of-the-art results for three languages: Arabic, Dutch, and Spanish.

Submitted 22 January, 2023; originally announced January 2023.
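
The inference step described in the abstract above (taking the unweighted average of several models' prediction scores) can be illustrated with a minimal sketch. This is not the authors' code: the function names `average_scores` and `pick_best`, and the assumption that each model emits one score per candidate in the same order, are hypothetical and chosen only for illustration.

```python
from statistics import fmean


def average_scores(model_scores):
    """Return the unweighted mean score per candidate across models.

    model_scores: list of lists; one inner list per model, all aligned
    so that index k refers to the same candidate in every model.
    (Hypothetical data layout, assumed for this sketch.)
    """
    if not model_scores:
        raise ValueError("need scores from at least one model")
    width = len(model_scores[0])
    if any(len(scores) != width for scores in model_scores):
        raise ValueError("all models must score the same candidates")
    # zip(*...) regroups the scores by candidate instead of by model.
    return [fmean(column) for column in zip(*model_scores)]


def pick_best(model_scores):
    """Index of the candidate with the highest averaged score."""
    averaged = average_scores(model_scores)
    return max(range(len(averaged)), key=averaged.__getitem__)
```

Every model contributes equally here, which matches the "unweighted" wording in the abstract; how the averaged scores are turned into clusters is not specified there and is left out of the sketch.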

arXiv:2211.11114 [pdf, other]

Semi-supervised Local Cluster Extraction by Compressive Sensing

Authors: Zhaiming Shen, Ming-Jun Lai, Sheng Li

Abstract: The local clustering problem aims at extracting a small local structure inside a graph without the necessity of knowing the entire graph structure. As the local structure is usually small in size compared to the entire graph, one can think of it as a compressive sensing problem where the indices of the target cluster can be thought of as a sparse solution to a linear system. In this paper, we propose a new semi-supervised local cluster extraction approach by applying the idea of compressive sensing based on two pioneering works under the same framework. Our approach improves on the existing works by making the initial cut the entire graph and hence overcomes a major limitation of existing works, which is the low quality of the initial cut. Extensive experimental results on multiple benchmark datasets demonstrate the effectiveness of our approach.

Submitted 20 November, 2022; originally announced November 2022.

arXiv:2210.08424 [pdf, ps, other]

doi 10.1016/j.jcp.2023.112359

A cusp-capturing PINN for elliptic interface problems

Authors: Yu-Hau Tseng, Te-Sheng Lin, Wei-Fan Hu, Ming-Chih Lai

Abstract: In this paper, we propose a cusp-capturing physics-informed neural network (PINN) to solve discontinuous-coefficient elliptic interface problems whose solution is continuous but has discontinuous first derivatives on the interface. To find such a solution using neural network representation, we introduce a cusp-enforced level set function as an additional feature input to the network to retain the inherent solution properties; that is, capturing the solution cusps (where the derivatives are discontinuous) sharply. In addition, the proposed neural network has the advantage of being mesh-free, so it can easily handle problems in irregular domains. We train the network using the physics-informed framework in which the loss function comprises the residual of the differential equation together with certain interface and boundary conditions. We conduct a series of numerical experiments to demonstrate the effectiveness of the cusp-capturing technique and the accuracy of the present network model. Numerical results show that even using a one-hidden-layer (shallow) network with a moderate number of neurons and sufficient training data points, the present network model can achieve prediction accuracy comparable with traditional methods. Besides, if the solution is discontinuous across the interface, we can simply incorporate an additional supervised learning task for solution jump approximation into the present network without much difficulty.

Submitted 16 April, 2023; v1 submitted 15 October, 2022; originally announced October 2022.

arXiv:2210.05523 [pdf, other]

doi 10.4208/cicp.OA-2022-0284

An efficient neural-network and finite-difference hybrid method for elliptic interface problems with applications

Authors: Wei-Fan Hu, Te-Sheng Lin, Yu-Hau Tseng, Ming-Chih Lai

Abstract: A new and efficient neural-network and finite-difference hybrid method is developed for solving the Poisson equation in a regular domain with jump discontinuities on embedded irregular interfaces. Since the solution has low regularity across the interface, when applying finite difference discretization to this problem, an additional treatment accounting for the jump discontinuities must be employed. Here, we aim to alleviate such an extra effort and ease our implementation by machine learning methodology. The key idea is to decompose the solution into singular and regular parts. The neural network learning machinery incorporating the given jump conditions finds the singular solution, while the standard five-point Laplacian discretization is used to obtain the regular solution with associated boundary conditions. Regardless of the interface geometry, these two tasks only require supervised learning for function approximation and a fast direct solver for the Poisson equation, making the hybrid method easy to implement and efficient. The two- and three-dimensional numerical results show that the present hybrid method preserves second-order accuracy for the solution and its derivatives, and it is comparable with the traditional immersed interface method in the literature. As an application, we solve the Stokes equations with singular forces to demonstrate the robustness of the present method.

Submitted 2 March, 2023; v1 submitted 11 October, 2022; originally announced October 2022.

Journal ref: Commun. Comput. Phys., Vol. 33, pp.1090-1105 (2023)
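
The "standard five-point Laplacian discretization" named in the abstract above can be illustrated with a minimal sketch on a uniform two-dimensional grid. This shows only the generic textbook stencil, not the authors' hybrid solver; the function name `five_point_laplacian` and the nested-list grid layout are assumptions made for illustration.

```python
def five_point_laplacian(u, h):
    """Apply the five-point stencil to the interior of a 2D grid.

    u: nested list with u[i][j] sampled at (i*h, j*h) on a uniform grid.
    h: grid spacing, the same in both directions (assumed for this sketch).
    Returns a grid two rows and two columns smaller than u, holding
    (u[i+1][j] + u[i-1][j] + u[i][j+1] + u[i][j-1] - 4*u[i][j]) / h**2.
    """
    rows, cols = len(u), len(u[0])
    if rows < 3 or cols < 3:
        raise ValueError("grid needs at least one interior point")
    lap = [[0.0] * (cols - 2) for _ in range(rows - 2)]
    for i in range(1, rows - 1):
        for j in range(1, cols - 1):
            lap[i - 1][j - 1] = (
                u[i + 1][j] + u[i - 1][j] + u[i][j + 1] + u[i][j - 1]
                - 4.0 * u[i][j]
            ) / (h * h)
    return lap
```

For a quadratic such as u(x, y) = x^2 + y^2 the stencil reproduces the exact Laplacian, 4, up to rounding, which makes it a convenient sanity check.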

arXiv:2207.10652 [pdf, other]

O-Dang! The Ontology of Dangerous Speech Messages

Authors: Marco A. Stranisci, Simona Frenda, Mirko Lai, Oscar Araque, Alessandra T. Cignarella, Valerio Basile, Viviana Patti, Cristina Bosco

Abstract: Within the NLP community, a considerable number of language resources are created, annotated, and released every day with the aim of studying specific linguistic phenomena. Despite a variety of attempts to organize such resources, systematic methods and interoperability between resources are still lacking. Furthermore, the most common practice for storing linguistic information is still the concept of "gold standard", which is in contrast with recent trends in NLP that stress the importance of different subjectivities and points of view when training machine learning and deep learning methods. In this paper we present O-Dang!: The Ontology of Dangerous Speech Messages, a systematic and interoperable Knowledge Graph (KG) for the collection of linguistic annotated data. O-Dang! is designed to gather and organize Italian datasets into a structured KG, according to the principles shared within the Linguistic Linked Open Data community. The ontology has also been designed to account for a perspectivist approach, since it provides a model for encoding both gold standard and single-annotator labels in the KG. The paper is structured as follows. In Section 1 the motivations of our work are outlined. Section 2 describes the O-Dang! Ontology, which provides a common semantic model for the integration of datasets in the KG. The Ontology Population stage with information about corpora, users, and annotations is presented in Section 3. Finally, in Section 4 an analysis of offensiveness across corpora is provided as a first case study for the resource.

Submitted 13 July, 2022; originally announced July 2022.

arXiv:2206.10945 [pdf, ps, other]

Improve Sensing and Communication Performance of UAV via Integrated Sensing and Communication

Authors: Wangjun Jiang, Ailing Wang, Zhiqing Wei, Meichen Lai, Zhiyong Feng, Jianjun Liu

Abstract: The unmanned aerial vehicle (UAV) needs to sense the environment to ensure safe flight, and the sensing accuracy and communication delay performance are two important indicators of safe flight. The strategy of using integrated sensing and communication (ISAC) technology to improve the sensing and communication performance is proposed in this paper. On the one hand, the extended Kalman filter (EKF) algorithm is adopted to achieve the fusion of communication location information and sensing information to improve the accuracy of target sensing. On the other hand, an Identification Friend or Foe (IFF) method based on ISAC is proposed to reduce communication delay. Compared with the traditional IFF method, the integrated technology used for IFF can realize the radar sensing and communication interrogating functions in parallel, greatly shortening the sensing time. Simulation results show that, using ISAC technology, the sensing performance of the UAV is greatly improved: the communication delay can be reduced by up to 50%, and the accuracy of target sensing can be improved by 24.2% when communication location information and radar sensing information have the same sensing accuracy.

Submitted 22 June, 2022; originally announced June 2022.

arXiv:2205.08878 [pdf, other]

Transformer based multiple instance learning for weakly supervised histopathology image segmentation

Authors: Ziniu Qian, Kailu Li, Maode Lai, Eric I-Chao Chang, Bingzheng Wei, Yubo Fan, Yan Xu

Abstract: Histopathological image segmentation algorithms play a critical role in computer-aided diagnosis technology. The development of weakly supervised segmentation algorithms alleviates the problem that medical image annotation is time-consuming and labor-intensive. As a subset of weakly supervised learning, Multiple Instance Learning (MIL) has been proven to be effective in segmentation. However, there is a lack of related information between instances in MIL, which limits the further improvement of segmentation performance. In this paper, we propose a novel weakly supervised method for pixel-level segmentation in histopathology images, which introduces Transformer into the MIL framework to capture global or long-range dependencies. The multi-head self-attention in the Transformer establishes the relationship between instances, which solves the shortcoming that instances are independent of each other in MIL. In addition, deep supervision is introduced to overcome the limitation of annotations in weakly supervised methods and make better use of hierarchical information. The state-of-the-art results on the colon cancer dataset demonstrate the superiority of the proposed method compared with other weakly supervised methods. We believe that our approach has potential for various applications in medical images.

Submitted 18 May, 2022; originally announced May 2022.

Comments: Provisional accepted for MICCAI 2022

arXiv:2203.03546 [pdf, other]

LMN at SemEval-2022 Task 11: A Transformer-based System for English Named Entity Recognition

Authors: Ngoc Minh Lai

Abstract: Processing complex and ambiguous named entities is a challenging research problem, but it has not received sufficient attention from the natural language processing community. In this short paper, we present our participation in the English track of SemEval-2022 Task 11: Multilingual Complex Named Entity Recognition. Inspired by the recent advances in pretrained Transformer language models, we propose a simple yet effective Transformer-based baseline for the task. Despite its simplicity, our proposed approach shows competitive results on the leaderboard, where we ranked 12th out of 30 teams. Our system achieved a macro F1 score of 72.50% on the held-out test set. We have also explored a data augmentation approach using entity linking. While the approach does not improve the final performance, we also discuss it in this paper.

Submitted 13 February, 2022; originally announced March 2022.

Comments: SemEval 2022 (co-located with NAACL)

arXiv:2203.01581 [pdf, other]

A shallow physics-informed neural network for solving partial differential equations on surfaces

Authors: Wei-Fan Hu, Yi-Jun Shih, Te-Sheng Lin, Ming-Chih Lai

Abstract: In this paper, we introduce a shallow (one-hidden-layer) physics-informed neural network for solving partial differential equations on static and evolving surfaces. For the static surface case, with the aid of a level set function, the surface normal and mean curvature used in the surface differential expressions can be computed easily. So instead of imposing the normal extension constraints used in the literature, we write the surface differential operators in the form of traditional Cartesian differential operators and use them in the loss function directly. We perform a series of performance studies for the present methodology by solving the Laplace-Beltrami equation and the surface diffusion equation on complex static surfaces. With just a moderate number of neurons used in the hidden layer, we are able to attain satisfactory prediction results. Then we extend the present methodology to solve the advection-diffusion equation on an evolving surface with given velocity. To track the surface, we additionally introduce a prescribed hidden layer to enforce the topological structure of the surface and use the network to learn the homeomorphism between the surface and the prescribed topology. The proposed network structure is designed to track the surface and solve the equation simultaneously. Again, the numerical results show accuracy comparable to that of the static cases. As an application, we simulate the surfactant transport on the droplet surface under shear flow and obtain some physically plausible results.

Submitted 20 January, 2023; v1 submitted 3 March, 2022; originally announced March 2022.

arXiv:2202.13404 [pdf, other]

Improving Candidate Retrieval with Entity Profile Generation for Wikidata Entity Linking

Authors: Tuan Manh Lai, Heng Ji, ChengXiang Zhai

Abstract: Entity linking (EL) is the task of linking entity mentions in a document to referent entities in a knowledge base (KB). Many previous studies focus on Wikipedia-derived KBs. There is little work on EL over Wikidata, even though it is the most extensive crowdsourced KB. The scale of Wikidata can open up many new real-world applications, but its massive number of entities also makes EL challenging. To effectively narrow down the search space, we propose a novel candidate retrieval paradigm based on entity profiling. Wikidata entities and their textual fields are first indexed into a text search engine (e.g., Elasticsearch). During inference, given a mention and its context, we use a sequence-to-sequence (seq2seq) model to generate the profile of the target entity, which consists of its title and description. We use the profile to query the indexed search engine to retrieve candidate entities. Our approach complements the traditional approach of using a Wikipedia anchor-text dictionary, enabling us to further design a highly effective hybrid method for candidate retrieval. Combined with a simple cross-attention reranker, our complete EL framework achieves state-of-the-art results on three Wikidata-based datasets and strong performance on TACKBP-2010.

Submitted 14 March, 2022; v1 submitted 27 February, 2022; originally announced February 2022.

Comments: ACL 2022 (Findings)

arXiv:2202.02904 [pdf, other]

A Compressed Sensing Based Least Squares Approach to Semi-supervised Local Cluster Extraction

Authors: Ming-Jun Lai, Zhaiming Shen

Abstract: A least squares semi-supervised local clustering algorithm based on the idea of compressed sensing is proposed to extract clusters from a graph with known adjacency matrix. The algorithm is based on a two-stage approach similar to the one in \cite{LaiMckenzie2020}. However, under a weaker assumption and with less computational complexity than the one in \cite{LaiMckenzie2020}, the algorithm is shown to be able to find a desired cluster with high probability. The "one cluster at a time" feature of our method distinguishes it from other global clustering methods. Several numerical experiments are conducted on synthetic data such as the stochastic block model and real data such as MNIST, political blogs network, AT&T and YaleB human faces data sets to demonstrate the effectiveness and efficiency of our algorithm.

Submitted 31 October, 2022; v1 submitted 6 February, 2022; originally announced February 2022.

arXiv:2112.08366 [pdf, other]

doi 10.1109/BIBM52615.2021.9669314

AGMI: Attention-Guided Multi-omics Integration for Drug Response Prediction with Graph Neural Networks

Authors: Ruiwei Feng, Yufeng Xie, Minshan Lai, Danny Z. Chen, Ji Cao, Jian Wu

Abstract: Accurate drug response prediction (DRP) is a crucial yet challenging task in precision medicine. This paper presents a novel Attention-Guided Multi-omics Integration (AGMI) approach for DRP, which first constructs a Multi-edge Graph (MeG) for each cell line, and then aggregates multi-omics features to predict drug response using a novel structure, called Graph edge-aware Network (GeNet). For the first time, our AGMI approach explores gene constraint based multi-omics integration for DRP with the whole-genome using GNNs. Empirical experiments on the CCLE and GDSC datasets show that our AGMI largely outperforms state-of-the-art DRP methods by 8.3% to 34.2% on four metrics. Our data and code are available at https://github.com/yivan-WYYGDSG/AGMI.

Submitted 9 January, 2022; v1 submitted 15 December, 2021; originally announced December 2021.

arXiv:2109.12832 [pdf, ps, other]

Anti-collision Technologies for Unmanned Aerial Vehicles: Recent Advances and Future Trends

Authors: Zhiqing Wei, Zeyang Meng, Meichen Lai, Huici Wu, Jiarong Han, Zhiyong Feng

Abstract: Unmanned aerial vehicles (UAVs) are widely applied in civil applications, such as disaster relief, agriculture, and cargo transportation. With the massive number of UAV flight activities, the anti-collision technologies aiming to avoid collisions between UAVs and other objects have attracted much attention. The anti-collision technologies are of vital importance to guarantee the survivability and safety of UAVs. In this article, a comprehensive survey on UAV anti-collision technologies is presented. We first introduce laws and regulations on UAV safety, which prevent collisions at the policy level. Then, the process of anti-collision technologies is reviewed from three aspects, i.e., obstacle sensing, collision prediction, and collision avoidance. We provide a detailed survey and comparison of the methods of each aspect and analyze their pros and cons. Besides, the future trends on UAV anti-collision technologies are presented from the perspective of fast obstacle sensing and fast wireless networking. Finally, we summarize this article.

Submitted 1 March, 2022; v1 submitted 27 September, 2021; originally announced September 2021.

Comments: 32 pages, 7 figures and 9 tables

MSC Class: 93-02 ACM Class: A.1

arXiv:2108.09889 [pdf, other]

A Unified Transformer-based Framework for Duplex Text Normalization

Authors: Tuan Manh Lai, Yang Zhang, Evelina Bakhturina, Boris Ginsburg, Heng Ji

Abstract: Text normalization (TN) and inverse text normalization (ITN) are essential preprocessing and postprocessing steps for text-to-speech synthesis and automatic speech recognition, respectively. Many methods have been proposed for either TN or ITN, ranging from weighted finite-state transducers to neural networks. Despite their impressive performance, these methods aim to tackle only one of the two tasks but not both. As a result, in a complete spoken dialog system, two separate models for TN and ITN need to be built. This heterogeneity increases the technical complexity of the system, which in turn increases the cost of maintenance in a production setting. Motivated by this observation, we propose a unified framework for building a single neural duplex system that can simultaneously handle TN and ITN. Combined with a simple but effective data augmentation method, our systems achieve state-of-the-art results on the Google TN dataset for English and Russian. They can also reach over 95% sentence-level accuracy on an internal English TN dataset without any additional fine-tuning. In addition, we also create a cleaned dataset from the Spoken Wikipedia Corpora for German and report the performance of our systems on the dataset. Overall, experimental results demonstrate the proposed duplex text normalization framework is highly effective and applicable to a range of domains and languages.

Submitted 22 August, 2021; originally announced August 2021.

Comments: Under Review

arXiv:2108.04174 [pdf]

doi 10.1093/jamiaopen/ooab052

Machine learning for modeling the progression of Alzheimer disease dementia using clinical data: a systematic literature review

Authors: Sayantan Kumar, Inez Oh, Suzanne Schindler, Albert M Lai, Philip R O Payne, Aditi Gupta

Abstract: Objective: Alzheimer disease (AD) is the most common cause of dementia, a syndrome characterized by cognitive impairment severe enough to interfere with activities of daily life. We aimed to conduct a systematic literature review (SLR) of studies that applied machine learning (ML) methods to clinical data derived from electronic health records in order to model risk for progression of AD dementia. Materials and Methods: We searched for articles published between January 1, 2010, and May 31, 2020, in PubMed, Scopus, ScienceDirect, IEEE Xplore Digital Library, Association for Computing Machinery Digital Library, and arXiv. We used predefined criteria to select relevant articles and summarized them according to key components of ML analysis such as data characteristics, computational algorithms, and research focus. Results: There has been a considerable rise over the past 5 years in the number of research papers using ML-based analysis for AD dementia modeling. We reviewed 64 relevant articles in our SLR. The results suggest that the majority of existing research has focused on predicting progression of AD dementia using publicly available datasets containing both neuroimaging and clinical data (neurobehavioral status exam scores, patient demographics, neuroimaging data, and laboratory test values). Discussion: Identifying individuals at risk for progression of AD dementia could potentially help to personalize disease management to plan future care. Clinical data consisting of both structured data tables and clinical notes can be effectively used in ML-based approaches to model risk for AD dementia progression. Data sharing and reproducibility of results can enhance the impact, adaptation, and generalizability of this research.

Submitted 5 August, 2021; originally announced August 2021.

Comments: 10 pages, 4 figures, 3 tables

Journal ref: JAMIA Open, Volume 4, Issue 3, July 2021, ooab052

arXiv:2107.12013 [pdf, other]

doi 10.1016/j.jcp.2022.111547

A Shallow Ritz Method for Elliptic Problems with Singular Sources

Authors: Ming-Chih Lai, Che-Chia Chang, Wei-Syuan Lin, Wei-Fan Hu, Te-Sheng Lin

Abstract: In this paper, a shallow Ritz-type neural network for solving elliptic equations with delta function singular sources on an interface is developed. There are three novel features in the present work; namely, (i) the delta function singularity is naturally removed, (ii) level set function is introduced as a feature input, (iii) it is completely shallow, comprising only one hidden layer. We first introduce the energy functional of the problem and then transform the contribution of singular sources to a regular surface integral along the interface. In such a way, the delta function singularity can be naturally removed without introducing a discrete one that is commonly used in traditional regularization methods, such as the well-known immersed boundary method. The original problem is then reformulated as a minimization problem. We propose a shallow Ritz-type neural network with one hidden layer to approximate the global minimizer of the energy functional. As a result, the network is trained by minimizing the loss function that is a discrete version of the energy. In addition, we include the level set function of the interface as a feature input of the network and find that it significantly improves the training efficiency and accuracy. We perform a series of numerical tests to show the accuracy of the present method and its capability for problems in irregular domains and higher dimensions.

Submitted 1 July, 2022; v1 submitted 26 July, 2021; originally announced July 2021.

Journal ref: J. Comput. Phys., Vol.469 (2022) 111547

arXiv:2107.01700 [pdf, other]

End-to-end Neural Coreference Resolution Revisited: A Simple yet Effective Baseline

Authors: Tuan Manh Lai, Trung Bui, Doo Soon Kim

Abstract: Since the first end-to-end neural coreference resolution model was introduced, many extensions to the model have been proposed, ranging from using higher-order inference to directly optimizing evaluation metrics using reinforcement learning. Despite improving the coreference resolution performance by a large margin, these extensions add substantial extra complexity to the original model. Motivated by this observation and the recent advances in pre-trained Transformer language models, we propose a simple yet effective baseline for coreference resolution. Even though our model is a simplified version of the original neural coreference resolution model, it achieves impressive performance, outperforming all recent extended works on the public English OntoNotes benchmark. Our work provides evidence for the necessity of carefully justifying the complexity of existing or newly proposed models, as introducing a conceptual or practical simplification to an existing model can still yield competitive results.

Submitted 8 February, 2022; v1 submitted 4 July, 2021; originally announced July 2021.

Comments: Accepted by ICASSP 2022

arXiv:2106.05587 [pdf, other]

doi 10.1016/j.jcp.2022.111576

A Discontinuity Capturing Shallow Neural Network for Elliptic Interface Problems

Authors: Wei-Fan Hu, Te-Sheng Lin, Ming-Chih Lai

Abstract: In this paper, a new Discontinuity Capturing Shallow Neural Network (DCSNN) for approximating $d$-dimensional piecewise continuous functions and for solving elliptic interface problems is developed. There are three novel features in the present network; namely, (i) jump discontinuities are accurately captured, (ii) it is completely shallow, comprising only one hidden layer, (iii) it is completely mesh-free for solving partial differential equations. The crucial idea here is that a $d$-dimensional piecewise continuous function can be extended to a continuous function defined in $(d+1)$-dimensional space, where the augmented coordinate variable labels the pieces of each sub-domain. We then construct a shallow neural network to express this new function. Since only one hidden layer is employed, the number of training parameters (weights and biases) scales linearly with the dimension and the neurons used in the hidden layer. For solving elliptic interface problems, the network is trained by minimizing the mean square error loss that consists of the residual of the governing equation, boundary condition, and the interface jump conditions. We perform a series of numerical tests to demonstrate the accuracy of the present network. Our DCSNN model is efficient because only a moderate number of parameters need to be trained (a few hundred parameters used throughout all numerical examples), and the results indicate good accuracy. Compared with the results obtained by the traditional grid-based immersed interface method (IIM), which is designed particularly for elliptic interface problems, our network model shows better accuracy than IIM. We conclude by solving a six-dimensional problem to demonstrate the capability of the present network for high-dimensional applications.

Submitted 30 August, 2022; v1 submitted 10 June, 2021; originally announced June 2021.

Journal ref: J. Comput. Phys., Vol. 469 (2022) 111576
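
The augmentation idea in the DCSNN abstract above can be illustrated with a small hand-built example. This is only a sketch under stated assumptions: the two pieces `f_left` and `f_right`, the labels -1 and +1, and the linear blend in `f_augmented` are hypothetical choices made for illustration. In the paper the augmented function is represented by a trained shallow network, not written down by hand.

```python
import numpy as np

# Hypothetical 1-D example: two smooth pieces with a jump at x = 0.
def f_left(x):
    return np.sin(x)          # piece used where x < 0

def f_right(x):
    return np.cos(x) + 1.0    # piece used where x >= 0

def f_piecewise(x):
    """Discontinuous target in d = 1 dimension."""
    x = np.asarray(x, dtype=float)
    return np.where(x < 0.0, f_left(x), f_right(x))

def label(x):
    """Augmented coordinate z: -1 on the left piece, +1 on the right piece."""
    x = np.asarray(x, dtype=float)
    return np.where(x < 0.0, -1.0, 1.0)

def f_augmented(x, z):
    """Continuous function of (x, z) in d + 1 = 2 dimensions.

    On z = -1 it equals f_left and on z = +1 it equals f_right,
    so evaluating it at (x, label(x)) reproduces the piecewise target.
    """
    x = np.asarray(x, dtype=float)
    z = np.asarray(z, dtype=float)
    return 0.5 * (1.0 - z) * f_left(x) + 0.5 * (1.0 + z) * f_right(x)

xs = np.linspace(-1.0, 1.0, 9)
# Largest mismatch between the augmented function and the piecewise target.
max_gap = float(np.max(np.abs(f_augmented(xs, label(xs)) - f_piecewise(xs))))
# Size of the jump of the original function across x = 0.
jump = float(f_piecewise(0.0) - f_piecewise(-1e-9))
print(max_gap, jump)
```

The point of the sketch is that `f_augmented` is smooth in both arguments even though `f_piecewise` jumps at the interface, which is what makes a single continuous approximator a plausible fit for the extended function.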

arXiv:2010.11980 [pdf, other]

A Joint Learning Approach based on Self-Distillation for Keyphrase Extraction from Scientific Documents

Authors: Tuan Manh Lai, Trung Bui, Doo Soon Kim, Quan Hung Tran

Abstract: Keyphrase extraction is the task of extracting a small set of phrases that best describe a document. Most existing benchmark datasets for the task typically have limited numbers of annotated documents, making it challenging to train increasingly complex neural networks. In contrast, digital libraries store millions of scientific articles online, covering a wide range of topics. While a significant portion of these articles contain keyphrases provided by their authors, most other articles lack such annotations. Therefore, to effectively utilize these large amounts of unlabeled articles, we propose a simple and efficient joint learning approach based on the idea of self-distillation. Experimental results show that our approach consistently improves the performance of baseline models for keyphrase extraction. Furthermore, our best models outperform previous methods for the task, achieving new state-of-the-art results on two public benchmarks: Inspec and SemEval-2017.

Submitted 22 October, 2020; originally announced October 2020.

Comments: Accepted to COLING 2020

arXiv:2007.14936 [pdf, other]

doi 10.3233/JIFS-179895

#Brexit: Leave or Remain? The Role of User's Community and Diachronic Evolution on Stance Detection

Authors: Mirko Lai, Viviana Patti, Giancarlo Ruffo, Paolo Rosso

Abstract: Interest has grown in recent years around the classification of the stance that users assume within online debates. Stance has usually been addressed by considering users' posts in isolation, while social studies highlight that social communities may contribute to influencing users' opinions. Furthermore, stance should be studied in a diachronic perspective, since it could help to shed light on users' opinion shift dynamics that can be recorded during the debate. We analyzed the political discussion in the UK about the BREXIT referendum on Twitter, proposing a novel approach and annotation schema for stance detection, with the main aim of investigating the role of features related to social network community and diachronic stance evolution. Classification experiments show that such features provide very useful clues for detecting stance.

Submitted 29 July, 2020; originally announced July 2020.

Comments: To appear in Journal of Intelligent & Fuzzy Systems

arXiv:2007.09534 [pdf, other]

A Quasi-Orthogonal Matching Pursuit Algorithm for Compressive Sensing

Authors: Ming-Jun Lai, Zhaiming Shen

Abstract: In this paper, we propose a new orthogonal matching pursuit algorithm called the quasi-OMP (QOMP) algorithm, which greatly enhances the performance of the classical orthogonal matching pursuit (OMP) algorithm at some cost in computational complexity. We are able to show that, under some sufficient conditions on the mutual coherence of the sensing matrix, the QOMP algorithm succeeds in recovering the s-sparse signal vector x within s iterations, where a total of 2s columns are selected, under both noiseless and noisy settings. In addition, we show that for a Gaussian sensing matrix, the norm of the residual at each iteration goes to zero linearly, at a rate depending on the size of the matrix, with high probability. Numerical experiments demonstrate the effectiveness of the QOMP algorithm in recovering sparse solutions, where it outperforms the classic OMP and GOMP algorithms.

Submitted 18 July, 2020; originally announced July 2020.
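
For context on the entry above, here is a minimal sketch of the classical OMP baseline that the abstract compares against. It is not the QOMP algorithm. The sensing matrix (an identity block next to a normalized Sylvester-Hadamard block), the sparse test vector, and the helper names `hadamard` and `omp` are all assumptions chosen for illustration. The matrix has mutual coherence 1/4, so the standard coherence condition s < (1 + 1/mu)/2 covers the 2-sparse signal used here.

```python
import numpy as np

def hadamard(n):
    """Sylvester construction of an n x n Hadamard matrix (n a power of two)."""
    h = np.array([[1.0]])
    while h.shape[0] < n:
        h = np.kron(h, np.array([[1.0, 1.0], [1.0, -1.0]]))
    return h

def omp(A, y, s):
    """Classical OMP: select one column per iteration, then re-fit by least squares."""
    n = A.shape[1]
    support = []
    residual = y.copy()
    coef = np.zeros(0)
    for _ in range(s):
        # Column most correlated with the current residual.
        j = int(np.argmax(np.abs(A.T @ residual)))
        if j not in support:
            support.append(j)
        # Least-squares fit restricted to the selected columns.
        coef, *_ = np.linalg.lstsq(A[:, support], y, rcond=None)
        residual = y - A[:, support] @ coef
    x_hat = np.zeros(n)
    x_hat[support] = coef
    return x_hat

m = 16
# 16 x 32 matrix with unit-norm columns: identity block and scaled Hadamard block.
A = np.hstack([np.eye(m), hadamard(m) / np.sqrt(m)])

# Hypothetical 2-sparse signal with one entry in each block.
x_true = np.zeros(2 * m)
x_true[3] = 1.5
x_true[20] = -2.0
y = A @ x_true

x_hat = omp(A, y, s=2)

# Mutual coherence: largest off-diagonal entry of the Gram matrix in absolute value.
coherence = float(np.max(np.abs(A.T @ A - np.eye(2 * m))))
print(coherence, np.allclose(x_hat, x_true))
```

A deterministic low-coherence matrix is used instead of a random Gaussian one so that recovery follows from the coherence bound and does not depend on a random draw.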

arXiv:2007.07161 [pdf, ps, other]

Graph Sparsification by Universal Greedy Algorithms

Authors: Ming-Jun Lai, Jiaxin Xie, Zhiqiang Xu

Abstract: Graph sparsification is to approximate an arbitrary graph by a sparse graph and is useful in many applications, such as simplification of social networks, least squares problems, and numerical solution of symmetric positive definite linear systems. In this paper, inspired by the well-known sparse signal recovery algorithm called orthogonal matching pursuit (OMP), we introduce a deterministic, greedy edge selection algorithm called universal greedy approach (UGA) for graph sparsification. For a general spectral sparsification problem, e.g., positive subset selection problem from a set of $m$ vectors from $\mathbb{R}^n$, we propose a nonnegative UGA algorithm which needs $O(mn^2+ n^3/ε^2)$ time to find a $\frac{1+ε/β}{1-ε/β}$-spectral sparsifier with positive coefficients with sparsity $\le\lceil\frac{n}{ε^2}\rceil$, where $β$ is the ratio between the smallest length and largest length of the vectors. The convergence of the nonnegative UGA algorithm will be established. For the graph sparsification problem, another UGA algorithm will be proposed which can output a $\frac{1+O(ε)}{1-O(ε)}$-spectral sparsifier with $\lceil\frac{n}{ε^2}\rceil$ edges in $O(m+n^2/ε^2)$ time from a graph with $m$ edges and $n$ vertices under some mild assumptions. This is a linear time algorithm in terms of the number of edges that the community of graph sparsification is looking for. The best result in the literature to the knowledge of the authors is the existence of a deterministic algorithm which is almost linear, i.e., $O(m^{1+o(1)})$ for some $o(1)=O(\frac{(\log\log(m))^{2/3}}{\log^{1/3}(m)})$. Finally, extensive experimental results, including applications to graph clustering and least squares regression, show the effectiveness of proposed approaches.

Submitted 21 February, 2021; v1 submitted 14 July, 2020; originally announced July 2020.

arXiv:2007.03805 [pdf, other]

ISA: An Intelligent Shopping Assistant

Authors: Tuan Manh Lai, Trung Bui, Nedim Lipka

Abstract: Despite the growth of e-commerce, brick-and-mortar stores are still the preferred destinations for many people. In this paper, we present ISA, a mobile-based intelligent shopping assistant that is designed to improve shopping experience in physical stores. ISA assists users by leveraging advanced techniques in computer vision, speech processing, and natural language processing. An in-store user only needs to take a picture or scan the barcode of the product of interest, and then the user can talk to the assistant about the product. The assistant can also guide the user through the purchase process or recommend other similar products to the user. We take a data-driven approach in building the engines of ISA's natural language processing component, and the engines achieve good performance.

Submitted 23 September, 2020; v1 submitted 7 July, 2020; originally announced July 2020.

Comments: Accepted by AACL 2020 (Demo)

arXiv:1910.12995 [pdf, other]

A Simple but Effective BERT Model for Dialog State Tracking on Resource-Limited Systems

Authors: Tuan Manh Lai, Quan Hung Tran, Trung Bui, Daisuke Kihara

Abstract: In a task-oriented dialog system, the goal of dialog state tracking (DST) is to monitor the state of the conversation from the dialog history. Recently, many deep learning based methods have been proposed for the task. Despite their impressive performance, current neural architectures for DST are typically heavily-engineered and conceptually complex, making it difficult to implement, debug, and maintain them in a production setting. In this work, we propose a simple but effective DST model based on BERT. In addition to its simplicity, our approach also has a number of other advantages: (a) the number of parameters does not grow with the ontology size, and (b) the model can operate in situations where the domain ontology may change dynamically. Experimental results demonstrate that our BERT-based model outperforms previous methods by a large margin, achieving new state-of-the-art results on the standard WoZ 2.0 dataset. Finally, to make the model small and fast enough for resource-restricted systems, we apply the knowledge distillation method to compress our model. The final compressed model achieves comparable results with the original model while being 8x smaller and 7x faster.

Submitted 8 February, 2020; v1 submitted 28 October, 2019; originally announced October 2019.

Comments: Accepted to ICASSP 2020

arXiv:1908.09453 [pdf, other]

OpenSpiel: A Framework for Reinforcement Learning in Games

Authors: Marc Lanctot, Edward Lockhart, Jean-Baptiste Lespiau, Vinicius Zambaldi, Satyaki Upadhyay, Julien Pérolat, Sriram Srinivasan, Finbarr Timbers, Karl Tuyls, Shayegan Omidshafiei, Daniel Hennes, Dustin Morrill, Paul Muller, Timo Ewalds, Ryan Faulkner, János Kramár, Bart De Vylder, Brennan Saeta, James Bradbury, David Ding, Sebastian Borgeaud, Matthew Lai, Julian Schrittwieser, Thomas Anthony, Edward Hughes, et al. (2 additional authors not shown)

Abstract: OpenSpiel is a collection of environments and algorithms for research in general reinforcement learning and search/planning in games. OpenSpiel supports n-player (single- and multi- agent) zero-sum, cooperative and general-sum, one-shot and sequential, strictly turn-taking and simultaneous-move, perfect and imperfect information games, as well as traditional multiagent environments such as (partially- and fully- observable) grid worlds and social dilemmas. OpenSpiel also includes tools to analyze learning dynamics and other common evaluation metrics. This document serves both as an overview of the code base and an introduction to the terminology, core concepts, and algorithms across the fields of reinforcement learning, computational game theory, and search.

Submitted 26 September, 2020; v1 submitted 25 August, 2019; originally announced August 2019.

arXiv:1901.02539 [pdf, other]

doi 10.1109/ICMLA.2018.00180

Supervised Transfer Learning for Product Information Question Answering

Authors: Tuan Manh Lai, Trung Bui, Nedim Lipka, Sheng Li

Abstract: Popular e-commerce websites such as Amazon offer community question answering systems for users to pose product related questions and experienced customers may provide answers voluntarily. In this paper, we show that the large volume of existing community question answering data can be beneficial when building a system for answering questions related to product facts and specifications. Our experimental results demonstrate that the performance of a model for answering questions related to products listed in the Home Depot website can be improved by a large margin via a simple transfer learning technique from an existing large-scale Amazon community question answering dataset. Transfer learning can result in an increase of about 10% in accuracy in the experimental setting where we restrict the size of the data of the target task used for training. As an application of this work, we integrate the best performing model trained in this work into a mobile-based shopping assistant and show its usefulness.

Submitted 8 January, 2019; originally announced January 2019.

Comments: 2018 17th IEEE International Conference on Machine Learning and Applications

arXiv:1810.09061 [pdf, ps, other]

On DC based Methods for Phase Retrieval

Authors: Meng Huang, Ming-Jun Lai, Abraham Varghese, Zhiqiang Xu

Abstract: In this paper, we develop a new computational approach which is based on minimizing the difference of two convex functionals (DC) to solve a broader class of phase retrieval problems. The approach splits a standard nonlinear least squares minimizing function associated with the phase retrieval problem into the difference of two convex functions and then solves a sequence of convex minimization sub-problems. For each subproblem, Nesterov's accelerated gradient descent algorithm or the Barzilai-Borwein (BB) algorithm is used. In the setting of sparse phase retrieval, a standard $\ell_1$ norm term is added into the minimization mentioned above. The subproblem is approximated by a proximal gradient method which is solved by the shrinkage-threshold technique directly without iterations. In addition, a modified Attouch-Peypouquet technique is used to accelerate the iterative computation. These lead to more effective algorithms than the Wirtinger flow (WF) algorithm, the Gauss-Newton (GN) algorithm, and others. A convergence analysis of both DC based algorithms shows that the iterative solutions converge linearly to a critical point and will be closer to a global minimizer than the given initial starting point. Our study is a deterministic analysis, while the study of the Wirtinger flow (WF) algorithm and its variants, the Gauss-Newton (GN) algorithm, and the trust region algorithm is based on probabilistic analysis. In particular, the DC based algorithms are able to retrieve solutions using a number $m$ of measurements which is about twice the number $n$ of entries in the solution, with a high frequency of success. When $m\approx n$, the $\ell_1$ DC based algorithm is able to retrieve sparse signals.

Submitted 21 October, 2018; originally announced October 2018.

Comments: 28 pages
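
One standard way to write the splitting that the abstract above describes is sketched here. The real-valued measurement model $b_i = (a_i^{\top} x)^2$ and the symbols $g$, $h$, and $x^k$ are assumptions introduced for illustration; the paper's own notation and problem class may differ.

```latex
% Assumed real-valued model: measurements b_i = (a_i^T x)^2 >= 0, i = 1, ..., m.
\begin{align*}
f(x) &= \sum_{i=1}^{m} \bigl( (a_i^{\top} x)^2 - b_i \bigr)^2 = g(x) - h(x), \\
g(x) &= \sum_{i=1}^{m} \bigl( (a_i^{\top} x)^4 + b_i^2 \bigr), \qquad
h(x) = 2 \sum_{i=1}^{m} b_i \, (a_i^{\top} x)^2 .
\end{align*}
% Both g and h are convex because b_i >= 0. Linearizing h at the iterate x^k
% gives the convex subproblem solved at each outer step:
\begin{equation*}
x^{k+1} \in \arg\min_{x} \; g(x) - \nabla h(x^k)^{\top} (x - x^k),
\qquad \nabla h(x^k) = 4 \sum_{i=1}^{m} b_i \, (a_i^{\top} x^k) \, a_i .
\end{equation*}
```

Expanding the square term by term gives $(a_i^{\top} x)^4 - 2 b_i (a_i^{\top} x)^2 + b_i^2$, which matches $g - h$. In the sparse setting mentioned in the abstract, an $\ell_1$ penalty would be added to the convex part $g$.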

arXiv:1808.05780 [pdf, other]

Compressive Sensing for cut improvement and local clustering

Authors: Ming-Jun Lai, Daniel Mckenzie

Abstract: We show how one can phrase the cut improvement problem for graphs as a sparse recovery problem, whence one can use algorithms originally developed for use in compressive sensing (such as SubspacePursuit or CoSaMP) to solve it. We show that this approach to cut improvement is fast, both in theory and practice and moreover enjoys statistical guarantees of success when applied to graphs drawn from probabilistic models such as the Stochastic Block Model. Using this new cut improvement approach, which we call ClusterPursuit, as an algorithmic primitive we then propose new methods for local clustering and semi-supervised clustering, which enjoy similar guarantees of success and speed. Finally, we verify the promise of our approach with extensive numerical benchmarking.

Submitted 25 February, 2020; v1 submitted 17 August, 2018; originally announced August 2018.

Comments: 25 pages. Generalizes and improves upon the earlier versions arXiv:1808.05780 and arXiv:1708.09477. To appear in SIMODS

MSC Class: 68Q25; 68R10; 68U05; 94A12

arXiv:1807.03399 [pdf, other]

Jointly Embedding Entities and Text with Distant Supervision

Authors: Denis Newman-Griffis, Albert M. Lai, Eric Fosler-Lussier

Abstract: Learning representations for knowledge base entities and concepts is becoming increasingly important for NLP applications. However, recent entity embedding methods have relied on structured resources that are expensive to create for new domains and corpora. We present a distantly-supervised method for jointly learning embeddings of entities and text from an unannotated corpus, using only a list of mappings between entities and surface forms. We learn embeddings from open-domain and biomedical corpora, and compare against prior methods that rely on human-annotated text or large knowledge graph structure. Our embeddings capture entity similarity and relatedness better than prior work, both in existing biomedical datasets and a new Wikipedia-based dataset that we release to the community. Results on analogy completion and entity sense disambiguation indicate that entities and words capture complementary information that can be effectively combined for downstream use.

Submitted 9 July, 2018; originally announced July 2018.

Comments: 12 pages; Accepted to 3rd Workshop on Representation Learning for NLP (Repl4NLP 2018). Code at https://github.com/OSU-slatelab/JET

arXiv:1712.01815 [pdf, other]

Mastering Chess and Shogi by Self-Play with a General Reinforcement Learning Algorithm

Authors: David Silver, Thomas Hubert, Julian Schrittwieser, Ioannis Antonoglou, Matthew Lai, Arthur Guez, Marc Lanctot, Laurent Sifre, Dharshan Kumaran, Thore Graepel, Timothy Lillicrap, Karen Simonyan, Demis Hassabis

Abstract: The game of chess is the most widely-studied domain in the history of artificial intelligence. The strongest programs are based on a combination of sophisticated search techniques, domain-specific adaptations, and handcrafted evaluation functions that have been refined by human experts over several decades. In contrast, the AlphaGo Zero program recently achieved superhuman performance in the game of Go, by tabula rasa reinforcement learning from games of self-play. In this paper, we generalise this approach into a single AlphaZero algorithm that can achieve, tabula rasa, superhuman performance in many challenging domains. Starting from random play, and given no domain knowledge except the game rules, AlphaZero achieved within 24 hours a superhuman level of play in the games of chess and shogi (Japanese chess) as well as Go, and convincingly defeated a world-champion program in each case.

Submitted 5 December, 2017; originally announced December 2017.

arXiv:1711.11317 [pdf, other]

doi 10.1109/JBHI.2018.2852639

Unsupervised Learning for Cell-level Visual Representation in Histopathology Images with Generative Adversarial Networks

Authors: Bo Hu, Ye Tang, Eric I-Chao Chang, Yubo Fan, Maode Lai, Yan Xu

Abstract: The visual attributes of cells, such as the nuclear morphology and chromatin openness, are critical for histopathology image analysis. By learning cell-level visual representation, we can obtain a rich mix of features that are highly reusable for various tasks, such as cell-level classification, nuclei segmentation, and cell counting. In this paper, we propose a unified generative adversarial networks architecture with a new formulation of loss to perform robust cell-level visual representation learning in an unsupervised setting. Our model is not only label-free and easily trained but also capable of cell-level unsupervised classification with interpretable visualization, which achieves promising results in the unsupervised classification of bone marrow cellular components. Based on the proposed cell-level visual representation learning, we further develop a pipeline that exploits the varieties of cellular elements to perform histopathology image classification, the advantages of which are demonstrated on bone marrow datasets.

Submitted 7 July, 2018; v1 submitted 30 November, 2017; originally announced November 2017.

Comments: Accepted for publication in IEEE Journal of Biomedical and Health Informatics

arXiv:1709.04319 [pdf, ps, other]

Enhanced Particle Swarm Optimization Algorithms for Multiple-Input Multiple-Output System Modelling using Convolved Gaussian Process Models

Authors: Gang Cao, Edmund M-K Lai, Fakhrul Alam

Abstract: Convolved Gaussian Process (CGP) is able to capture the correlations not only between inputs and outputs but also among the outputs. This allows CGP to outperform the standard Gaussian Process (GP) in the modelling of Multiple-Input Multiple-Output (MIMO) systems when observations are missing for some of the outputs. As with the standard GP, a key issue for CGP is the learning of hyperparameters from a set of input-output observations. It is typically performed by maximizing the Log-Likelihood (LL) function, which leads to an unconstrained nonlinear and non-convex optimization problem. Algorithms such as Conjugate Gradient (CG) or Broyden-Fletcher-Goldfarb-Shanno (BFGS) are commonly used but they often get stuck in local optima, especially for CGP where there are more hyperparameters. In addition, the LL value is not a reliable indicator for judging the quality of intermediate models in the optimization process. In this paper, we propose to use enhanced Particle Swarm Optimization (PSO) algorithms to solve this problem by minimizing the model output error instead. This optimization criterion enables the quality of intermediate solutions to be directly observable during the optimization process. Two enhancements to the standard PSO algorithm, which make use of gradient information and the multi-start technique, are proposed. Simulation results on the modelling of both linear and nonlinear systems demonstrate the effectiveness of minimizing the model output error to learn hyperparameters and the performance of the enhanced algorithms.

Submitted 12 July, 2017; originally announced September 2017.

arXiv:1708.09477 [pdf, other]

A Compressive Sensing Approach to Community Detection with Applications

Authors: Ming-Jun Lai, Daniel Mckenzie

Abstract: The community detection problem for graphs asks one to partition the n vertices V of a graph G into k communities, or clusters, such that there are many intracluster edges and few intercluster edges. Of course this is equivalent to finding a permutation matrix P such that, if A denotes the adjacency matrix of G, then PAP^T is approximately block diagonal. As there are k^n possible partitions of n vertices into k subsets, directly determining the optimal clustering is clearly infeasible. Instead one seeks to solve a more tractable approximation to the clustering problem. In this paper we reformulate the community detection problem via sparse solution of a linear system associated with the Laplacian of a graph G and then develop a two-stage approach based on a thresholding technique and a compressive sensing algorithm to find a sparse solution which corresponds to the community containing a vertex of interest in G. Crucially, our approach results in an algorithm which is able to find a single cluster of size n_0 in O(n log(n) n_0) operations and all k clusters in fewer than O(n^2 ln(n)) operations. This is a marked improvement over the classic spectral clustering algorithm, which is unable to find a single cluster at a time and takes approximately O(n^3) operations to find all k clusters. Moreover, we are able to provide robust guarantees of success for the case where G is drawn at random from the Stochastic Block Model, a popular model for graphs with clusters. Extensive numerical results are also provided, showing the efficacy of our algorithm on both synthetic and real-world data sets.

Submitted 20 August, 2018; v1 submitted 30 August, 2017; originally announced August 2017.

Comments: 39 pages, 10 figures. Version 2, disabled 'showkeys' package. Note that there is an error in the proof of Lemma 5.1. A correct version of this lemma, as well as a greatly improved version of the central algorithm of this paper, is available at: arXiv:1808.05780

Showing 1–50 of 66 results for author: Lai, M