-
Symbolic Learning Enables Self-Evolving Agents
Authors:
Wangchunshu Zhou,
Yixin Ou,
Shengwei Ding,
Long Li,
Jialong Wu,
Tiannan Wang,
Jiamin Chen,
Shuai Wang,
Xiaohua Xu,
Ningyu Zhang,
Huajun Chen,
Yuchen Eleanor Jiang
Abstract:
The AI community has been exploring a pathway to artificial general intelligence (AGI) by developing "language agents", which are complex large language models (LLMs) pipelines involving both prompting techniques and tool usage methods. While language agents have demonstrated impressive capabilities for many real-world tasks, a fundamental limitation of current language agents research is that the…
▽ More
The AI community has been exploring a pathway to artificial general intelligence (AGI) by developing "language agents", which are complex large language models (LLMs) pipelines involving both prompting techniques and tool usage methods. While language agents have demonstrated impressive capabilities for many real-world tasks, a fundamental limitation of current language agents research is that they are model-centric, or engineering-centric. That's to say, the progress on prompts, tools, and pipelines of language agents requires substantial manual engineering efforts from human experts rather than automatically learning from data. We believe the transition from model-centric, or engineering-centric, to data-centric, i.e., the ability of language agents to autonomously learn and evolve in environments, is the key for them to possibly achieve AGI.
In this work, we introduce agent symbolic learning, a systematic framework that enables language agents to optimize themselves on their own in a data-centric way using symbolic optimizers. Specifically, we consider agents as symbolic networks where learnable weights are defined by prompts, tools, and the way they are stacked together. Agent symbolic learning is designed to optimize the symbolic network within language agents by mimicking two fundamental algorithms in connectionist learning: back-propagation and gradient descent. Instead of dealing with numeric weights, agent symbolic learning works with natural language simulacrums of weights, loss, and gradients. We conduct proof-of-concept experiments on both standard benchmarks and complex real-world tasks and show that agent symbolic learning enables language agents to update themselves after being created and deployed in the wild, resulting in "self-evolving agents".
△ Less
Submitted 26 June, 2024;
originally announced June 2024.
-
Tri-VQA: Triangular Reasoning Medical Visual Question Answering for Multi-Attribute Analysis
Authors:
Lin Fan,
Xun Gong,
Cenyang Zheng,
Yafei Ou
Abstract:
The intersection of medical Visual Question Answering (Med-VQA) is a challenging research topic with advantages including patient engagement and clinical expert involvement for second opinions. However, existing Med-VQA methods based on joint embedding fail to explain whether their provided results are based on correct reasoning or coincidental answers, which undermines the credibility of VQA answ…
▽ More
The intersection of medical Visual Question Answering (Med-VQA) is a challenging research topic with advantages including patient engagement and clinical expert involvement for second opinions. However, existing Med-VQA methods based on joint embedding fail to explain whether their provided results are based on correct reasoning or coincidental answers, which undermines the credibility of VQA answers. In this paper, we investigate the construction of a more cohesive and stable Med-VQA structure. Motivated by causal effect, we propose a novel Triangular Reasoning VQA (Tri-VQA) framework, which constructs reverse causal questions from the perspective of "Why this answer?" to elucidate the source of the answer and stimulate more reasonable forward reasoning processes. We evaluate our method on the Endoscopic Ultrasound (EUS) multi-attribute annotated dataset from five centers, and test it on medical VQA datasets. Experimental results demonstrate the superiority of our approach over existing methods. Our codes and pre-trained models are available at https://anonymous.4open.science/r/Tri_VQA.
△ Less
Submitted 21 June, 2024;
originally announced June 2024.
-
MDA: An Interpretable Multi-Modal Fusion with Missing Modalities and Intrinsic Noise
Authors:
Lin Fan,
Yafei Ou,
Cenyang Zheng,
Pengyu Dai,
Tamotsu Kamishima,
Masayuki Ikebe,
Kenji Suzuki,
Xun Gong
Abstract:
Multi-modal fusion is crucial in medical data research, enabling a comprehensive understanding of diseases and improving diagnostic performance by combining diverse modalities. However, multi-modal fusion faces challenges, including capturing interactions between modalities, addressing missing modalities, handling erroneous modal information, and ensuring interpretability. Many existing researcher…
▽ More
Multi-modal fusion is crucial in medical data research, enabling a comprehensive understanding of diseases and improving diagnostic performance by combining diverse modalities. However, multi-modal fusion faces challenges, including capturing interactions between modalities, addressing missing modalities, handling erroneous modal information, and ensuring interpretability. Many existing researchers tend to design different solutions for these problems, often overlooking the commonalities among them. This paper proposes a novel multi-modal fusion framework that achieves adaptive adjustment over the weights of each modality by introducing the Modal-Domain Attention (MDA). It aims to facilitate the fusion of multi-modal information while allowing for the inclusion of missing modalities or intrinsic noise, thereby enhancing the representation of multi-modal data. We provide visualizations of accuracy changes and MDA weights by observing the process of modal fusion, offering a comprehensive analysis of its interpretability. Extensive experiments on various gastrointestinal disease benchmarks, the proposed MDA maintains high accuracy even in the presence of missing modalities and intrinsic noise. One thing worth mentioning is that the visualization of MDA is highly consistent with the conclusions of existing clinical studies on the dependence of different diseases on various modalities. Code and dataset will be made available.
△ Less
Submitted 15 June, 2024;
originally announced June 2024.
-
Towards Accurate and Robust Architectures via Neural Architecture Search
Authors:
Yuwei Ou,
Yuqi Feng,
Yanan Sun
Abstract:
To defend deep neural networks from adversarial attacks, adversarial training has been drawing increasing attention for its effectiveness. However, the accuracy and robustness resulting from the adversarial training are limited by the architecture, because adversarial training improves accuracy and robustness by adjusting the weight connection affiliated to the architecture. In this work, we propo…
▽ More
To defend deep neural networks from adversarial attacks, adversarial training has been drawing increasing attention for its effectiveness. However, the accuracy and robustness resulting from the adversarial training are limited by the architecture, because adversarial training improves accuracy and robustness by adjusting the weight connection affiliated to the architecture. In this work, we propose ARNAS to search for accurate and robust architectures for adversarial training. First we design an accurate and robust search space, in which the placement of the cells and the proportional relationship of the filter numbers are carefully determined. With the design, the architectures can obtain both accuracy and robustness by deploying accurate and robust structures to their sensitive positions, respectively. Then we propose a differentiable multi-objective search strategy, performing gradient descent towards directions that are beneficial for both natural loss and adversarial loss, thus the accuracy and robustness can be guaranteed at the same time. We conduct comprehensive experiments in terms of white-box attacks, black-box attacks, and transferability. Experimental results show that the searched architecture has the strongest robustness with the competitive accuracy, and breaks the traditional idea that NAS-based architectures cannot transfer well to complex tasks in robustness scenarios. By analyzing outstanding architectures searched, we also conclude that accurate and robust neural architectures tend to deploy different structures near the input and output, which has great practical significance on both hand-crafting and automatically designing of accurate and robust architectures.
△ Less
Submitted 8 May, 2024;
originally announced May 2024.
-
IncidentResponseGPT: Generating Traffic Incident Response Plans with Generative Artificial Intelligence
Authors:
Artur Grigorev,
Adriana-Simona Mihaita Khaled Saleh,
Yuming Ou
Abstract:
Traffic congestion due to road incidents poses a significant challenge in urban environments, leading to increased pollution, economic losses, and traffic congestion. Efficiently managing these incidents is imperative for mitigating their adverse effects; however, the complexity of urban traffic systems and the variety of potential incidents represent a considerable obstacle. This paper introduces…
▽ More
Traffic congestion due to road incidents poses a significant challenge in urban environments, leading to increased pollution, economic losses, and traffic congestion. Efficiently managing these incidents is imperative for mitigating their adverse effects; however, the complexity of urban traffic systems and the variety of potential incidents represent a considerable obstacle. This paper introduces IncidentResponseGPT, an innovative solution designed to assist traffic management authorities by providing rapid, informed, and adaptable traffic incident response plans. By integrating a Generative AI platform with real-time traffic incident reports and operational guidelines, our system aims to streamline the decision-making process in responding to traffic incidents. The research addresses the critical challenges involved in deploying AI in traffic management, including overcoming the complexity of urban traffic networks, ensuring real-time decision-making capabilities, aligning with local laws and regulations, and securing public acceptance for AI-driven systems. Through a combination of text analysis of accident reports, validation of AI recommendations through traffic simulation, and implementation of transparent and validated AI systems, IncidentResponseGPT offers a promising approach to optimizing traffic flow and reducing congestion in the face of traffic incidents. The relevance of this work extends to traffic management authorities, emergency response teams, and municipal bodies, all integral stakeholders in urban traffic control and incident management. By proposing a novel solution to the identified challenges, this research aims to develop a framework that not only facilitates faster resolution of traffic incidents but also minimizes their overall impact on urban traffic systems.
△ Less
Submitted 29 May, 2024; v1 submitted 29 April, 2024;
originally announced April 2024.
-
Hyper Evidential Deep Learning to Quantify Composite Classification Uncertainty
Authors:
Changbin Li,
Kangshuo Li,
Yuzhe Ou,
Lance M. Kaplan,
Audun Jøsang,
Jin-Hee Cho,
Dong Hyun Jeong,
Feng Chen
Abstract:
Deep neural networks (DNNs) have been shown to perform well on exclusive, multi-class classification tasks. However, when different classes have similar visual features, it becomes challenging for human annotators to differentiate them. This scenario necessitates the use of composite class labels. In this paper, we propose a novel framework called Hyper-Evidential Neural Network (HENN) that explic…
▽ More
Deep neural networks (DNNs) have been shown to perform well on exclusive, multi-class classification tasks. However, when different classes have similar visual features, it becomes challenging for human annotators to differentiate them. This scenario necessitates the use of composite class labels. In this paper, we propose a novel framework called Hyper-Evidential Neural Network (HENN) that explicitly models predictive uncertainty due to composite class labels in training data in the context of the belief theory called Subjective Logic (SL). By placing a grouped Dirichlet distribution on the class probabilities, we treat predictions of a neural network as parameters of hyper-subjective opinions and learn the network that collects both single and composite evidence leading to these hyper-opinions by a deterministic DNN from data. We introduce a new uncertainty type called vagueness originally designed for hyper-opinions in SL to quantify composite classification uncertainty for DNNs. Our results demonstrate that HENN outperforms its state-of-the-art counterparts based on four image datasets. The code and datasets are available at: https://github.com/Hugo101/HyperEvidentialNN.
△ Less
Submitted 16 April, 2024;
originally announced April 2024.
-
A Realistic Surgical Simulator for Non-Rigid and Contact-Rich Manipulation in Surgeries with the da Vinci Research Kit
Authors:
Yafei Ou,
Sadra Zargarzadeh,
Paniz Sedighi,
Mahdi Tavakoli
Abstract:
Realistic real-time surgical simulators play an increasingly important role in surgical robotics research, such as surgical robot learning and automation, and surgical skills assessment. Although there are a number of existing surgical simulators for research, they generally lack the ability to simulate the diverse types of objects and contact-rich manipulation tasks typically present in surgeries…
▽ More
Realistic real-time surgical simulators play an increasingly important role in surgical robotics research, such as surgical robot learning and automation, and surgical skills assessment. Although there are a number of existing surgical simulators for research, they generally lack the ability to simulate the diverse types of objects and contact-rich manipulation tasks typically present in surgeries, such as tissue cutting and blood suction. In this work, we introduce CRESSim, a realistic surgical simulator based on PhysX 5 for the da Vinci Research Kit (dVRK) that enables simulating various contact-rich surgical tasks involving different surgical instruments, soft tissue, and body fluids. The real-world dVRK console and the master tool manipulator (MTM) robots are incorporated into the system to allow for teleoperation through virtual reality (VR). To showcase the advantages and potentials of the simulator, we present three examples of surgical tasks, including tissue grasping and deformation, blood suction, and tissue cutting. These tasks are performed using the simulated surgical instruments, including the large needle driver, suction irrigator, and curved scissor, through VR-based teleoperation.
△ Less
Submitted 8 April, 2024;
originally announced April 2024.
-
Enhancing Traffic Incident Management with Large Language Models: A Hybrid Machine Learning Approach for Severity Classification
Authors:
Artur Grigorev,
Khaled Saleh,
Yuming Ou,
Adriana-Simona Mihaita
Abstract:
This research showcases the innovative integration of Large Language Models into machine learning workflows for traffic incident management, focusing on the classification of incident severity using accident reports. By leveraging features generated by modern language models alongside conventional data extracted from incident reports, our research demonstrates improvements in the accuracy of sever…
▽ More
This research showcases the innovative integration of Large Language Models into machine learning workflows for traffic incident management, focusing on the classification of incident severity using accident reports. By leveraging features generated by modern language models alongside conventional data extracted from incident reports, our research demonstrates improvements in the accuracy of severity classification across several machine learning algorithms. Our contributions are threefold. First, we present an extensive comparison of various machine learning models paired with multiple large language models for feature extraction, aiming to identify the optimal combinations for accurate incident severity classification. Second, we contrast traditional feature engineering pipelines with those enhanced by language models, showcasing the superiority of language-based feature engineering in processing unstructured text. Third, our study illustrates how merging baseline features from accident reports with language-based features can improve the severity classification accuracy. This comprehensive approach not only advances the field of incident management but also highlights the cross-domain application potential of our methodology, particularly in contexts requiring the prediction of event outcomes from unstructured textual data or features translated into textual representation. Specifically, our novel methodology was applied to three distinct datasets originating from the United States, the United Kingdom, and Queensland, Australia. This cross-continental application underlines the robustness of our approach, suggesting its potential for widespread adoption in improving incident management processes globally.
△ Less
Submitted 29 April, 2024; v1 submitted 20 March, 2024;
originally announced March 2024.
-
KnowAgent: Knowledge-Augmented Planning for LLM-Based Agents
Authors:
Yuqi Zhu,
Shuofei Qiao,
Yixin Ou,
Shumin Deng,
Ningyu Zhang,
Shiwei Lyu,
Yue Shen,
Lei Liang,
Jinjie Gu,
Huajun Chen
Abstract:
Large Language Models (LLMs) have demonstrated great potential in complex reasoning tasks, yet they fall short when tackling more sophisticated challenges, especially when interacting with environments through generating executable actions. This inadequacy primarily stems from the lack of built-in action knowledge in language agents, which fails to effectively guide the planning trajectories durin…
▽ More
Large Language Models (LLMs) have demonstrated great potential in complex reasoning tasks, yet they fall short when tackling more sophisticated challenges, especially when interacting with environments through generating executable actions. This inadequacy primarily stems from the lack of built-in action knowledge in language agents, which fails to effectively guide the planning trajectories during task solving and results in planning hallucination. To address this issue, we introduce KnowAgent, a novel approach designed to enhance the planning capabilities of LLMs by incorporating explicit action knowledge. Specifically, KnowAgent employs an action knowledge base and a knowledgeable self-learning strategy to constrain the action path during planning, enabling more reasonable trajectory synthesis, and thereby enhancing the planning performance of language agents. Experimental results on HotpotQA and ALFWorld based on various backbone models demonstrate that KnowAgent can achieve comparable or superior performance to existing baselines. Further analysis indicates the effectiveness of KnowAgent in terms of planning hallucinations mitigation. Code is available in https://github.com/zjunlp/KnowAgent.
△ Less
Submitted 5 March, 2024;
originally announced March 2024.
-
Halo Reduction in Display Systems through Smoothed Local Histogram Equalization and Human Visual System Modeling
Authors:
Prasoon Ambalathankandy,
Yafei Ou,
Masayuki Ikebe
Abstract:
Halo artifacts significantly impact display quality. We propose a method to reduce halos in Local Histogram Equalization (LHE) algorithms by separately addressing dark and light variants. This approach results in visually natural images by exploring the relationship between lateral inhibition and halo artifacts in the human visual system.
Halo artifacts significantly impact display quality. We propose a method to reduce halos in Local Histogram Equalization (LHE) algorithms by separately addressing dark and light variants. This approach results in visually natural images by exploring the relationship between lateral inhibition and halo artifacts in the human visual system.
△ Less
Submitted 9 February, 2024;
originally announced February 2024.
-
Sparse Anatomical Prompt Semi-Supervised Learning with Masked Image Modeling for CBCT Tooth Segmentation
Authors:
Pengyu Dai,
Yafei Ou,
Yang Liu,
Yue Zhao
Abstract:
Accurate tooth identification and segmentation in Cone Beam Computed Tomography (CBCT) dental images can significantly enhance the efficiency and precision of manual diagnoses performed by dentists. However, existing segmentation methods are mainly developed based on large data volumes training, on which their annotations are extremely time-consuming. Meanwhile, the teeth of each class in CBCT den…
▽ More
Accurate tooth identification and segmentation in Cone Beam Computed Tomography (CBCT) dental images can significantly enhance the efficiency and precision of manual diagnoses performed by dentists. However, existing segmentation methods are mainly developed based on large data volumes training, on which their annotations are extremely time-consuming. Meanwhile, the teeth of each class in CBCT dental images being closely positioned, coupled with subtle inter-class differences, gives rise to the challenge of indistinct boundaries when training model with limited data. To address these challenges, this study aims to propose a tasked-oriented Masked Auto-Encoder paradigm to effectively utilize large amounts of unlabeled data to achieve accurate tooth segmentation with limited labeled data. Specifically, we first construct a self-supervised pre-training framework of masked auto encoder to efficiently utilize unlabeled data to enhance the network performance. Subsequently, we introduce a sparse masked prompt mechanism based on graph attention to incorporate boundary information of the teeth, aiding the network in learning the anatomical structural features of teeth. To the best of our knowledge, we are pioneering the integration of the mask pre-training paradigm into the CBCT tooth segmentation task. Extensive experiments demonstrate both the feasibility of our proposed method and the potential of the boundary prompt mechanism.
△ Less
Submitted 7 February, 2024;
originally announced February 2024.
-
A Psychological Study: Importance of Contrast and Luminance in Color to Grayscale Mapping
Authors:
Prasoon Ambalathankandy,
Yafei Ou,
Sae Kaneko,
Masayuki Ikebe
Abstract:
Grayscale images are essential in image processing and computer vision tasks. They effectively emphasize luminance and contrast, highlighting important visual features, while also being easily compatible with other algorithms. Moreover, their simplified representation makes them efficient for storage and transmission purposes. While preserving contrast is important for maintaining visual quality,…
▽ More
Grayscale images are essential in image processing and computer vision tasks. They effectively emphasize luminance and contrast, highlighting important visual features, while also being easily compatible with other algorithms. Moreover, their simplified representation makes them efficient for storage and transmission purposes. While preserving contrast is important for maintaining visual quality, other factors such as preserving information relevant to the specific application or task at hand may be more critical for achieving optimal performance. To evaluate and compare different decolorization algorithms, we designed a psychological experiment. During the experiment, participants were instructed to imagine color images in a hypothetical "colorless world" and select the grayscale image that best resembled their mental visualization. We conducted a comparison between two types of algorithms: (i) perceptual-based simple color space conversion algorithms, and (ii) spatial contrast-based algorithms, including iteration-based methods. Our experimental findings indicate that CIELAB exhibited superior performance on average, providing further evidence for the effectiveness of perception-based decolorization algorithms. On the other hand, the spatial contrast-based algorithms showed relatively poorer performance, possibly due to factors such as DC-offset and artificial contrast generation. However, these algorithms demonstrated shorter selection times. Notably, no single algorithm consistently outperformed the others across all test images. In this paper, we will delve into a comprehensive discussion on the significance of contrast and luminance in color-to-grayscale mapping based on our experimental results and analysis.
△ Less
Submitted 6 February, 2024;
originally announced February 2024.
-
EasyInstruct: An Easy-to-use Instruction Processing Framework for Large Language Models
Authors:
Yixin Ou,
Ningyu Zhang,
Honghao Gui,
Ziwen Xu,
Shuofei Qiao,
Yida Xue,
Runnan Fang,
Kangwei Liu,
Lei Li,
Zhen Bi,
Guozhou Zheng,
Huajun Chen
Abstract:
In recent years, instruction tuning has gained increasing attention and emerged as a crucial technique to enhance the capabilities of Large Language Models (LLMs). To construct high-quality instruction datasets, many instruction processing approaches have been proposed, aiming to achieve a delicate balance between data quantity and data quality. Nevertheless, due to inconsistencies that persist am…
▽ More
In recent years, instruction tuning has gained increasing attention and emerged as a crucial technique to enhance the capabilities of Large Language Models (LLMs). To construct high-quality instruction datasets, many instruction processing approaches have been proposed, aiming to achieve a delicate balance between data quantity and data quality. Nevertheless, due to inconsistencies that persist among various instruction processing methods, there is no standard open-source instruction processing implementation framework available for the community, which hinders practitioners from further developing and advancing. To facilitate instruction processing research and development, we present EasyInstruct, an easy-to-use instruction processing framework for LLMs, which modularizes instruction generation, selection, and prompting, while also considering their combination and interaction. EasyInstruct is publicly released and actively maintained at https://github.com/zjunlp/EasyInstruct, along with an online demo app and a demo video for quick-start, calling for broader research centered on instruction data and synthetic data.
△ Less
Submitted 23 June, 2024; v1 submitted 5 February, 2024;
originally announced February 2024.
-
Prototypical Contrastive Learning through Alignment and Uniformity for Recommendation
Authors:
Yangxun Ou,
Lei Chen,
Fenglin Pan,
Yupeng Wu
Abstract:
Graph Collaborative Filtering (GCF), one of the most widely adopted recommendation system methods, effectively captures intricate relationships between user and item interactions. Graph Contrastive Learning (GCL) based GCF has gained significant attention as it leverages self-supervised techniques to extract valuable signals from real-world scenarios. However, many methods usually learn the instan…
▽ More
Graph Collaborative Filtering (GCF), one of the most widely adopted recommendation system methods, effectively captures intricate relationships between user and item interactions. Graph Contrastive Learning (GCL) based GCF has gained significant attention as it leverages self-supervised techniques to extract valuable signals from real-world scenarios. However, many methods usually learn the instances of discrimination tasks that involve the construction of contrastive pairs through random sampling. GCL approaches suffer from sampling bias issues, where the negatives might have a semantic structure similar to that of the positives, thus leading to a loss of effective feature representation. To address these problems, we present the \underline{Proto}typical contrastive learning through \underline{A}lignment and \underline{U}niformity for recommendation, which is called \textbf{ProtoAU}. Specifically, we first propose prototypes (cluster centroids) as a latent space to ensure consistency across different augmentations from the origin graph, aiming to eliminate the need for random sampling of contrastive pairs. Furthermore, the absence of explicit negatives means that directly optimizing the consistency loss between instance and prototype could easily result in dimensional collapse issues. Therefore, we propose aligning and maintaining uniformity in the prototypes of users and items as optimization objectives to prevent falling into trivial solutions. Finally, we conduct extensive experiments on four datasets and evaluate their performance on the task of link prediction. Experimental results demonstrate that the proposed ProtoAU outperforms other representative methods. The source codes of our proposed ProtoAU are available at \url{https://github.com/oceanlvr/ProtoAU}.
△ Less
Submitted 3 February, 2024;
originally announced February 2024.
-
Baichuan2-Sum: Instruction Finetune Baichuan2-7B Model for Dialogue Summarization
Authors:
Jianfei Xiao,
Yancan Chen,
Yimin Ou,
Hanyi Yu,
Kai Shu,
Yiyong Xiao
Abstract:
Large language models (LLMs) like Llama, Baichuan and Bloom models show remarkable ability with instruction fine-tuning in many natural language tasks. Nevertheless, for the dialogue summarization task, which aims to generate summaries for different roles in dialogue, most of the state-of-the-art methods conduct on small models (e.g Bart and Bert). Existing methods try to add task specified optimi…
▽ More
Large language models (LLMs) like Llama, Baichuan and Bloom models show remarkable ability with instruction fine-tuning in many natural language tasks. Nevertheless, for the dialogue summarization task, which aims to generate summaries for different roles in dialogue, most of the state-of-the-art methods conduct on small models (e.g Bart and Bert). Existing methods try to add task specified optimization on small models like adding global-local centrality score to models. In this paper, we propose an instruction fine-tuning model: Baichuan2-Sum, for role-oriented diaglouge summarization. By setting different instructions for different roles, the model can learn from the dialogue interactions and output the expected summaries. Furthermore, we applied NEFTune technique to add suitable noise during training to improve the results. The experiments demonstrate that the proposed model achieves the new state-of-the-art results on two public dialogue summarization datasets: CSDS and SAMSUM. We release our model and related codes to facilitate future studies on dialogue summarization task.
△ Less
Submitted 3 April, 2024; v1 submitted 27 January, 2024;
originally announced January 2024.
-
Training Robust Deep Physiological Measurement Models with Synthetic Video-based Data
Authors:
Yuxuan Ou,
Yuzhe Zhang,
Yuntang Wang,
Shwetak Patel,
Daniel McDuf,
Yuzhe Yang,
Xin Liu
Abstract:
Recent advances in supervised deep learning techniques have demonstrated the possibility to remotely measure human physiological vital signs (e.g., photoplethysmograph, heart rate) just from facial videos. However, the performance of these methods heavily relies on the availability and diversity of real labeled data. Yet, collecting large-scale real-world data with high-quality labels is typically…
▽ More
Recent advances in supervised deep learning techniques have demonstrated the possibility to remotely measure human physiological vital signs (e.g., photoplethysmograph, heart rate) just from facial videos. However, the performance of these methods heavily relies on the availability and diversity of real labeled data. Yet, collecting large-scale real-world data with high-quality labels is typically challenging and resource intensive, which also raises privacy concerns when storing personal bio-metric data. Synthetic video-based datasets (e.g., SCAMPS \cite{mcduff2022scamps}) with photo-realistic synthesized avatars are introduced to alleviate the issues while providing high-quality synthetic data. However, there exists a significant gap between synthetic and real-world data, which hinders the generalization of neural models trained on these synthetic datasets. In this paper, we proposed several measures to add real-world noise to synthetic physiological signals and corresponding facial videos. We experimented with individual and combined augmentation methods and evaluated our framework on three public real-world datasets. Our results show that we were able to reduce the average MAE from 6.9 to 2.0.
△ Less
Submitted 15 November, 2023; v1 submitted 9 November, 2023;
originally announced November 2023.
-
MLatom 3: Platform for machine learning-enhanced computational chemistry simulations and workflows
Authors:
Pavlo O. Dral,
Fuchun Ge,
Yi-Fan Hou,
Peikun Zheng,
Yuxinxin Chen,
Mario Barbatti,
Olexandr Isayev,
Cheng Wang,
Bao-Xin Xue,
Max Pinheiro Jr,
Yuming Su,
Yiheng Dai,
Yangtao Chen,
Lina Zhang,
Shuang Zhang,
Arif Ullah,
Quanhao Zhang,
Yanchi Ou
Abstract:
Machine learning (ML) is increasingly becoming a common tool in computational chemistry. At the same time, the rapid development of ML methods requires a flexible software framework for designing custom workflows. MLatom 3 is a program package designed to leverage the power of ML to enhance typical computational chemistry simulations and to create complex workflows. This open-source package provid…
▽ More
Machine learning (ML) is increasingly becoming a common tool in computational chemistry. At the same time, the rapid development of ML methods requires a flexible software framework for designing custom workflows. MLatom 3 is a program package designed to leverage the power of ML to enhance typical computational chemistry simulations and to create complex workflows. This open-source package provides plenty of choice to the users who can run simulations with the command line options, input files, or with scripts using MLatom as a Python package, both on their computers and on the online XACS cloud computing at XACScloud.com. Computational chemists can calculate energies and thermochemical properties, optimize geometries, run molecular and quantum dynamics, and simulate (ro)vibrational, one-photon UV/vis absorption, and two-photon absorption spectra with ML, quantum mechanical, and combined models. The users can choose from an extensive library of methods containing pre-trained ML models and quantum mechanical approximations such as AIQM1 approaching coupled-cluster accuracy. The developers can build their own models using various ML algorithms. The great flexibility of MLatom is largely due to the extensive use of the interfaces to many state-of-the-art software packages and libraries.
△ Less
Submitted 30 October, 2023;
originally announced October 2023.
-
Progressive Dual Priori Network for Generalized Breast Tumor Segmentation
Authors:
Li Wang,
Lihui Wang,
Zixiang Kuai,
Lei Tang,
Yingfeng Ou,
Chen Ye,
Yuemin Zhu
Abstract:
To promote the generalization ability of breast tumor segmentation models, as well as to improve the segmentation performance for breast tumors with smaller size, low-contrast and irregular shape, we propose a progressive dual priori network (PDPNet) to segment breast tumors from dynamic enhanced magnetic resonance images (DCE-MRI) acquired at different centers. The PDPNet first cropped tumor regi…
▽ More
To promote the generalization ability of breast tumor segmentation models, as well as to improve the segmentation performance for breast tumors with smaller size, low-contrast and irregular shape, we propose a progressive dual priori network (PDPNet) to segment breast tumors from dynamic enhanced magnetic resonance images (DCE-MRI) acquired at different centers. The PDPNet first cropped tumor regions with a coarse-segmentation based localization module, then the breast tumor mask was progressively refined by using the weak semantic priori and cross-scale correlation prior knowledge. To validate the effectiveness of PDPNet, we compared it with several state-of-the-art methods on multi-center datasets. The results showed that, comparing against the suboptimal method, the DSC and HD95 of PDPNet were improved at least by 5.13% and 7.58% respectively on multi-center test sets. In addition, through ablations, we demonstrated that the proposed localization module can decrease the influence of normal tissues and therefore improve the generalization ability of the model. The weak semantic priors allow focusing on tumor regions to avoid missing small tumors and low-contrast tumors. The cross-scale correlation priors are beneficial for promoting the shape-aware ability for irregular tumors. Thus integrating them in a unified framework improved the multi-center breast tumor segmentation performance. The source code and open data can be accessed at https://github.com/wangli100209/PDPNet.
△ Less
Submitted 16 June, 2024; v1 submitted 20 October, 2023;
originally announced October 2023.
-
Tackling Heterogeneity in Medical Federated learning via Vision Transformers
Authors:
Erfan Darzi,
Yiqing Shen,
Yangming Ou,
Nanna M. Sijtsema,
P. M. A van Ooijen
Abstract:
Optimization-based regularization methods have been effective in addressing the challenges posed by data heterogeneity in medical federated learning, particularly in improving the performance of underrepresented clients. However, these methods often lead to lower overall model accuracy and slower convergence rates. In this paper, we demonstrate that using Vision Transformers can substantially impr…
▽ More
Optimization-based regularization methods have been effective in addressing the challenges posed by data heterogeneity in medical federated learning, particularly in improving the performance of underrepresented clients. However, these methods often lead to lower overall model accuracy and slower convergence rates. In this paper, we demonstrate that using Vision Transformers can substantially improve the performance of underrepresented clients without a significant trade-off in overall accuracy. This improvement is attributed to the Vision transformer's ability to capture long-range dependencies within the input data.
△ Less
Submitted 15 November, 2023; v1 submitted 13 October, 2023;
originally announced October 2023.
-
OceanGPT: A Large Language Model for Ocean Science Tasks
Authors:
Zhen Bi,
Ningyu Zhang,
Yida Xue,
Yixin Ou,
Daxiong Ji,
Guozhou Zheng,
Huajun Chen
Abstract:
Ocean science, which delves into the oceans that are reservoirs of life and biodiversity, is of great significance given that oceans cover over 70% of our planet's surface. Recently, advances in Large Language Models (LLMs) have transformed the paradigm in science. Despite the success in other domains, current LLMs often fall short in catering to the needs of domain experts like oceanographers, an…
▽ More
Ocean science, which delves into the oceans that are reservoirs of life and biodiversity, is of great significance given that oceans cover over 70% of our planet's surface. Recently, advances in Large Language Models (LLMs) have transformed the paradigm in science. Despite the success in other domains, current LLMs often fall short in catering to the needs of domain experts like oceanographers, and the potential of LLMs for ocean science is under-explored. The intrinsic reasons are the immense and intricate nature of ocean data as well as the necessity for higher granularity and richness in knowledge. To alleviate these issues, we introduce OceanGPT, the first-ever large language model in the ocean domain, which is expert in various ocean science tasks. We also propose OceanGPT, a novel framework to automatically obtain a large volume of ocean domain instruction data, which generates instructions based on multi-agent collaboration. Additionally, we construct the first oceanography benchmark, OceanBench, to evaluate the capabilities of LLMs in the ocean domain. Though comprehensive experiments, OceanGPT not only shows a higher level of knowledge expertise for oceans science tasks but also gains preliminary embodied intelligence capabilities in ocean technology.
△ Less
Submitted 23 May, 2024; v1 submitted 3 October, 2023;
originally announced October 2023.
-
InfLoR-SNN: Reducing Information Loss for Spiking Neural Networks
Authors:
Yufei Guo,
Yuanpei Chen,
Liwen Zhang,
Xiaode Liu,
Xinyi Tong,
Yuanyuan Ou,
Xuhui Huang,
Zhe Ma
Abstract:
The Spiking Neural Network (SNN) has attracted more and more attention recently. It adopts binary spike signals to transmit information. Benefitting from the information passing paradigm of SNNs, the multiplications of activations and weights can be replaced by additions, which are more energy-efficient. However, its "Hard Reset" mechanism for the firing activity would ignore the difference among…
▽ More
The Spiking Neural Network (SNN) has attracted more and more attention recently. It adopts binary spike signals to transmit information. Benefitting from the information passing paradigm of SNNs, the multiplications of activations and weights can be replaced by additions, which are more energy-efficient. However, its "Hard Reset" mechanism for the firing activity would ignore the difference among membrane potentials when the membrane potential is above the firing threshold, causing information loss. Meanwhile, quantifying the membrane potential to 0/1 spikes at the firing instants will inevitably introduce the quantization error thus bringing about information loss too. To address these problems, we propose to use the "Soft Reset" mechanism for the supervised training-based SNNs, which will drive the membrane potential to a dynamic reset potential according to its magnitude, and Membrane Potential Rectifier (MPR) to reduce the quantization error via redistributing the membrane potential to a range close to the spikes. Results show that the SNNs with the "Soft Reset" mechanism and MPR outperform their vanilla counterparts on both static and dynamic datasets.
△ Less
Submitted 17 August, 2023; v1 submitted 10 July, 2023;
originally announced July 2023.
-
A region and category confidence-based multi-task network for carotid ultrasound image segmentation and classification
Authors:
Haitao Gan,
Ran Zhou,
Yanghan Ou,
Furong Wang,
Xinyao Cheng,
Aaron Fenster
Abstract:
The segmentation and classification of carotid plaques in ultrasound images play important roles in the treatment of atherosclerosis and assessment for the risk of stroke. Although deep learning methods have been used for carotid plaque segmentation and classification, two-stage methods will increase the complexity of the overall analysis and the existing multi-task methods ignored the relationshi…
▽ More
The segmentation and classification of carotid plaques in ultrasound images play important roles in the treatment of atherosclerosis and assessment for the risk of stroke. Although deep learning methods have been used for carotid plaque segmentation and classification, two-stage methods will increase the complexity of the overall analysis and the existing multi-task methods ignored the relationship between the segmentation and classification. These will lead to suboptimal performance as valuable information might not be fully leveraged across all tasks. Therefore, we propose a multi-task learning framework (RCCM-Net) for ultrasound carotid plaque segmentation and classification, which utilizes a region confidence module (RCM) and a sample category confidence module (CCM) to exploit the correlation between these two tasks. The RCM provides knowledge from the probability of plaque regions to the classification task, while the CCM is designed to learn the categorical sample weight for the segmentation task. A total of 1270 2D ultrasound images of carotid plaques were collected from Zhongnan Hospital (Wuhan, China) for our experiments. The results showed that the proposed method can improve both segmentation and classification performance compared to existing single-task networks (i.e., SegNet, Deeplabv3+, UNet++, EfficientNet, Res2Net, RepVGG, DPN) and multi-task algorithms (i.e., HRNet, MTANet), with an accuracy of 85.82% for classification and a Dice-similarity-coefficient of 84.92% for segmentation. In the ablation study, the results demonstrated that both the designed RCM and CCM were beneficial in improving the network's performance. Therefore, we believe that the proposed method could be useful for carotid plaque analysis in clinical trials and practice.
△ Less
Submitted 18 November, 2023; v1 submitted 2 July, 2023;
originally announced July 2023.
-
Sim-to-Real Surgical Robot Learning and Autonomous Planning for Internal Tissue Points Manipulation using Reinforcement Learning
Authors:
Yafei Ou,
Mahdi Tavakoli
Abstract:
Indirect simultaneous positioning (ISP), where internal tissue points are placed at desired locations indirectly through the manipulation of boundary points, is a type of subtask frequently performed in robotic surgeries. Although challenging due to complex tissue dynamics, automating the task can potentially reduce the workload of surgeons. This paper presents a sim-to-real framework for learning…
▽ More
Indirect simultaneous positioning (ISP), where internal tissue points are placed at desired locations indirectly through the manipulation of boundary points, is a type of subtask frequently performed in robotic surgeries. Although challenging due to complex tissue dynamics, automating the task can potentially reduce the workload of surgeons. This paper presents a sim-to-real framework for learning to automate the task without interacting with a real environment, and for planning preoperatively to find the grasping points that minimize local tissue deformation. A control policy is learned using deep reinforcement learning (DRL) in the FEM-based simulation environment and transferred to real-world situation. Grasping points are planned in the simulator by utilizing the trained policy using Bayesian optimization (BO). Inconsistent simulation performance is overcome by formulating the problem as a state augmented Markov decision process (MDP). Experimental results show that the learned policy places the internal tissue points accurately, and that the planned grasping points yield small tissue deformation among the trials. The proposed learning and planning scheme is able to automate internal tissue point manipulation in surgeries and has the potential to be generalized to complex surgical scenarios.
△ Less
Submitted 24 June, 2023;
originally announced June 2023.
-
Layout and Task Aware Instruction Prompt for Zero-shot Document Image Question Answering
Authors:
Wenjin Wang,
Yunhao Li,
Yixin Ou,
Yin Zhang
Abstract:
Layout-aware pre-trained models has achieved significant progress on document image question answering. They introduce extra learnable modules into existing language models to capture layout information within document images from text bounding box coordinates obtained by OCR tools. However, extra modules necessitate pre-training on extensive document images. This prevents these methods from direc…
▽ More
Layout-aware pre-trained models has achieved significant progress on document image question answering. They introduce extra learnable modules into existing language models to capture layout information within document images from text bounding box coordinates obtained by OCR tools. However, extra modules necessitate pre-training on extensive document images. This prevents these methods from directly utilizing off-the-shelf instruction-tuning language foundation models, which have recently shown promising potential in zero-shot learning. Instead, in this paper, we find that instruction-tuning language models like Claude and ChatGPT can understand layout by spaces and line breaks. Based on this observation, we propose the LAyout and Task aware Instruction Prompt (LATIN-Prompt), which consists of layout-aware document content and task-aware instruction. Specifically, the former uses appropriate spaces and line breaks to recover the layout information among text segments obtained by OCR tools, and the latter ensures that generated answers adhere to formatting requirements. Moreover, we propose the LAyout and Task aware Instruction Tuning (LATIN-Tuning) to improve the performance of small instruction-tuning models like Alpaca. Experimental results show that LATIN-Prompt enables zero-shot performance of Claude and ChatGPT to be comparable to the fine-tuning performance of SOTAs on document image question answering, and LATIN-Tuning enhances the zero-shot performance of Alpaca significantly. For example, LATIN-Prompt improves the performance of Claude and ChatGPT on DocVQA by 263% and 20% respectively. LATIN-Tuning improves the performance of Alpaca on DocVQA by 87.7%. Quantitative and qualitative analyses demonstrate the effectiveness of LATIN-Prompt and LATIN-Tuning. We provide the code in supplementary and will release it to facilitate future research.
△ Less
Submitted 7 September, 2023; v1 submitted 1 June, 2023;
originally announced June 2023.
-
LLMs for Knowledge Graph Construction and Reasoning: Recent Capabilities and Future Opportunities
Authors:
Yuqi Zhu,
Xiaohan Wang,
Jing Chen,
Shuofei Qiao,
Yixin Ou,
Yunzhi Yao,
Shumin Deng,
Huajun Chen,
Ningyu Zhang
Abstract:
This paper presents an exhaustive quantitative and qualitative evaluation of Large Language Models (LLMs) for Knowledge Graph (KG) construction and reasoning. We engage in experiments across eight diverse datasets, focusing on four representative tasks encompassing entity and relation extraction, event extraction, link prediction, and question-answering, thereby thoroughly exploring LLMs' performa…
▽ More
This paper presents an exhaustive quantitative and qualitative evaluation of Large Language Models (LLMs) for Knowledge Graph (KG) construction and reasoning. We engage in experiments across eight diverse datasets, focusing on four representative tasks encompassing entity and relation extraction, event extraction, link prediction, and question-answering, thereby thoroughly exploring LLMs' performance in the domain of construction and inference. Empirically, our findings suggest that LLMs, represented by GPT-4, are more suited as inference assistants rather than few-shot information extractors. Specifically, while GPT-4 exhibits good performance in tasks related to KG construction, it excels further in reasoning tasks, surpassing fine-tuned models in certain cases. Moreover, our investigation extends to the potential generalization ability of LLMs for information extraction, leading to the proposition of a Virtual Knowledge Extraction task and the development of the corresponding VINE dataset. Based on these empirical findings, we further propose AutoKG, a multi-agent-based approach employing LLMs and external sources for KG construction and reasoning. We anticipate that this research can provide invaluable insights for future undertakings in the field of knowledge graphs. The code and datasets are in https://github.com/zjunlp/AutoKG.
△ Less
Submitted 22 February, 2024; v1 submitted 22 May, 2023;
originally announced May 2023.
-
A Deep Registration Method for Accurate Quantification of Joint Space Narrowing Progression in Rheumatoid Arthritis
Authors:
Haolin Wang,
Yafei Ou,
Wanxuan Fang,
Prasoon Ambalathankandy,
Naoto Goto,
Gen Ota,
Masayuki Ikebe,
Tamotsu Kamishima
Abstract:
Rheumatoid arthritis (RA) is a chronic autoimmune inflammatory disease that results in progressive articular destruction and severe disability. Joint space narrowing (JSN) progression has been regarded as an important indicator for RA progression and has received sustained attention. In the diagnosis and monitoring of RA, radiology plays a crucial role to monitor joint space. A new framework for m…
▽ More
Rheumatoid arthritis (RA) is a chronic autoimmune inflammatory disease that results in progressive articular destruction and severe disability. Joint space narrowing (JSN) progression has been regarded as an important indicator for RA progression and has received sustained attention. In the diagnosis and monitoring of RA, radiology plays a crucial role to monitor joint space. A new framework for monitoring joint space by quantifying JSN progression through image registration in radiographic images has been developed. This framework offers the advantage of high accuracy, however, challenges do exist in reducing mismatches and improving reliability. In this work, a deep intra-subject rigid registration network is proposed to automatically quantify JSN progression in the early stage of RA. In our experiments, the mean-square error of Euclidean distance between moving and fixed image is 0.0031, standard deviation is 0.0661 mm, and the mismatching rate is 0.48\%. The proposed method has sub-pixel level accuracy, exceeding manual measurements by far, and is equipped with immune to noise, rotation, and scaling of joints. Moreover, this work provides loss visualization, which can aid radiologists and rheumatologists in assessing quantification reliability, with important implications for possible future clinical applications. As a result, we are optimistic that this proposed work will make a significant contribution to the automatic quantification of JSN progression in RA.
△ Less
Submitted 26 April, 2023;
originally announced April 2023.
-
Computer-Vision Benchmark Segment-Anything Model (SAM) in Medical Images: Accuracy in 12 Datasets
Authors:
Sheng He,
Rina Bao,
Jingpeng Li,
Jeffrey Stout,
Atle Bjornerud,
P. Ellen Grant,
Yangming Ou
Abstract:
Background: The segment-anything model (SAM), introduced in April 2023, shows promise as a benchmark model and a universal solution to segment various natural images. It comes without previously-required re-training or fine-tuning specific to each new dataset.
Purpose: To test SAM's accuracy in various medical image segmentation tasks and investigate potential factors that may affect its accurac…
▽ More
Background: The segment-anything model (SAM), introduced in April 2023, shows promise as a benchmark model and a universal solution to segment various natural images. It comes without previously-required re-training or fine-tuning specific to each new dataset.
Purpose: To test SAM's accuracy in various medical image segmentation tasks and investigate potential factors that may affect its accuracy in medical images.
Methods: SAM was tested on 12 public medical image segmentation datasets involving 7,451 subjects. The accuracy was measured by the Dice overlap between the algorithm-segmented and ground-truth masks. SAM was compared with five state-of-the-art algorithms specifically designed for medical image segmentation tasks. Associations of SAM's accuracy with six factors were computed, independently and jointly, including segmentation difficulties as measured by segmentation ability score and by Dice overlap in U-Net, image dimension, size of the target region, image modality, and contrast.
Results: The Dice overlaps from SAM were significantly lower than the five medical-image-based algorithms in all 12 medical image segmentation datasets, by a margin of 0.1-0.5 and even 0.6-0.7 Dice. SAM-Semantic was significantly associated with medical image segmentation difficulty and the image modality, and SAM-Point and SAM-Box were significantly associated with image segmentation difficulty, image dimension, target region size, and target-vs-background contrast. All these 3 variations of SAM were more accurate in 2D medical images, larger target region sizes, easier cases with a higher Segmentation Ability score and higher U-Net Dice, and higher foreground-background contrast.
△ Less
Submitted 5 May, 2023; v1 submitted 18 April, 2023;
originally announced April 2023.
-
Differentiable Genetic Programming for High-dimensional Symbolic Regression
Authors:
Peng Zeng,
Xiaotian Song,
Andrew Lensen,
Yuwei Ou,
Yanan Sun,
Mengjie Zhang,
Jiancheng Lv
Abstract:
Symbolic regression (SR) is the process of discovering hidden relationships from data with mathematical expressions, which is considered an effective way to reach interpretable machine learning (ML). Genetic programming (GP) has been the dominator in solving SR problems. However, as the scale of SR problems increases, GP often poorly demonstrates and cannot effectively address the real-world high-…
▽ More
Symbolic regression (SR) is the process of discovering hidden relationships from data with mathematical expressions, which is considered an effective way to reach interpretable machine learning (ML). Genetic programming (GP) has been the dominator in solving SR problems. However, as the scale of SR problems increases, GP often poorly demonstrates and cannot effectively address the real-world high-dimensional problems. This limitation is mainly caused by the stochastic evolutionary nature of traditional GP in constructing the trees. In this paper, we propose a differentiable approach named DGP to construct GP trees towards high-dimensional SR for the first time. Specifically, a new data structure called differentiable symbolic tree is proposed to relax the discrete structure to be continuous, thus a gradient-based optimizer can be presented for the efficient optimization. In addition, a sampling method is proposed to eliminate the discrepancy caused by the above relaxation for valid symbolic expressions. Furthermore, a diversification mechanism is introduced to promote the optimizer escaping from local optima for globally better solutions. With these designs, the proposed DGP method can efficiently search for the GP trees with higher performance, thus being capable of dealing with high-dimensional SR. To demonstrate the effectiveness of DGP, we conducted various experiments against the state of the arts based on both GP and deep neural networks. The experiment results reveal that DGP can outperform these chosen peer competitors on high-dimensional regression benchmarks with dimensions varying from tens to thousands. In addition, on the synthetic SR problems, the proposed DGP method can also achieve the best recovery rate even with different noisy levels. It is believed this work can facilitate SR being a powerful alternative to interpretable ML for a broader range of real-world problems.
△ Less
Submitted 18 April, 2023;
originally announced April 2023.
-
U-Netmer: U-Net meets Transformer for medical image segmentation
Authors:
Sheng He,
Rina Bao,
P. Ellen Grant,
Yangming Ou
Abstract:
The combination of the U-Net based deep learning models and Transformer is a new trend for medical image segmentation. U-Net can extract the detailed local semantic and texture information and Transformer can learn the long-rang dependencies among pixels in the input image. However, directly adapting the Transformer for segmentation has ``token-flatten" problem (flattens the local patches into 1D…
▽ More
The combination of the U-Net based deep learning models and Transformer is a new trend for medical image segmentation. U-Net can extract the detailed local semantic and texture information and Transformer can learn the long-rang dependencies among pixels in the input image. However, directly adapting the Transformer for segmentation has ``token-flatten" problem (flattens the local patches into 1D tokens which losses the interaction among pixels within local patches) and ``scale-sensitivity" problem (uses a fixed scale to split the input image into local patches). Compared to directly combining U-Net and Transformer, we propose a new global-local fashion combination of U-Net and Transformer, named U-Netmer, to solve the two problems. The proposed U-Netmer splits an input image into local patches. The global-context information among local patches is learnt by the self-attention mechanism in Transformer and U-Net segments each local patch instead of flattening into tokens to solve the `token-flatten" problem. The U-Netmer can segment the input image with different patch sizes with the identical structure and the same parameter. Thus, the U-Netmer can be trained with different patch sizes to solve the ``scale-sensitivity" problem. We conduct extensive experiments in 7 public datasets on 7 organs (brain, heart, breast, lung, polyp, pancreas and prostate) and 4 imaging modalities (MRI, CT, ultrasound, and endoscopy) to show that the proposed U-Netmer can be generally applied to improve accuracy of medical image segmentation. These experimental results show that U-Netmer provides state-of-the-art performance compared to baselines and other models. In addition, the discrepancy among the outputs of U-Netmer with different scales is linearly correlated to the segmentation accuracy which can be considered as a confidence score to rank test images by difficulty without ground-truth.
△ Less
Submitted 3 April, 2023;
originally announced April 2023.
-
TaskMatrix.AI: Completing Tasks by Connecting Foundation Models with Millions of APIs
Authors:
Yaobo Liang,
Chenfei Wu,
Ting Song,
Wenshan Wu,
Yan Xia,
Yu Liu,
Yang Ou,
Shuai Lu,
Lei Ji,
Shaoguang Mao,
Yun Wang,
Linjun Shou,
Ming Gong,
Nan Duan
Abstract:
Artificial Intelligence (AI) has made incredible progress recently. On the one hand, advanced foundation models like ChatGPT can offer powerful conversation, in-context learning and code generation abilities on a broad range of open-domain tasks. They can also generate high-level solution outlines for domain-specific tasks based on the common sense knowledge they have acquired. However, they still…
▽ More
Artificial Intelligence (AI) has made incredible progress recently. On the one hand, advanced foundation models like ChatGPT can offer powerful conversation, in-context learning and code generation abilities on a broad range of open-domain tasks. They can also generate high-level solution outlines for domain-specific tasks based on the common sense knowledge they have acquired. However, they still face difficulties with some specialized tasks because they lack enough domain-specific data during pre-training or they often have errors in their neural network computations on those tasks that need accurate executions. On the other hand, there are also many existing models and systems (symbolic-based or neural-based) that can do some domain-specific tasks very well. However, due to the different implementation or working mechanisms, they are not easily accessible or compatible with foundation models. Therefore, there is a clear and pressing need for a mechanism that can leverage foundation models to propose task solution outlines and then automatically match some of the sub-tasks in the outlines to the off-the-shelf models and systems with special functionalities to complete them. Inspired by this, we introduce TaskMatrix.AI as a new AI ecosystem that connects foundation models with millions of APIs for task completion. Unlike most previous work that aimed to improve a single AI model, TaskMatrix.AI focuses more on using existing foundation models (as a brain-like central system) and APIs of other AI models and systems (as sub-task solvers) to achieve diversified tasks in both digital and physical domains. As a position paper, we will present our vision of how to build such an ecosystem, explain each key component, and use study cases to illustrate both the feasibility of this vision and the main challenges we need to address next.
△ Less
Submitted 28 March, 2023;
originally announced March 2023.
-
A Concept Knowledge Graph for User Next Intent Prediction at Alipay
Authors:
Yacheng He,
Qianghuai Jia,
Lin Yuan,
Ruopeng Li,
Yixin Ou,
Ningyu Zhang
Abstract:
This paper illustrates the technologies of user next intent prediction with a concept knowledge graph. The system has been deployed on the Web at Alipay, serving more than 100 million daily active users. To explicitly characterize user intent, we propose AlipayKG, which is an offline concept knowledge graph in the Life-Service domain modeling the historical behaviors of users, the rich content int…
▽ More
This paper illustrates the technologies of user next intent prediction with a concept knowledge graph. The system has been deployed on the Web at Alipay, serving more than 100 million daily active users. To explicitly characterize user intent, we propose AlipayKG, which is an offline concept knowledge graph in the Life-Service domain modeling the historical behaviors of users, the rich content interacted by users and the relations between them. We further introduce a Transformer-based model which integrates expert rules from the knowledge graph to infer the online user's next intent. Experimental results demonstrate that the proposed system can effectively enhance the performance of the downstream tasks while retaining explainability.
△ Less
Submitted 14 March, 2023; v1 submitted 1 January, 2023;
originally announced January 2023.
-
Differentiable Search of Accurate and Robust Architectures
Authors:
Yuwei Ou,
Xiangning Xie,
Shangce Gao,
Yanan Sun,
Kay Chen Tan,
Jiancheng Lv
Abstract:
Deep neural networks (DNNs) are found to be vulnerable to adversarial attacks, and various methods have been proposed for the defense. Among these methods, adversarial training has been drawing increasing attention because of its simplicity and effectiveness. However, the performance of the adversarial training is greatly limited by the architectures of target DNNs, which often makes the resulting…
▽ More
Deep neural networks (DNNs) are found to be vulnerable to adversarial attacks, and various methods have been proposed for the defense. Among these methods, adversarial training has been drawing increasing attention because of its simplicity and effectiveness. However, the performance of the adversarial training is greatly limited by the architectures of target DNNs, which often makes the resulting DNNs with poor accuracy and unsatisfactory robustness. To address this problem, we propose DSARA to automatically search for the neural architectures that are accurate and robust after adversarial training. In particular, we design a novel cell-based search space specially for adversarial training, which improves the accuracy and the robustness upper bound of the searched architectures by carefully designing the placement of the cells and the proportional relationship of the filter numbers. Then we propose a two-stage search strategy to search for both accurate and robust neural architectures. At the first stage, the architecture parameters are optimized to minimize the adversarial loss, which makes full use of the effectiveness of the adversarial training in enhancing the robustness. At the second stage, the architecture parameters are optimized to minimize both the natural loss and the adversarial loss utilizing the proposed multi-objective adversarial training method, so that the searched neural architectures are both accurate and robust. We evaluate the proposed algorithm under natural data and various adversarial attacks, which reveals the superiority of the proposed method in terms of both accurate and robust architectures. We also conclude that accurate and robust neural architectures tend to deploy very different structures near the input and the output, which has great practical significance on both hand-crafting and automatically designing of accurate and robust neural architectures.
△ Less
Submitted 2 January, 2023; v1 submitted 28 December, 2022;
originally announced December 2022.
-
Reasoning with Language Model Prompting: A Survey
Authors:
Shuofei Qiao,
Yixin Ou,
Ningyu Zhang,
Xiang Chen,
Yunzhi Yao,
Shumin Deng,
Chuanqi Tan,
Fei Huang,
Huajun Chen
Abstract:
Reasoning, as an essential ability for complex problem-solving, can provide back-end support for various real-world applications, such as medical diagnosis, negotiation, etc. This paper provides a comprehensive survey of cutting-edge research on reasoning with language model prompting. We introduce research works with comparisons and summaries and provide systematic resources to help beginners. We…
▽ More
Reasoning, as an essential ability for complex problem-solving, can provide back-end support for various real-world applications, such as medical diagnosis, negotiation, etc. This paper provides a comprehensive survey of cutting-edge research on reasoning with language model prompting. We introduce research works with comparisons and summaries and provide systematic resources to help beginners. We also discuss the potential reasons for emerging such reasoning abilities and highlight future research directions. Resources are available at https://github.com/zjunlp/Prompt4ReasoningPapers (updated periodically).
△ Less
Submitted 18 September, 2023; v1 submitted 19 December, 2022;
originally announced December 2022.
-
Segmentation Ability Map: Interpret deep features for medical image segmentation
Authors:
Sheng He,
Yanfang Feng,
P. Ellen Grant,
Yangming Ou
Abstract:
Deep convolutional neural networks (CNNs) have been widely used for medical image segmentation. In most studies, only the output layer is exploited to compute the final segmentation results and the hidden representations of the deep learned features have not been well understood. In this paper, we propose a prototype segmentation (ProtoSeg) method to compute a binary segmentation map based on deep…
▽ More
Deep convolutional neural networks (CNNs) have been widely used for medical image segmentation. In most studies, only the output layer is exploited to compute the final segmentation results and the hidden representations of the deep learned features have not been well understood. In this paper, we propose a prototype segmentation (ProtoSeg) method to compute a binary segmentation map based on deep features. We measure the segmentation abilities of the features by computing the Dice between the feature segmentation map and ground-truth, named as the segmentation ability score (SA score for short). The corresponding SA score can quantify the segmentation abilities of deep features in different layers and units to understand the deep neural networks for segmentation. In addition, our method can provide a mean SA score which can give a performance estimation of the output on the test images without ground-truth. Finally, we use the proposed ProtoSeg method to compute the segmentation map directly on input images to further understand the segmentation ability of each input image. Results are presented on segmenting tumors in brain MRI, lesions in skin images, COVID-related abnormality in CT images, prostate segmentation in abdominal MRI, and pancreatic mass segmentation in CT images. Our method can provide new insights for interpreting and explainable AI systems for medical image segmentation.
Our code is available on: \url{https://github.com/shengfly/ProtoSeg}.
△ Less
Submitted 18 December, 2022;
originally announced December 2022.
-
Modeling Global Distribution for Federated Learning with Label Distribution Skew
Authors:
Tao Sheng,
Chengchao Shen,
Yuan Liu,
Yeyu Ou,
Zhe Qu,
Jianxin Wang
Abstract:
Federated learning achieves joint training of deep models by connecting decentralized data sources, which can significantly mitigate the risk of privacy leakage. However, in a more general case, the distributions of labels among clients are different, called ``label distribution skew''. Directly applying conventional federated learning without consideration of label distribution skew issue signifi…
▽ More
Federated learning achieves joint training of deep models by connecting decentralized data sources, which can significantly mitigate the risk of privacy leakage. However, in a more general case, the distributions of labels among clients are different, called ``label distribution skew''. Directly applying conventional federated learning without consideration of label distribution skew issue significantly hurts the performance of the global model. To this end, we propose a novel federated learning method, named FedMGD, to alleviate the performance degradation caused by the label distribution skew issue. It introduces a global Generative Adversarial Network to model the global data distribution without access to local datasets, so the global model can be trained using the global information of data distribution without privacy leakage. The experimental results demonstrate that our proposed method significantly outperforms the state-of-the-art on several public benchmarks. Code is available at \url{https://github.com/Sheng-T/FedMGD}.
△ Less
Submitted 17 December, 2022;
originally announced December 2022.
-
Representation Learning of Knowledge Graph for Wireless Communication Networks
Authors:
Shiwen He,
Yeyu Ou,
Liangpeng Wang,
Hang Zhan,
Peng Ren,
Yongming Huang
Abstract:
With the application of the fifth-generation wireless communication technologies, more smart terminals are being used and generating huge amounts of data, which has prompted extensive research on how to handle and utilize these wireless data. Researchers currently focus on the research on the upper-layer application data or studying the intelligent transmission methods concerning a specific proble…
▽ More
With the application of the fifth-generation wireless communication technologies, more smart terminals are being used and generating huge amounts of data, which has prompted extensive research on how to handle and utilize these wireless data. Researchers currently focus on the research on the upper-layer application data or studying the intelligent transmission methods concerning a specific problem based on a large amount of data generated by the Monte Carlo simulations. This article aims to understand the endogenous relationship of wireless data by constructing a knowledge graph according to the wireless communication protocols, and domain expert knowledge and further investigating the wireless endogenous intelligence. We firstly construct a knowledge graph of the endogenous factors of wireless core network data collected via a 5G/B5G testing network. Then, a novel model based on graph convolutional neural networks is designed to learn the representation of the graph, which is used to classify graph nodes and simulate the relation prediction. The proposed model realizes the automatic nodes classification and network anomaly cause tracing. It is also applied to the public datasets in an unsupervised manner. Finally, the results show that the classification accuracy of the proposed model is better than the existing unsupervised graph neural network models, such as VGAE and ARVGE.
△ Less
Submitted 22 August, 2022;
originally announced August 2022.
-
Patcher: Patch Transformers with Mixture of Experts for Precise Medical Image Segmentation
Authors:
Yanglan Ou,
Ye Yuan,
Xiaolei Huang,
Stephen T. C. Wong,
John Volpi,
James Z. Wang,
Kelvin Wong
Abstract:
We present a new encoder-decoder Vision Transformer architecture, Patcher, for medical image segmentation. Unlike standard Vision Transformers, it employs Patcher blocks that segment an image into large patches, each of which is further divided into small patches. Transformers are applied to the small patches within a large patch, which constrains the receptive field of each pixel. We intentionall…
▽ More
We present a new encoder-decoder Vision Transformer architecture, Patcher, for medical image segmentation. Unlike standard Vision Transformers, it employs Patcher blocks that segment an image into large patches, each of which is further divided into small patches. Transformers are applied to the small patches within a large patch, which constrains the receptive field of each pixel. We intentionally make the large patches overlap to enhance intra-patch communication. The encoder employs a cascade of Patcher blocks with increasing receptive fields to extract features from local to global levels. This design allows Patcher to benefit from both the coarse-to-fine feature extraction common in CNNs and the superior spatial relationship modeling of Transformers. We also propose a new mixture-of-experts (MoE) based decoder, which treats the feature maps from the encoder as experts and selects a suitable set of expert features to predict the label for each pixel. The use of MoE enables better specializations of the expert features and reduces interference between them during inference. Extensive experiments demonstrate that Patcher outperforms state-of-the-art Transformer- and CNN-based approaches significantly on stroke lesion segmentation and polyp segmentation. Code for Patcher is released with publication to facilitate future research.
△ Less
Submitted 29 May, 2023; v1 submitted 3 June, 2022;
originally announced June 2022.
-
UniInst: Unique Representation for End-to-End Instance Segmentation
Authors:
Yimin Ou,
Rui Yang,
Lufan Ma,
Yong Liu,
Jiangpeng Yan,
Shang Xu,
Chengjie Wang,
Xiu Li
Abstract:
Existing instance segmentation methods have achieved impressive performance but still suffer from a common dilemma: redundant representations (e.g., multiple boxes, grids, and anchor points) are inferred for one instance, which leads to multiple duplicated predictions. Thus, mainstream methods usually rely on a hand-designed non-maximum suppression (NMS) post-processing step to select the optimal…
▽ More
Existing instance segmentation methods have achieved impressive performance but still suffer from a common dilemma: redundant representations (e.g., multiple boxes, grids, and anchor points) are inferred for one instance, which leads to multiple duplicated predictions. Thus, mainstream methods usually rely on a hand-designed non-maximum suppression (NMS) post-processing step to select the optimal prediction result, which hinders end-to-end training. To address this issue, we propose a box-free and NMS-free end-to-end instance segmentation framework, termed UniInst, that yields only one unique representation for each instance. Specifically, we design an instance-aware one-to-one assignment scheme, namely Only Yield One Representation (OYOR), which dynamically assigns one unique representation to each instance according to the matching quality between predictions and ground truths. Then, a novel prediction re-ranking strategy is elegantly integrated into the framework to address the misalignment between the classification score and the mask quality, enabling the learned representation to be more discriminative. With these techniques, our UniInst, the first FCN-based box-free and NMS-free instance segmentation framework, achieves competitive performance, e.g., 39.0 mask AP using ResNet-50-FPN and 40.2 mask AP using ResNet-101-FPN, against mainstream methods on COCO test-dev 2017. Moreover, the proposed instance-aware method is robust to occlusion scenes, outperforming common baselines by remarkable mask AP on the heavily-occluded OCHuman benchmark. Code is available at https://github.com/b03505036/UniInst.
△ Less
Submitted 17 September, 2022; v1 submitted 25 May, 2022;
originally announced May 2022.
-
A Sub-pixel Accurate Quantification of Joint Space Narrowing Progression in Rheumatoid Arthritis
Authors:
Yafei Ou,
Prasoon Ambalathankandy,
Ryunosuke Furuya,
Seiya Kawada,
Tianyu Zeng,
Yujie An,
Tamotsu Kamishima,
Kenichi Tamura,
Masayuki Ikebe
Abstract:
Rheumatoid arthritis (RA) is a chronic autoimmune disease that primarily affects peripheral synovial joints, like fingers, wrist and feet. Radiology plays a critical role in the diagnosis and monitoring of RA. Limited by the current spatial resolution of radiographic imaging, joint space narrowing (JSN) progression of RA with the same reason above can be less than one pixel per year with universal…
▽ More
Rheumatoid arthritis (RA) is a chronic autoimmune disease that primarily affects peripheral synovial joints, like fingers, wrist and feet. Radiology plays a critical role in the diagnosis and monitoring of RA. Limited by the current spatial resolution of radiographic imaging, joint space narrowing (JSN) progression of RA with the same reason above can be less than one pixel per year with universal spatial resolution. Insensitive monitoring of JSN can hinder the radiologist/rheumatologist from making a proper and timely clinical judgment. In this paper, we propose a novel and sensitive method that we call partial image phase-only correlation which aims to automatically quantify JSN progression in the early stages of RA. The majority of the current literature utilizes the mean error, root-mean-square deviation and standard deviation to report the accuracy at pixel level. Our work measures JSN progression between a baseline and its follow-up finger joint images by using the phase spectrum in the frequency domain. Using this study, the mean error can be reduced to 0.0130mm when applied to phantom radiographs with ground truth, and 0.0519mm standard deviation for clinical radiography. With its sub-pixel accuracy far beyond manual measurement, we are optimistic that our work is promising for automatically quantifying JSN progression.
△ Less
Submitted 1 November, 2022; v1 submitted 19 May, 2022;
originally announced May 2022.
-
Semantic Geometric Fusion Multi-object Tracking and Lidar Odometry in Dynamic Environment
Authors:
Tingchen Ma,
Yongsheng Ou
Abstract:
The SLAM system based on static scene assumption will introduce huge estimation errors when moving objects appear in the field of view. This paper proposes a novel multi-object dynamic lidar odometry (MLO) based on semantic object detection technology to solve this problem. The MLO system can provide reliable localization of robot and semantic objects and build long-term static maps in complex dyn…
▽ More
The SLAM system based on static scene assumption will introduce huge estimation errors when moving objects appear in the field of view. This paper proposes a novel multi-object dynamic lidar odometry (MLO) based on semantic object detection technology to solve this problem. The MLO system can provide reliable localization of robot and semantic objects and build long-term static maps in complex dynamic scenes. For ego-motion estimation, we use the environment features that take semantic and geometric consistency constraints into account in the extraction process. The filtering features are robust to semantic movable and unknown dynamic objects. At the same time, a least square estimator using the semantic bounding box and object point cloud is proposed to achieve accurate and stable multi-object tracking between frames. In the mapping module, we further realize dynamic semantic object detection based on the absolute trajectory tracking list (ATTL). Then, static semantic objects and environmental features can be used to eliminate accumulated localization errors and build pure static maps. Experiments on public KITTI data sets show that the proposed system can achieve more accurate and robust tracking of the object and better real-time localization accuracy in complex scenes compared with existing technologies.
△ Less
Submitted 2 March, 2023; v1 submitted 25 April, 2022;
originally announced April 2022.
-
Deep Relation Learning for Regression and Its Application to Brain Age Estimation
Authors:
Sheng He,
Yanfang Feng,
P. Ellen Grant,
Yangming Ou
Abstract:
Most deep learning models for temporal regression directly output the estimation based on single input images, ignoring the relationships between different images. In this paper, we propose deep relation learning for regression, aiming to learn different relations between a pair of input images. Four non-linear relations are considered: "cumulative relation", "relative relation", "maximal relation…
▽ More
Most deep learning models for temporal regression directly output the estimation based on single input images, ignoring the relationships between different images. In this paper, we propose deep relation learning for regression, aiming to learn different relations between a pair of input images. Four non-linear relations are considered: "cumulative relation", "relative relation", "maximal relation" and "minimal relation". These four relations are learned simultaneously from one deep neural network which has two parts: feature extraction and relation regression. We use an efficient convolutional neural network to extract deep features from the pair of input images and apply a Transformer for relation learning. The proposed method is evaluated on a merged dataset with 6,049 subjects with ages of 0-97 years using 5-fold cross-validation for the task of brain age estimation. The experimental results have shown that the proposed method achieved a mean absolute error (MAE) of 2.38 years, which is lower than the MAEs of 8 other state-of-the-art algorithms with statistical significance (p$<$0.05) in paired T-test (two-side).
△ Less
Submitted 13 April, 2022;
originally announced April 2022.
-
Revisiting Recursive Least Squares for Training Deep Neural Networks
Authors:
Chunyuan Zhang,
Qi Song,
Hui Zhou,
Yigui Ou,
Hongyao Deng,
Laurence Tianruo Yang
Abstract:
Recursive least squares (RLS) algorithms were once widely used for training small-scale neural networks, due to their fast convergence. However, previous RLS algorithms are unsuitable for training deep neural networks (DNNs), since they have high computational complexity and too many preconditions. In this paper, to overcome these drawbacks, we propose three novel RLS optimization algorithms for t…
▽ More
Recursive least squares (RLS) algorithms were once widely used for training small-scale neural networks, due to their fast convergence. However, previous RLS algorithms are unsuitable for training deep neural networks (DNNs), since they have high computational complexity and too many preconditions. In this paper, to overcome these drawbacks, we propose three novel RLS optimization algorithms for training feedforward neural networks, convolutional neural networks and recurrent neural networks (including long short-term memory networks), by using the error backpropagation and our average-approximation RLS method, together with the equivalent gradients of the linear least squares loss function with respect to the linear outputs of hidden layers. Compared with previous RLS optimization algorithms, our algorithms are simple and elegant. They can be viewed as an improved stochastic gradient descent (SGD) algorithm, which uses the inverse autocorrelation matrix of each layer as the adaptive learning rate. Their time and space complexities are only several times those of SGD. They only require the loss function to be the mean squared error and the activation function of the output layer to be invertible. In fact, our algorithms can be also used in combination with other first-order optimization algorithms without requiring these two preconditions. In addition, we present two improved methods for our algorithms. Finally, we demonstrate their effectiveness compared to the Adam algorithm on MNIST, CIFAR-10 and IMDB datasets, and investigate the influences of their hyperparameters experimentally.
△ Less
Submitted 7 September, 2021;
originally announced September 2021.
-
Global-Local Transformer for Brain Age Estimation
Authors:
Sheng He,
P. Ellen Grant,
Yangming Ou
Abstract:
Deep learning can provide rapid brain age estimation based on brain magnetic resonance imaging (MRI). However, most studies use one neural network to extract the global information from the whole input image, ignoring the local fine-grained details. In this paper, we propose a global-local transformer, which consists of a global-pathway to extract the global-context information from the whole inpu…
▽ More
Deep learning can provide rapid brain age estimation based on brain magnetic resonance imaging (MRI). However, most studies use one neural network to extract the global information from the whole input image, ignoring the local fine-grained details. In this paper, we propose a global-local transformer, which consists of a global-pathway to extract the global-context information from the whole input image and a local-pathway to extract the local fine-grained details from local patches. The fine-grained information from the local patches are fused with the global-context information by the attention mechanism, inspired by the transformer, to estimate the brain age. We evaluate the proposed method on 8 public datasets with 8,379 healthy brain MRIs with the age range of 0-97 years. 6 datasets are used for cross-validation and 2 datasets are used for evaluating the generality. Comparing with other state-of-the-art methods, the proposed global-local transformer reduces the mean absolute error of the estimated ages to 2.70 years and increases the correlation coefficient of the estimated age and the chronological age to 0.9853. In addition, our proposed method provides regional information of which local patches are most informative for brain age estimation. Our source code is available on: \url{https://github.com/shengfly/global-local-transformer}.
△ Less
Submitted 2 September, 2021;
originally announced September 2021.
-
An Overview on the Application of Graph Neural Networks in Wireless Networks
Authors:
S. He,
S. Xiong,
Y. Ou,
J. Zhang,
J. Wang,
Y. Huang,
Y. Zhang
Abstract:
In recent years, with the rapid enhancement of computing power, deep learning methods have been widely applied in wireless networks and achieved impressive performance. To effectively exploit the information of graph-structured data as well as contextual information, graph neural networks (GNNs) have been introduced to address a series of optimization problems of wireless networks. In this overvie…
▽ More
In recent years, with the rapid enhancement of computing power, deep learning methods have been widely applied in wireless networks and achieved impressive performance. To effectively exploit the information of graph-structured data as well as contextual information, graph neural networks (GNNs) have been introduced to address a series of optimization problems of wireless networks. In this overview, we first illustrate the construction method of wireless communication graph for various wireless networks and simply introduce the progress of several classical paradigms of GNNs. Then, several applications of GNNs in wireless networks such as resource allocation and several emerging fields, are discussed in detail. Finally, some research trends about the applications of GNNs in wireless communication systems are discussed.
△ Less
Submitted 17 November, 2021; v1 submitted 7 July, 2021;
originally announced July 2021.
-
Visual Relationship Forecasting in Videos
Authors:
Li Mi,
Yangjun Ou,
Zhenzhong Chen
Abstract:
Real-world scenarios often require the anticipation of object interactions in unknown future, which would assist the decision-making process of both humans and agents. To meet this challenge, we present a new task named Visual Relationship Forecasting (VRF) in videos to explore the prediction of visual relationships in a reasoning manner. Specifically, given a subject-object pair with H existing f…
▽ More
Real-world scenarios often require the anticipation of object interactions in unknown future, which would assist the decision-making process of both humans and agents. To meet this challenge, we present a new task named Visual Relationship Forecasting (VRF) in videos to explore the prediction of visual relationships in a reasoning manner. Specifically, given a subject-object pair with H existing frames, VRF aims to predict their future interactions for the next T frames without visual evidence. To evaluate the VRF task, we introduce two video datasets named VRF-AG and VRF-VidOR, with a series of spatio-temporally localized visual relation annotations in a video. These two datasets densely annotate 13 and 35 visual relationships in 1923 and 13447 video clips, respectively. In addition, we present a novel Graph Convolutional Transformer (GCT) framework, which captures both object-level and frame-level dependencies by spatio-temporal Graph Convolution Network and Transformer. Experimental results on both VRF-AG and VRF-VidOR datasets demonstrate that GCT outperforms the state-of-the-art sequence modelling methods on visual relationship forecasting.
△ Less
Submitted 2 July, 2021;
originally announced July 2021.
-
LambdaUNet: 2.5D Stroke Lesion Segmentation of Diffusion-weighted MR Images
Authors:
Yanglan Ou,
Ye Yuan,
Xiaolei Huang,
Kelvin Wong,
John Volpi,
James Z. Wang,
Stephen T. C. Wong
Abstract:
Diffusion-weighted (DW) magnetic resonance imaging is essential for the diagnosis and treatment of ischemic stroke. DW images (DWIs) are usually acquired in multi-slice settings where lesion areas in two consecutive 2D slices are highly discontinuous due to large slice thickness and sometimes even slice gaps. Therefore, although DWIs contain rich 3D information, they cannot be treated as regular 3…
▽ More
Diffusion-weighted (DW) magnetic resonance imaging is essential for the diagnosis and treatment of ischemic stroke. DW images (DWIs) are usually acquired in multi-slice settings where lesion areas in two consecutive 2D slices are highly discontinuous due to large slice thickness and sometimes even slice gaps. Therefore, although DWIs contain rich 3D information, they cannot be treated as regular 3D or 2D images. Instead, DWIs are somewhere in-between (or 2.5D) due to the volumetric nature but inter-slice discontinuities. Thus, it is not ideal to apply most existing segmentation methods as they are designed for either 2D or 3D images. To tackle this problem, we propose a new neural network architecture tailored for segmenting highly-discontinuous 2.5D data such as DWIs. Our network, termed LambdaUNet, extends UNet by replacing convolutional layers with our proposed Lambda+ layers. In particular, Lambda+ layers transform both intra-slice and inter-slice context around a pixel into linear functions, called lambdas, which are then applied to the pixel to produce informative 2.5D features. LambdaUNet is simple yet effective in combining sparse inter-slice information from adjacent slices while also capturing dense contextual features within a single slice. Experiments on a unique clinical dataset demonstrate that LambdaUNet outperforms existing 3D/2D image segmentation methods including recent variants of UNet. Code for LambdaUNet is released with the publication to facilitate future research.
△ Less
Submitted 29 May, 2023; v1 submitted 28 April, 2021;
originally announced April 2021.
-
LGA-RCNN: Loss-Guided Attention for Object Detection
Authors:
Xin Yi,
Jiahao Wu,
Bo Ma,
Yangtong Ou,
Longyao Liu
Abstract:
Object detection is widely studied in computer vision filed. In recent years, certain representative deep learning based detection methods along with solid benchmarks are proposed, which boosts the development of related researchs. However, existing detection methods still suffer from undesirable performance under challenges such as camouflage, blur, inter-class similarity, intra-class variance an…
▽ More
Object detection is widely studied in computer vision filed. In recent years, certain representative deep learning based detection methods along with solid benchmarks are proposed, which boosts the development of related researchs. However, existing detection methods still suffer from undesirable performance under challenges such as camouflage, blur, inter-class similarity, intra-class variance and complex environment. To address this issue, we propose LGA-RCNN which utilizes a loss-guided attention (LGA) module to highlight representative region of objects. Then, those highlighted local information are fused with global information for precise classification and localization.
△ Less
Submitted 12 May, 2021; v1 submitted 28 April, 2021;
originally announced April 2021.
-
Epsilon Consistent Mixup: Structural Regularization with an Adaptive Consistency-Interpolation Tradeoff
Authors:
Vincent Pisztora,
Yanglan Ou,
Xiaolei Huang,
Francesca Chiaromonte,
Jia Li
Abstract:
In this paper we propose $ε$-Consistent Mixup ($ε$mu). $ε$mu is a data-based structural regularization technique that combines Mixup's linear interpolation with consistency regularization in the Mixup direction, by compelling a simple adaptive tradeoff between the two. This learnable combination of consistency and interpolation induces a more flexible structure on the evolution of the response acr…
▽ More
In this paper we propose $ε$-Consistent Mixup ($ε$mu). $ε$mu is a data-based structural regularization technique that combines Mixup's linear interpolation with consistency regularization in the Mixup direction, by compelling a simple adaptive tradeoff between the two. This learnable combination of consistency and interpolation induces a more flexible structure on the evolution of the response across the feature space and is shown to improve semi-supervised classification accuracy on the SVHN and CIFAR10 benchmark datasets, yielding the largest gains in the most challenging low label-availability scenarios. Empirical studies comparing $ε$mu and Mixup are presented and provide insight into the mechanisms behind $ε$mu's effectiveness. In particular, $ε$mu is found to produce more accurate synthetic labels and more confident predictions than Mixup.
△ Less
Submitted 29 September, 2021; v1 submitted 19 April, 2021;
originally announced April 2021.
-
Mitigating Hand Blockage with Non-Directional Beamforming Codebooks
Authors:
Vasanthan Raghavan,
Ricardo A. Motos,
M. Ali Tassoudji,
Yu-Chin Ou,
Ozge H. Koymen,
Junyi Li
Abstract:
Hand blockage leads to significant performance impairments at millimeter wave carrier frequencies. A number of prior works have characterized the loss in signal strength with the hand using studies with horn antennas and form-factor user equipments (UEs). However, the impact of the hand on the effective phase response seen by the antenna elements has not been studied so far. Towards this goal, we…
▽ More
Hand blockage leads to significant performance impairments at millimeter wave carrier frequencies. A number of prior works have characterized the loss in signal strength with the hand using studies with horn antennas and form-factor user equipments (UEs). However, the impact of the hand on the effective phase response seen by the antenna elements has not been studied so far. Towards this goal, we consider a measurement framework that uses a hand phantom holding the UE in relaxed positions reflective of talk mode, watching videos, and playing games. We first study the impact of blockage on a directional beam steering codebook. The tight phase relationship across antenna elements needed to steer beams leads to a significant performance degradation as the hand surface can distort the observed amplitudes and phases across the antenna elements, which cannot be matched by this codebook. To overcome this loss, we propose a non-directional beamforming codebook made of amplitudes and/or quantized phases with both these quantities estimated as necessary. Theoretical as well as numerical studies show that the proposed codebook can de-randomize the phase distortions induced by the hand and coherently combine the energy across antenna elements and thus help in mitigating hand blockage losses.
△ Less
Submitted 13 April, 2021;
originally announced April 2021.
-
Whitening Sentence Representations for Better Semantics and Faster Retrieval
Authors:
Jianlin Su,
Jiarun Cao,
Weijie Liu,
Yangyiwen Ou
Abstract:
Pre-training models such as BERT have achieved great success in many natural language processing tasks. However, how to obtain better sentence representation through these pre-training models is still worthy to exploit. Previous work has shown that the anisotropy problem is an critical bottleneck for BERT-based sentence representation which hinders the model to fully utilize the underlying semanti…
▽ More
Pre-training models such as BERT have achieved great success in many natural language processing tasks. However, how to obtain better sentence representation through these pre-training models is still worthy to exploit. Previous work has shown that the anisotropy problem is an critical bottleneck for BERT-based sentence representation which hinders the model to fully utilize the underlying semantic features. Therefore, some attempts of boosting the isotropy of sentence distribution, such as flow-based model, have been applied to sentence representations and achieved some improvement. In this paper, we find that the whitening operation in traditional machine learning can similarly enhance the isotropy of sentence representations and achieve competitive results. Furthermore, the whitening technique is also capable of reducing the dimensionality of the sentence representation. Our experimental results show that it can not only achieve promising performance but also significantly reduce the storage cost and accelerate the model retrieval speed.
△ Less
Submitted 28 March, 2021;
originally announced March 2021.