Highly Accelerated MRI via Implicit Neural Representation Guided Posterior Sampling of Diffusion Models

Authors: Jiayue Chu, Chenhe Du, Xiyue Lin, Yuyao Zhang, Hongjiang Wei

Abstract: Reconstructing high-fidelity magnetic resonance (MR) images from under-sampled k-space is a commonly used strategy to reduce scan time. The posterior sampling of diffusion models based on the real measurement data holds significant promise of improved reconstruction accuracy. However, traditional posterior sampling methods often lack effective data consistency guidance, leading to inaccurate and unstable reconstructions. Implicit neural representation (INR) has emerged as a powerful paradigm for solving inverse problems by modeling a signal's attributes as a continuous function of spatial coordinates. In this study, we present a novel posterior sampler for diffusion models using INR, named DiffINR. The INR-based component incorporates both the diffusion prior distribution and the MRI physical model to ensure high data fidelity. DiffINR demonstrates superior performance on experimental datasets with remarkable accuracy, even under high acceleration factors (up to R=12 in single-channel reconstruction). Notably, our proposed framework can be a generalizable framework to solve inverse problems in other medical imaging tasks.

Submitted 2 July, 2024; originally announced July 2024.

arXiv:2406.01604 [pdf, other]

doi 10.1016/j.neucom.2024.127905

An Empirical Study of Excitation and Aggregation Design Adaptions in CLIP4Clip for Video-Text Retrieval

Authors: Xiaolun Jing, Genke Yang, Jian Chu

Abstract: The CLIP4Clip model transferred from CLIP has been the de facto standard for solving the video clip retrieval task from frame-level input, triggering a surge of CLIP4Clip-based models in the video-text retrieval domain. In this work, we rethink the inherent limitation of the widely used mean pooling operation in frame feature aggregation and investigate adaptions of excitation and aggregation design for discriminative video representation generation. We present a novel excitation-and-aggregation design, in which (1) the excitation module captures non-mutually-exclusive relationships among frame features and achieves frame-wise feature recalibration, and (2) the aggregation module is applied to learn exclusiveness used for frame representation aggregation. Similarly, we employ the cascade of sequential module and aggregation design to generate discriminative video representation in the sequential type. Besides, we adopt the excitation design in the tight type to obtain representative frame features for multi-modal interaction. The proposed modules are evaluated on three benchmark datasets of MSR-VTT, ActivityNet and DiDeMo, achieving MSR-VTT (43.9 R@1), ActivityNet (44.1 R@1) and DiDeMo (31.0 R@1). They outperform the CLIP4Clip results by +1.2% (+0.5%), +4.5% (+1.9%) and +9.5% (+2.7%) relative (absolute) improvements, demonstrating the superiority of our proposed excitation and aggregation designs. We hope our work will serve as an alternative for frame representation aggregation and facilitate future research.

Submitted 8 June, 2024; v1 submitted 25 May, 2024; originally announced June 2024.

Comments: 20 pages

arXiv:2404.19360 [pdf, other]

Large Language Model Informed Patent Image Retrieval

Authors: Hao-Cheng Lo, Jung-Mei Chu, Jieh Hsiang, Chun-Chieh Cho

Abstract: In patent prosecution, image-based retrieval systems for identifying similarities between current patent images and prior art are pivotal to ensure the novelty and non-obviousness of patent applications. Despite their growing popularity in recent years, existing attempts, while effective at recognizing images within the same patent, fail to deliver practical value due to their limited generalizability in retrieving relevant prior art. Moreover, this task inherently involves the challenges posed by the abstract visual features of patent images, the skewed distribution of image classifications, and the semantic information of image descriptions. Therefore, we propose a language-informed, distribution-aware multimodal approach to patent image feature learning, which enriches the semantic understanding of patent image by integrating Large Language Models and improves the performance of underrepresented classes with our proposed distribution-aware contrastive losses. Extensive experiments on DeepPatent2 dataset show that our proposed method achieves state-of-the-art or comparable performance in image-based patent retrieval with mAP +53.3%, Recall@10 +41.8%, and MRR@10 +51.9%. Furthermore, through an in-depth user analysis, we explore our model in aiding patent professionals in their image retrieval efforts, highlighting the model's real-world applicability and effectiveness.

Submitted 30 April, 2024; originally announced April 2024.

Comments: 8 pages. Under review

arXiv:2404.05576 [pdf, other]

Dynamic Backtracking in GFlowNets: Enhancing Decision Steps with Reward-Dependent Adjustment Mechanisms

Authors: Shuai Guo, Jielei Chu, Lei Zhu, Zhaoyu Li, Tianrui Li

Abstract: Generative Flow Networks (GFlowNets or GFNs) are probabilistic models predicated on Markov flows, and they employ specific amortization algorithms to learn stochastic policies that generate compositional substances including biomolecules, chemical materials, etc. With a strong ability to generate high-performance biochemical molecules, GFNs accelerate the discovery of scientific substances, effectively overcoming the time-consuming, labor-intensive, and costly shortcomings of conventional material discovery methods. However, previous studies rarely focus on accumulating exploratory experience by adjusting generative structures, which leads to disorientation in complex sampling spaces. Efforts to address this issue, such as LS-GFN, are limited to local greedy searches and lack broader global adjustments. This paper introduces a novel variant of GFNs, the Dynamic Backtracking GFN (DB-GFN), which improves the adaptability of decision-making steps through a reward-based dynamic backtracking mechanism. DB-GFN allows backtracking during the network construction process according to the current state's reward value, thereby correcting disadvantageous decisions and exploring alternative pathways during the exploration process. When applied to generative tasks involving biochemical molecules and genetic material sequences, DB-GFN outperforms GFN models such as LS-GFN and GTB, as well as traditional reinforcement learning methods, in sample quality, sample exploration quantity, and training convergence speed. Additionally, owing to its orthogonal nature, DB-GFN shows great potential in future improvements of GFNs, and it can be integrated with other strategies to achieve higher search performance.

Submitted 13 May, 2024; v1 submitted 8 April, 2024; originally announced April 2024.

arXiv:2403.17445 [pdf, other]

Incorporating Exponential Smoothing into MLP: A Simple but Effective Sequence Model

Authors: Jiqun Chu, Zuoquan Lin

Abstract: Modeling long-range dependencies in sequential data is a crucial step in sequence learning. A recently developed model, the Structured State Space (S4), demonstrated significant effectiveness in modeling long-range sequences. However, it is unclear whether the success of S4 can be attributed to its intricate parameterization and HiPPO initialization or simply to State Space Models (SSMs). To further investigate the potential of deep SSMs, we start with exponential smoothing (ETS), a simple SSM, and propose a stacked architecture by directly incorporating it into an element-wise MLP. We augment simple ETS with additional parameters and complex field to reduce the inductive bias. Despite increasing the parameters of the element-wise MLP by less than 1%, our models achieve comparable results to S4 on the LRA benchmark.
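
For readers unfamiliar with exponential smoothing, the textbook recurrence can be sketched as below. This is only the classical scalar form s_t = alpha * x_t + (1 - alpha) * s_(t-1), not the paper's augmented, complex-valued variant or its stacked MLP architecture; the function name, the `alpha` parameter, and the choice to initialize the state with the first observation are assumptions made for illustration.

```python
def exponential_smoothing(xs, alpha):
    """Classical simple exponential smoothing (illustrative sketch only).

    s_0 = x_0, and s_t = alpha * x_t + (1 - alpha) * s_{t-1} for t >= 1.
    The initialization with the first observation is an assumption.
    """
    if not 0.0 < alpha <= 1.0:
        raise ValueError("alpha must be in (0, 1]")
    smoothed = []
    state = None
    for x in xs:
        # The first observation seeds the state; later ones are blended in.
        state = x if state is None else alpha * x + (1.0 - alpha) * state
        smoothed.append(state)
    return smoothed
```

Viewed as a state space model, the single hidden state `state` is updated linearly from the previous state and the current input, which is why the abstract can call ETS "a simple SSM".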

Submitted 26 March, 2024; originally announced March 2024.

Comments: 12 pages, 5 tables, 3 figures

arXiv:2403.13347 [pdf, other]

vid-TLDR: Training Free Token merging for Light-weight Video Transformer

Authors: Joonmyung Choi, Sanghyeok Lee, Jaewon Chu, Minhyuk Choi, Hyunwoo J. Kim

Abstract: Video Transformers have become the prevalent solution for various video downstream tasks with superior expressive power and flexibility. However, these video transformers suffer from heavy computational costs induced by the massive number of tokens across the entire video frames, which has been the major barrier to training the model. Further, the patches irrelevant to the main contents, e.g., backgrounds, degrade the generalization performance of models. To tackle these issues, we propose training free token merging for lightweight video Transformer (vid-TLDR) that aims to enhance the efficiency of video Transformers by merging the background tokens without additional training. For vid-TLDR, we introduce a novel approach to capture the salient regions in videos only with the attention map. Further, we introduce the saliency-aware token merging strategy by dropping the background tokens and sharpening the object scores. Our experiments show that vid-TLDR significantly mitigates the computational complexity of video Transformers while achieving competitive performance compared to the base model without vid-TLDR. Code is available at https://github.com/mlvlab/vid-TLDR.

Submitted 30 March, 2024; v1 submitted 20 March, 2024; originally announced March 2024.

Comments: Conference on Computer Vision and Pattern Recognition (CVPR), 2024

arXiv:2402.15119 [pdf]

A multidisciplinary framework for deconstructing bots' pluripotency in dualistic antagonism

Authors: Wentao Xu, Kazutoshi Sasahara, Jianxun Chu, Bin Wang, Wenlu Fan, Zhiwen Hu

Abstract: Anthropomorphic social bots are engineered to emulate human verbal communication and generate toxic or inflammatory content across social networking services (SNSs). Bot-disseminated misinformation could subtly yet profoundly reshape societal processes by complexly interweaving factors like repeated disinformation exposure, amplified political polarization, compromised indicators of democratic health, shifted perceptions of national identity, propagation of false social norms, and manipulation of collective memory over time. However, extrapolating bots' pluripotency across hybridized, multilingual, and heterogeneous media ecologies from isolated SNS analyses remains largely unknown, underscoring the need for a comprehensive framework to characterise bots' emergent risks to civic discourse. Here we propose an interdisciplinary framework to characterise bots' pluripotency, incorporating quantification of influence, network dynamics monitoring, and interlingual feature analysis. When applied to the geopolitical discourse around the Russo-Ukrainian conflict, results from interlanguage toxicity profiling and network analysis elucidated spatiotemporal trajectories of pro-Russian and pro-Ukrainian humans and bots across hybrid SNSs. Weaponized bots predominantly inhabited X, while humans primarily populated Reddit in the social media warfare. This rigorous framework promises to elucidate interlingual homogeneity and heterogeneity in bots' pluripotent behaviours, revealing synergistic human-bot mechanisms underlying regimes of information manipulation, echo chamber formation, and collective memory manifestation in algorithmically structured societies.

Submitted 11 May, 2024; v1 submitted 23 February, 2024; originally announced February 2024.

ACM Class: J.4

arXiv:2402.11231 [pdf]

Enhancing Security in Blockchain Networks: Anomalies, Frauds, and Advanced Detection Techniques

Authors: Joerg Osterrieder, Stephen Chan, Jeffrey Chu, Yuanyuan Zhang, Branka Hadji Misheva, Codruta Mare

Abstract: Blockchain technology, a foundational distributed ledger system, enables secure and transparent multi-party transactions. Despite its advantages, blockchain networks are susceptible to anomalies and frauds, posing significant risks to their integrity and security. This paper offers a detailed examination of blockchain's key definitions and properties, alongside a thorough analysis of the various anomalies and frauds that undermine these networks. It describes an array of detection and prevention strategies, encompassing statistical and machine learning methods, game-theoretic solutions, digital forensics, reputation-based systems, and comprehensive risk assessment techniques. Through case studies, we explore practical applications of anomaly and fraud detection in blockchain networks, extracting valuable insights and implications for both current practice and future research. Moreover, we spotlight emerging trends and challenges within the field, proposing directions for future investigation and technological development. Aimed at both practitioners and researchers, this paper seeks to provide a technical, in-depth overview of anomaly and fraud detection within blockchain networks, marking a significant step forward in the search for enhanced network security and reliability.

Submitted 17 February, 2024; originally announced February 2024.

arXiv:2402.09846 [pdf]

doi 10.1029/2020EA001340

A Deep Learning Approach to Radar-based QPE

Authors: Ting-Shuo Yo, Shih-Hao Su, Jung-Lien Chu, Chiao-Wei Chang, Hung-Chi Kuo

Abstract: In this study, we propose a volume-to-point framework for quantitative precipitation estimation (QPE) based on the Quantitative Precipitation Estimation and Segregation Using Multiple Sensor (QPESUMS) Mosaic Radar data set. With a data volume consisting of the time series of gridded radar reflectivities over the Taiwan area, we used machine learning algorithms to establish a statistical model for QPE in weather stations. The model extracts spatial and temporal features from the input data volume and then associates these features with the location-specific precipitations. In contrast to QPE methods based on the Z-R relation, we leverage the machine learning algorithms to automatically detect the evolution and movement of weather systems and associate these patterns to a location with specific topographic attributes. Specifically, we evaluated this framework with the hourly precipitation data of 45 weather stations in Taipei during 2013-2016. In comparison to the operational QPE scheme used by the Central Weather Bureau, the volume-to-point framework performed comparably well in general cases and excelled in detecting heavy-rainfall events. By using the current results as the reference benchmark, the proposed method can integrate the heterogeneous data sources and potentially improve the forecast in extreme precipitation scenarios.
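
For context, the Z-R relation that the abstract contrasts against is conventionally a power law, Z = a * R^b, relating radar reflectivity factor Z (mm^6 m^-3) to rain rate R (mm/h). The sketch below inverts that power law for a reflectivity given in dBZ. The default coefficients a = 200 and b = 1.6 are the classical Marshall-Palmer values and are an assumption here; the abstract does not say which coefficients the operational scheme uses, and the function name is hypothetical.

```python
def rain_rate_from_dbz(dbz, a=200.0, b=1.6):
    """Invert the power-law Z-R relation Z = a * R**b (illustrative sketch).

    dbz  : reflectivity in dBZ, i.e. 10 * log10(Z)
    a, b : Z-R coefficients; the defaults are the Marshall-Palmer values,
           assumed here for illustration and not taken from the abstract.
    Returns the estimated rain rate R in mm/h.
    """
    z = 10.0 ** (dbz / 10.0)      # convert dBZ back to linear reflectivity Z
    return (z / a) ** (1.0 / b)   # solve Z = a * R**b for R
```

A fixed pointwise mapping like this uses only the reflectivity at one grid cell and one time, which is the limitation the volume-to-point framework is described as addressing by learning from the full spatio-temporal radar volume.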

Submitted 15 February, 2024; originally announced February 2024.

Comments: 22 pages, 11 figures. Published in Earth and Space Science

Journal ref: Earth Space Sci. 2021, 8, e2020EA001340

arXiv:2402.06938 [pdf, other]

Efficient Resource Scheduling for Distributed Infrastructures Using Negotiation Capabilities

Authors: Junjie Chu, Prashant Singh, Salman Toor

Abstract: In the past few decades, the rapid development of information and internet technologies has spawned massive amounts of data and information. The information explosion drives many enterprises or individuals to seek to rent cloud computing infrastructure to put their applications in the cloud. However, the agreements reached between cloud computing providers and clients are often not efficient. Many factors affect the efficiency, such as the idleness of the providers' cloud computing infrastructure, and the additional cost to the clients. One possible solution is to introduce a comprehensive, bargaining game (a type of negotiation), and schedule resources according to the negotiation results. We propose an agent-based auto-negotiation system for resource scheduling based on fuzzy logic. The proposed method can complete a one-to-one auto-negotiation process and generate optimal offers for the provider and client. We compare the impact of different member functions, fuzzy rule sets, and negotiation scenario cases on the offers to optimize the system. It can be concluded that our proposed method can utilize resources more efficiently and is interpretable, highly flexible, and customizable. We successfully train machine learning models to replace the fuzzy negotiation system to improve processing speed. The article also highlights possible future improvements to the proposed system and machine learning models. All the codes and data are available in the open-source repository.

Submitted 13 February, 2024; v1 submitted 10 February, 2024; originally announced February 2024.

Comments: Accepted in IEEE CLOUD 2023. 13 pages, 5 figures

arXiv:2402.05668 [pdf, other]

Comprehensive Assessment of Jailbreak Attacks Against LLMs

Authors: Junjie Chu, Yugeng Liu, Ziqing Yang, Xinyue Shen, Michael Backes, Yang Zhang

Abstract: Misuse of the Large Language Models (LLMs) has raised widespread concern. To address this issue, safeguards have been taken to ensure that LLMs align with social ethics. However, recent findings have revealed an unsettling vulnerability bypassing the safeguards of LLMs, known as jailbreak attacks. By applying techniques, such as employing role-playing scenarios, adversarial examples, or subtle subversion of safety objectives as a prompt, LLMs can produce an inappropriate or even harmful response. While researchers have studied several categories of jailbreak attacks, they have done so in isolation. To fill this gap, we present the first large-scale measurement of various jailbreak attack methods. We concentrate on 13 cutting-edge jailbreak methods from four categories, 160 questions from 16 violation categories, and six popular LLMs. Our extensive experimental results demonstrate that the optimized jailbreak prompts consistently achieve the highest attack success rates, as well as exhibit robustness across different LLMs. Some jailbreak prompt datasets, available from the Internet, can also achieve high attack success rates on many LLMs, such as ChatGLM3, GPT-3.5, and PaLM2. Despite the claims from many organizations regarding the coverage of violation categories in their policies, the attack success rates from these categories remain high, indicating the challenges of effectively aligning LLM policies and the ability to counter jailbreak attacks. We also discuss the trade-off between the attack performance and efficiency, as well as show that the transferability of the jailbreak prompts is still viable, becoming an option for black-box models. Overall, our research highlights the necessity of evaluating different jailbreak methods. We hope our study can provide insights for future research on jailbreak attacks and serve as a benchmark tool for evaluating them for practitioners.

Submitted 8 February, 2024; originally announced February 2024.

Comments: 18 pages, 12 figures

arXiv:2402.02987 [pdf, other]

Conversation Reconstruction Attack Against GPT Models

Authors: Junjie Chu, Zeyang Sha, Michael Backes, Yang Zhang

Abstract: In recent times, significant advancements have been made in the field of large language models (LLMs), represented by GPT series models. To optimize task execution, users often engage in multi-round conversations with GPT models hosted in cloud environments. These multi-round conversations, potentially replete with private information, require transmission and storage within the cloud. However, this operational paradigm introduces additional attack surfaces. In this paper, we first introduce a specific Conversation Reconstruction Attack targeting GPT models. Our introduced Conversation Reconstruction Attack is composed of two steps: hijacking a session and reconstructing the conversations. Subsequently, we offer an exhaustive evaluation of the privacy risks inherent in conversations when GPT models are subjected to the proposed attack. However, GPT-4 demonstrates certain robustness to the proposed attacks. We then introduce two advanced attacks aimed at better reconstructing previous conversations, specifically the UNR attack and the PBU attack. Our experimental findings indicate that the PBU attack yields substantial performance across all models, achieving semantic similarity scores exceeding 0.60, while the UNR attack is effective solely on GPT-3.5. Our results reveal the concern about privacy risks associated with conversations involving GPT models and aim to draw the community's attention to prevent the potential misuse of these models' remarkable capabilities. We will responsibly disclose our findings to the suppliers of related large language models.

Submitted 5 February, 2024; originally announced February 2024.

Comments: 17 pages, 11 figures

arXiv:2402.00421 [pdf, other]

From PARIS to LE-PARIS: Toward Patent Response Automation with Recommender Systems and Collaborative Large Language Models

Authors: Jung-Mei Chu, Hao-Cheng Lo, Jieh Hsiang, Chun-Chieh Cho

Abstract: In patent prosecution, timely and effective responses to Office Actions (OAs) are crucial for securing patents. However, past automation and artificial intelligence research have largely overlooked this aspect. To bridge this gap, our study introduces the Patent Office Action Response Intelligence System (PARIS) and its advanced version, the Large Language Model (LLM) Enhanced PARIS (LE-PARIS). These systems are designed to enhance the efficiency of patent attorneys in handling OA responses through collaboration with AI. The systems' key features include the construction of an OA Topics Database, development of Response Templates, and implementation of Recommender Systems and LLM-based Response Generation. To validate the effectiveness of the systems, we have employed a multi-paradigm analysis using the USPTO Office Action database and longitudinal data based on attorney interactions with our systems over six years. Through five studies, we have examined the constructiveness of OA topics (studies 1 and 2) using topic modeling and our proposed Delphi process, the efficacy of our proposed hybrid LLM-based recommender system tailored for OA responses (study 3), the quality of generated responses (study 4), and the systems' practical value in real-world scenarios through user studies (study 5). The results indicate that both PARIS and LE-PARIS significantly achieve key metrics and have a positive impact on attorney performance.

Submitted 4 March, 2024; v1 submitted 1 February, 2024; originally announced February 2024.

Comments: 28 pages, 5 figures, typos corrected, references added, under review

arXiv:2311.16207 [pdf, other]

The Graph Convolutional Network with Multi-representation Alignment for Drug Synergy Prediction

Authors: Xinxing Yang, Genke Yang, Jian Chu

Abstract: Drug combination refers to the use of two or more drugs to treat a specific disease at the same time. It is currently the mainstream way to treat complex diseases. Compared with single drugs, drug combinations have better efficacy and can better inhibit toxicity and drug resistance. The computational model based on deep learning concatenates the representation of multiple drugs and the corresponding cell line feature as input, and the output is whether the drug combination can have an inhibitory effect on the cell line. However, this strategy of concatenating multiple representations has the following defects: the alignment of drug representation and cell line representation is ignored, resulting in the synergistic relationship not being reflected positionally in the embedding space. Moreover, the alignment measurement function in deep learning cannot be suitable for drug synergy prediction tasks due to differences in input types. Therefore, in this work, we propose a graph convolutional network with multi-representation alignment (GCNMRA) for predicting drug synergy. In the GCNMRA model, we designed a multi-representation alignment function suitable for the drug synergy prediction task so that the positional relationship between drug representations and cell line representation is reflected in the embedding space. In addition, the vector modulus of drug representations and cell line representation is considered to improve the accuracy of calculation results and accelerate model convergence. Finally, many relevant experiments were run on multiple drug synergy datasets to verify the effectiveness of the above innovative elements and the excellence of the GCNMRA model.

Submitted 27 November, 2023; originally announced November 2023.

Comments: 14 pages

arXiv:2310.20258 [pdf, other]

Advancing Bayesian Optimization via Learning Correlated Latent Space

Authors: Seunghun Lee, Jaewon Chu, Sihyeon Kim, Juyeon Ko, Hyunwoo J. Kim

Abstract: Bayesian optimization is a powerful method for optimizing black-box functions with limited function evaluations. Recent works have shown that optimization in a latent space through deep generative models such as variational autoencoders leads to effective and efficient Bayesian optimization for structured or discrete data. However, as the optimization does not take place in the input space, it leads to an inherent gap that results in potentially suboptimal solutions. To alleviate the discrepancy, we propose Correlated latent space Bayesian Optimization (CoBO), which focuses on learning correlated latent spaces characterized by a strong correlation between the distances in the latent space and the distances within the objective function. Specifically, our method introduces Lipschitz regularization, loss weighting, and trust region recoordination to minimize the inherent gap around the promising areas. We demonstrate the effectiveness of our approach on several optimization tasks in discrete data, such as molecule design and arithmetic expression fitting, and achieve high performance within a small budget.

Submitted 19 November, 2023; v1 submitted 31 October, 2023; originally announced October 2023.

arXiv:2310.15484 [pdf, other]

NuTrea: Neural Tree Search for Context-guided Multi-hop KGQA

Authors: Hyeong Kyu Choi, Seunghun Lee, Jaewon Chu, Hyunwoo J. Kim

Abstract: Multi-hop Knowledge Graph Question Answering (KGQA) is a task that involves retrieving nodes from a knowledge graph (KG) to answer natural language questions. Recent GNN-based approaches formulate this task as a KG path searching problem, where messages are sequentially propagated from the seed node towards the answer nodes. However, these messages are past-oriented, and they do not consider the full KG context. To make matters worse, KG nodes often represent proper noun entities and are sometimes encrypted, being uninformative in selecting between paths. To address these problems, we propose Neural Tree Search (NuTrea), a tree search-based GNN model that incorporates the broader KG context. Our model adopts a message-passing scheme that probes the unreached subtree regions to boost the past-oriented embeddings. In addition, we introduce the Relation Frequency-Inverse Entity Frequency (RF-IEF) node embedding that considers the global KG context to better characterize ambiguous KG nodes. The general effectiveness of our approach is demonstrated through experiments on three major multi-hop KGQA benchmark datasets, and our extensive analyses further validate its expressiveness and robustness. Overall, NuTrea provides a powerful means to query the KG with complex natural language questions. Code is available at https://github.com/mlvlab/NuTrea.

Submitted 23 October, 2023; originally announced October 2023.

Comments: Neural Information Processing Systems (NeurIPS) 2023

arXiv:2310.08984 [pdf, other]

UniParser: Multi-Human Parsing with Unified Correlation Representation Learning

Authors: Jiaming Chu, Lei Jin, Junliang Xing, Jian Zhao

Abstract: Multi-human parsing is an image segmentation task necessitating both instance-level and fine-grained category-level information. However, prior research has typically processed these two types of information through separate branches and distinct output formats, leading to inefficient and redundant frameworks. This paper introduces UniParser, which integrates instance-level and category-level representations in three key aspects: 1) we propose a unified correlation representation learning approach, allowing our network to learn instance and category features within the cosine space; 2) we unify the form of outputs of each module as pixel-level segmentation results while supervising instance and category features using a homogeneous label accompanied by an auxiliary loss; and 3) we design a joint optimization procedure to fuse instance and category representations. By virtue of unifying instance-level and category-level output, UniParser circumvents manually designed post-processing techniques and surpasses state-of-the-art methods, achieving 49.3% AP on MHPv2.0 and 60.4% AP on CIHP. We will release our source code, pretrained models, and online demos to facilitate future studies.

Submitted 19 May, 2024; v1 submitted 13 October, 2023; originally announced October 2023.

arXiv:2309.10508 [pdf, ps, other]

Enhanced C-V2X Mode 4 to Optimize Age of Information and Reliability for IoV

Authors: Jiahou Chu, Qiong Wu, Qiang Fan, Zhengquan Li

Abstract: Internet of vehicles (IoV) has emerged as a key technology to realize real-time vehicular applications. For IoV, vehicles adopt the cellular vehicle-to-everything (C-V2X) standard to support direct communication among them. C-V2X mode 4 controls resource allocation without the assistance of the cellular network, hence it is widely used for IoV. However, C-V2X mode 4 has two drawbacks. First, vehicles cannot communicate with each other for a period in some cases, which increases the age of information (AoI); second, vehicles may select resources already occupied by others, which degrades reliability. To address the two drawbacks, we propose an enhanced C-V2X mode 4 to optimize AoI and reliability. In addition, we consider the fact that for most vehicular applications, each vehicle periodically requires fresh information of vehicles within a certain distance and propose a new performance metric to evaluate the system AoI for IoV. Furthermore, we construct a platform by integrating SUMO and NS3. We demonstrate the superiority of the enhanced C-V2X mode 4 based on this simulation platform.

Submitted 19 September, 2023; originally announced September 2023.

Comments: This paper has been accepted by ICCT 2023. The source code can be found at https://github.com/qiongwu86/ns3 sumo cv2x mode4.git

arXiv:2309.04737 [pdf, other]

Learning Spiking Neural Network from Easy to Hard task

Authors: Lingling Tang, Jiangtao Hu, Hua Yu, Surui Liu, Jielei Chu

Abstract: Starting with small and simple concepts, and gradually introducing complex and difficult concepts is the natural process of human learning. Spiking Neural Networks (SNNs) aim to mimic the way humans process information, but current SNN models treat all samples equally, which does not align with the principles of human learning and overlooks the biological plausibility of SNNs. To address this, we propose a CL-SNN model that introduces Curriculum Learning (CL) into SNNs, making SNNs learn more like humans and providing higher biological interpretability. CL is a training strategy that advocates presenting easier data to models before gradually introducing more challenging data, mimicking the human learning process. We use a confidence-aware loss to measure and process the samples with different difficulty levels. By learning the confidence of different samples, the model reduces the contribution of difficult samples to parameter optimization automatically. We conducted experiments on static image datasets MNIST, Fashion-MNIST, CIFAR10, and neuromorphic datasets N-MNIST, CIFAR10-DVS, DVS-Gesture. The results are promising. To the best of our knowledge, this is the first proposal to enhance the biological plausibility of SNNs by introducing CL.

Submitted 25 September, 2023; v1 submitted 9 September, 2023; originally announced September 2023.

arXiv:2308.09363 [pdf, other]

Open-vocabulary Video Question Answering: A New Benchmark for Evaluating the Generalizability of Video Question Answering Models

Authors: Dohwan Ko, Ji Soo Lee, Miso Choi, Jaewon Chu, Jihwan Park, Hyunwoo J. Kim

Abstract: Video Question Answering (VideoQA) is a challenging task that entails complex multi-modal reasoning. In contrast to multiple-choice VideoQA which aims to predict the answer given several options, the goal of open-ended VideoQA is to answer questions without restricting candidate answers. However, the majority of previous VideoQA models formulate open-ended VideoQA as a classification task to classify the video-question pairs into a fixed answer set, i.e., closed-vocabulary, which contains only frequent answers (e.g., top-1000 answers). This leads the model to be biased toward only frequent answers and fail to generalize on out-of-vocabulary answers. We hence propose a new benchmark, Open-vocabulary Video Question Answering (OVQA), to measure the generalizability of VideoQA models by considering rare and unseen answers. In addition, in order to improve the model's generalization power, we introduce a novel GNN-based soft verbalizer that enhances the prediction on rare and unseen answers by aggregating the information from their similar words. For evaluation, we introduce new baselines by modifying the existing (closed-vocabulary) open-ended VideoQA models and improve their performances by further taking into account rare and unseen answers. Our ablation studies and qualitative analyses demonstrate that our GNN-based soft verbalizer further improves the model performance, especially on rare and unseen answers. We hope that our benchmark OVQA can serve as a guide for evaluating the generalizability of VideoQA models and inspire future research. Code is available at https://github.com/mlvlab/OVQA.

Submitted 18 August, 2023; originally announced August 2023.

Comments: Accepted paper at ICCV 2023

arXiv:2307.08989 [pdf, other]

GraphCL-DTA: a graph contrastive learning with molecular semantics for drug-target binding affinity prediction

Authors: Xinxing Yang, Genke Yang, Jian Chu

Abstract: Drug-target binding affinity prediction plays an important role in the early stages of drug discovery, which can infer the strength of interactions between new drugs and new targets. However, the performance of previous computational models is limited by the following drawbacks. The learning of drug representation relies only on supervised data, without taking into account the information contained in the molecular graph itself. Moreover, most previous studies tended to design complicated representation learning modules, while uniformity, which is used to measure representation quality, is ignored. In this study, we propose GraphCL-DTA, a graph contrastive learning with molecular semantics for drug-target binding affinity prediction. In GraphCL-DTA, we design a graph contrastive learning framework for molecular graphs to learn drug representations, so that the semantics of molecular graphs are preserved. Through this graph contrastive framework, a more essential and effective drug representation can be learned without additional supervised data. Next, we design a new loss function that can be directly used to smoothly adjust the uniformity of drug and target representations. By directly optimizing the uniformity of representations, the representation quality of drugs and targets can be improved. The effectiveness of the above innovative elements is verified on two real datasets, KIBA and Davis. The excellent performance of GraphCL-DTA on the above datasets suggests its superiority to the state-of-the-art model.

Submitted 18 July, 2023; originally announced July 2023.

Comments: 13 pages, 4 figures, 5 tables

arXiv:2306.14320 [pdf, other]

doi 10.1109/HPEC55821.2022.9926408

Im2win: Memory Efficient Convolution On SIMD Architectures

Authors: Shuai Lu, Jun Chu, Xu T. Liu

Abstract: Convolution is the most expensive operation among neural network operations, thus its performance is critical to the overall performance of neural networks. Commonly used convolution approaches, including general matrix multiplication (GEMM)-based convolution and direct convolution, rely on im2col for data transformation or do not use data transformation at all, respectively. However, the im2col data transformation can lead to at least 2$\times$ memory footprint compared to not using data transformation at all, thus limiting the size of neural network models running on memory-limited systems. Meanwhile, not using data transformation usually performs poorly due to nonconsecutive memory access although it consumes less memory. To solve those problems, we propose a new memory-efficient data transformation algorithm, called im2win. This algorithm refactorizes a row of square or rectangle dot product windows of the input image and flattens unique elements within these windows into a row in the output tensor, which enables consecutive memory access and data reuse, and thus greatly reduces the memory overhead. Furthermore, we propose a high-performance im2win-based convolution algorithm with various optimizations, including vectorization, loop reordering, etc. Our experimental results show that our algorithm reduces the memory overhead to 41.6% on average compared to PyTorch's convolution implementation based on im2col, and achieves average speedups of 3.6$\times$ and 5.3$\times$ compared to the im2col-based convolution and not using data transformation, respectively.

Submitted 25 June, 2023; originally announced June 2023.

Comments: Published at "2022 IEEE High Performance Extreme Computing Conference (HPEC)"

ACM Class: I.2.10

Journal ref: 2022 IEEE High Performance Extreme Computing Conference (HPEC), Waltham, MA, USA, 2022, pp. 1-7

arXiv:2306.14316 [pdf, other]

Im2win: An Efficient Convolution Paradigm on GPU

Authors: Shuai Lu, Jun Chu, Luanzheng Guo, Xu T. Liu

Abstract: Convolution is the most time-consuming operation in deep neural network operations, so its performance is critical to the overall performance of the neural network. The commonly used methods for convolution on GPU include the general matrix multiplication (GEMM)-based convolution and the direct convolution. GEMM-based convolution relies on the im2col algorithm, which results in a large memory footprint and reduced performance. Direct convolution does not have the large memory footprint problem, but the performance is not on par with the GEMM-based approach because of the discontinuous memory access. This paper proposes a window-order-based convolution paradigm on GPU, called im2win, which not only reduces memory footprint but also offers continuous memory accesses, resulting in improved performance. Furthermore, we apply a range of optimization techniques on the convolution CUDA kernel, including shared memory, tiling, micro-kernel, double buffer, and prefetching. We compare our implementation with the direct convolution, and PyTorch's GEMM-based convolution with cuBLAS and six cuDNN-based convolution implementations, with twelve state-of-the-art DNN benchmarks. The experimental results show that our implementation 1) reduces memory footprint by 23.1% and achieves 3.5$\times$ TFLOPS compared with cuBLAS, 2) reduces memory footprint by 32.8% and achieves up to 1.8$\times$ TFLOPS compared with the best-performing convolutions in cuDNN, and 3) achieves up to 155$\times$ TFLOPS compared with the direct convolution. We further perform an ablation study on the applied optimization techniques and find that the micro-kernel has the greatest positive impact on performance.

Submitted 25 June, 2023; originally announced June 2023.

Comments: Accepted at "29th International European conference on parallel and distributed computing (Euro-Par'2023)"

ACM Class: I.2.10

arXiv:2305.07290 [pdf, other]

The 3rd Anti-UAV Workshop & Challenge: Methods and Results

Authors: Jian Zhao, Jianan Li, Lei Jin, Jiaming Chu, Zhihao Zhang, Jun Wang, Jiangqiang Xia, Kai Wang, Yang Liu, Sadaf Gulshad, Jiaojiao Zhao, Tianyang Xu, Xuefeng Zhu, Shihan Liu, Zheng Zhu, Guibo Zhu, Zechao Li, Zheng Wang, Baigui Sun, Yandong Guo, Shin ichi Satoh, Junliang Xing, Jane Shen Shengmei

Abstract: The 3rd Anti-UAV Workshop & Challenge aims to encourage research in developing novel and accurate methods for multi-scale object tracking. The Anti-UAV dataset used for the Anti-UAV Challenge has been publicly released. There are two main differences between this year's competition and the previous two. First, we have expanded the existing dataset, and for the first time, released a training set so that participants can focus on improving their models. Second, we set up two tracks for the first time, i.e., Anti-UAV Tracking and Anti-UAV Detection & Tracking. Around 76 participating teams from the globe competed in the 3rd Anti-UAV Challenge. In this paper, we provide a brief summary of the 3rd Anti-UAV Workshop & Challenge including brief introductions to the top three methods in each track. The submission leaderboard will be reopened for researchers that are interested in the Anti-UAV challenge. The benchmark dataset and other information can be found at: https://anti-uav.github.io/.

Submitted 15 July, 2023; v1 submitted 12 May, 2023; originally announced May 2023.

Comments: Technical report for 3rd Anti-UAV Workshop and Challenge. arXiv admin note: text overlap with arXiv:2108.09909

arXiv:2304.11356 [pdf, other]

doi 10.1145/3581783.3611993

Single-stage Multi-human Parsing via Point Sets and Center-based Offsets

Authors: Jiaming Chu, Lei Jin, Junliang Xing, Jian Zhao

Abstract: This work studies the multi-human parsing problem. Existing methods, either following top-down or bottom-up two-stage paradigms, usually involve expensive computational costs. We instead present a high-performance Single-stage Multi-human Parsing (SMP) deep architecture that decouples the multi-human parsing problem into two fine-grained sub-problems, i.e., locating the human body and parts. SMP leverages the point features in the barycenter positions to obtain their segmentation and then generates a series of offsets from the barycenter of the human body to the barycenters of parts, thus performing human body and parts matching without the grouping process. Within the SMP architecture, we propose a Refined Feature Retain module to extract the global feature of instances through generated mask attention and a Mask of Interest Reclassify module as a trainable plug-in module to refine the classification results with the predicted segmentation. Extensive experiments on the MHPv2.0 dataset demonstrate the best effectiveness and efficiency of the proposed method, surpassing the state-of-the-art method by 2.1% in AP50p, 1.0% in APvolp, and 1.2% in PCP50. In particular, the proposed method requires fewer training epochs and a less complex model architecture. We will release our source codes, pretrained models, and online demos to facilitate further studies.

Submitted 22 April, 2023; originally announced April 2023.

arXiv:2302.10301 [pdf, other]

Artificial Intelligence System for Detection and Screening of Cardiac Abnormalities using Electrocardiogram Images

Authors: Deyun Zhang, Shijia Geng, Yang Zhou, Weilun Xu, Guodong Wei, Kai Wang, Jie Yu, Qiang Zhu, Yongkui Li, Yonghong Zhao, Xingyue Chen, Rui Zhang, Zhaoji Fu, Rongbo Zhou, Yanqi E, Sumei Fan, Qinghao Zhao, Chuandong Cheng, Nan Peng, Liang Zhang, Linlin Zheng, Jianjun Chu, Hongbin Xu, Chen Tan, Jian Liu, et al. (6 additional authors not shown)

Abstract: The artificial intelligence (AI) system has achieved expert-level performance in electrocardiogram (ECG) signal analysis. However, in underdeveloped countries or regions where the healthcare information system is imperfect, only paper ECGs can be provided. Analysis of real-world ECG images (photos or scans of paper ECGs) remains challenging due to complex environments or interference. In this study, we present an AI system developed to detect and screen cardiac abnormalities (CAs) from real-world ECG images. The system was evaluated on a large dataset of 52,357 patients from multiple regions and populations across the world. On the detection task, the AI system obtained area under the receiver operating curve (AUC) of 0.996 (hold-out test), 0.994 (external test 1), 0.984 (external test 2), and 0.979 (external test 3), respectively. Meanwhile, the detection results of AI system showed a strong correlation with the diagnosis of cardiologists (cardiologist 1 (R=0.794, p<1e-3), cardiologist 2 (R=0.812, p<1e-3)). On the screening task, the AI system achieved AUCs of 0.894 (hold-out test) and 0.850 (external test). The screening performance of the AI system was better than that of the cardiologists (AI system (0.846) vs. cardiologist 1 (0.520) vs. cardiologist 2 (0.480)). Our study demonstrates the feasibility of an accurate, objective, easy-to-use, fast, and low-cost AI system for CA detection and screening. The system has the potential to be used by healthcare professionals, caregivers, and general users to assess CAs based on real-world ECG images.

Submitted 10 February, 2023; originally announced February 2023.

Comments: 47 pages, 29 figures

arXiv:2301.06448 [pdf, other]

The Balanced Matrix Factorization for Computational Drug Repositioning

Authors: Xinxing Yang, Genke Yang, Jian Chu

Abstract: Computational drug repositioning aims to discover new uses of drugs that have been marketed. However, the existing models suffer from the following limitations. Firstly, in the real world, only a minority of diseases have definite treatment drugs. This leads to an imbalance in the proportion of validated drug-disease associations (positive samples) and unvalidated drug-disease associations (negative samples), which disrupts the optimization gradient of the model. Secondly, the existing drug representation does not take into account the behavioral information of the drug, resulting in its inability to comprehensively model the latent feature of the drug. In this work, we propose a balanced matrix factorization with embedded behavior information (BMF) for computational drug repositioning to address the above-mentioned shortcomings. Specifically, in the BMF model, we propose a novel balanced contrastive loss (BCL) to optimize the category imbalance problem in computational drug repositioning. The BCL optimizes the parameters in the model by maximizing the similarity between the target drug and positive disease, and minimizing the similarity between the target drug and negative disease below the margin. In addition, we designed a method to enhance drug representation using its behavioral information. Comprehensive experiments on three computational drug repositioning datasets validate the effectiveness of the above improvements. The superiority of the BMF model is also demonstrated by experimental comparison with seven benchmark models.

Submitted 16 January, 2023; originally announced January 2023.

arXiv:2210.02365 [pdf, other]

doi 10.1145/3552437.3558545

SoccerNet 2022 Challenges Results

Authors: Silvio Giancola, Anthony Cioppa, Adrien Deliège, Floriane Magera, Vladimir Somers, Le Kang, Xin Zhou, Olivier Barnich, Christophe De Vleeschouwer, Alexandre Alahi, Bernard Ghanem, Marc Van Droogenbroeck, Abdulrahman Darwish, Adrien Maglo, Albert Clapés, Andreas Luyts, Andrei Boiarov, Artur Xarles, Astrid Orcesi, Avijit Shah, Baoyu Fan, Bharath Comandur, Chen Chen, Chen Zhang, Chen Zhao, et al. (69 additional authors not shown)

Abstract: The SoccerNet 2022 challenges were the second annual video understanding challenges organized by the SoccerNet team. In 2022, the challenges were composed of 6 vision-based tasks: (1) action spotting, focusing on retrieving action timestamps in long untrimmed videos, (2) replay grounding, focusing on retrieving the live moment of an action shown in a replay, (3) pitch localization, focusing on detecting line and goal part elements, (4) camera calibration, dedicated to retrieving the intrinsic and extrinsic camera parameters, (5) player re-identification, focusing on retrieving the same players across multiple views, and (6) multiple object tracking, focusing on tracking players and the ball through unedited video streams. Compared to last year's challenges, tasks (1-2) had their evaluation metrics redefined to consider tighter temporal accuracies, and tasks (3-6) were novel, including their underlying data and annotations. More information on the tasks, challenges and leaderboards is available at https://www.soccer-net.org. Baselines and development kits are available at https://github.com/SoccerNet.

Submitted 5 October, 2022; originally announced October 2022.

Comments: Accepted at ACM MMSports 2022

arXiv:2209.15574 [pdf, other]

An improved algorithm for Generalized Čech complex construction

Authors: Jie Chu, Mikael Vejdemo-Johansson, Ping Ji

Abstract: In this paper, we present an algorithm that computes the generalized Čech complex for a finite set of disks in 2D space, where each disk may have a different radius. An extension of this algorithm is also proposed for a set of balls in 3D space with different radii. To compute a $k$-simplex, we leverage the computation performed in the round of $(k-1)$-simplices, reducing the number of potential candidates to verify and thereby improving efficiency. An efficient verification method is proposed to confirm whether a $k$-simplex can be constructed on the basis of the $(k-1)$-simplices. We demonstrate the performance with a comparison to some closely related algorithms.

Submitted 30 September, 2022; originally announced September 2022.

MSC Class: 68U05; 57-08 ACM Class: F.2.2; I.3.5

arXiv:2208.13006 [pdf, other]

Neural Observer with Lyapunov Stability Guarantee for Uncertain Nonlinear Systems

Authors: Song Chen, Shengze Cai, Tehuan Chen, Chao Xu, Jian Chu

Abstract: In this paper, we propose a novel nonlinear observer based on neural networks, called neural observer, for observation tasks of linear time-invariant (LTI) systems and uncertain nonlinear systems. In particular, the neural observer designed for uncertain systems is inspired by the active disturbance rejection control, which can measure the uncertainty in real-time. The stability analysis (e.g., exponential convergence rate) of LTI and uncertain nonlinear systems (involving neural observers) are presented and guaranteed, where it is shown that the observation problems can be solved only using the linear matrix inequalities (LMIs). Also, it is revealed that the observability and controllability of the system matrices are required to demonstrate the existence of solutions of LMIs. Finally, the effectiveness of neural observers is verified on three simulation cases, including the X-29A aircraft model, the nonlinear pendulum, and the four-wheel steering vehicle.

Submitted 16 January, 2023; v1 submitted 27 August, 2022; originally announced August 2022.

Comments: 15 pages, submitted to IEEE journal for possible publication

arXiv:2206.04688 [pdf, other]

A New Frontier of AI: On-Device AI Training and Personalization

Authors: Ji Joong Moon, Hyun Suk Lee, Jiho Chu, Donghak Park, Seungbaek Hong, Hyungjun Seo, Donghyeon Jeong, Sungsik Kong, MyungJoo Ham

Abstract: Modern consumer electronic devices have started executing deep learning-based intelligence services on devices, not cloud servers, to keep personal data on devices and to reduce network and cloud costs. We see this trend as an opportunity to personalize intelligence services by updating neural networks with user data without exposing the data outside the devices: on-device training. However, the limited resources of devices incur significant difficulties. We propose a light-weight on-device training framework, NNTrainer, which provides highly memory-efficient neural network training techniques and proactive swapping based on fine-grained execution order analysis for neural networks. Moreover, its optimizations do not sacrifice accuracy and are transparent to training algorithms; thus, prior algorithmic studies may be implemented on top of NNTrainer. The evaluations show that NNTrainer can reduce memory consumption down to 1/20 (saving 95%!) and effectively personalizes intelligence services on devices. NNTrainer is cross-platform and practical open-source software, which is being deployed to millions of mobile devices.

Submitted 4 January, 2024; v1 submitted 9 June, 2022; originally announced June 2022.

Comments: 12 pages, 16 figures, Accepted in ICSE 2024

arXiv:2206.00262 [pdf, other]

Self-supervised Learning for Label Sparsity in Computational Drug Repositioning

Authors: Xinxing Yang, Genke Yang, Jian Chu

Abstract: Computational drug repositioning aims to discover new uses for marketed drugs, which can accelerate the drug development process and play an important role in the existing drug discovery system. However, the number of validated drug-disease associations is scarce compared to the number of drugs and diseases in the real world. Too few labeled samples will make the classification model unable to learn effective latent factors of drugs, resulting in poor generalization performance. In this work, we propose a multi-task self-supervised learning framework for computational drug repositioning. The framework tackles label sparsity by learning a better drug representation. Specifically, we take the drug-disease association prediction problem as the main task, and the auxiliary task is to use data augmentation strategies and contrastive learning to mine the internal relationships of the original drug features, so as to automatically learn a better drug representation without supervised labels. Joint training ensures that the auxiliary task can improve the prediction accuracy of the main task. More precisely, the auxiliary task improves the drug representation and serves as additional regularization to improve generalization. Furthermore, we design a multi-input decoding network to improve the reconstruction ability of the autoencoder model. We evaluate our model using three real-world datasets. The experimental results demonstrate the effectiveness of the multi-task self-supervised learning framework, and its predictive ability is superior to the state-of-the-art model.

Submitted 1 June, 2022; originally announced June 2022.

Comments: 14 pages

arXiv:2204.03649 [pdf, other]

Unsupervised Prompt Learning for Vision-Language Models

Authors: Tony Huang, Jack Chu, Fangyun Wei

Abstract: Contrastive vision-language models like CLIP have shown great progress in transfer learning. In the inference stage, the proper text description, also known as the prompt, needs to be carefully designed to correctly classify the given images. In order to avoid laborious prompt engineering, recent works such as CoOp, CLIP-Adapter and Tip-Adapter propose to adapt vision-language models for downstream image recognition tasks on a small set of labeled data. Though promising improvements are achieved, requiring labeled data from the target datasets may restrict the scalability. In this paper, we explore a different scenario, in which the labels of the target datasets are not provided, and we present an unsupervised prompt learning (UPL) approach to avoid prompt engineering while simultaneously improving the transfer performance of CLIP-like vision-language models. As far as we know, UPL is the first work to introduce unsupervised learning into prompt learning. Experimentally, our UPL outperforms the original CLIP with prompt engineering on ImageNet as well as 10 other datasets. An enhanced version of UPL is even competitive with the 8-shot CoOp and the 8-shot Tip-Adapter on most datasets. Code and models are available at https://github.com/tonyhuang2022/UPL.

Submitted 22 August, 2022; v1 submitted 7 April, 2022; originally announced April 2022.

arXiv:2204.02688 [pdf, other]

SEAL: A Large-scale Video Dataset of Multi-grained Spatio-temporally Action Localization

Authors: Shimin Chen, Wei Li, Chen Chen, Jianyang Gu, Jiaming Chu, Xunqiang Tao, Yandong Guo

Abstract: In spite of many dataset efforts for human action recognition, current computer vision algorithms are still limited to coarse-grained spatial and temporal annotations of human daily life. In this paper, we introduce a novel large-scale video dataset dubbed SEAL for multi-grained Spatio-tEmporal Action Localization. SEAL consists of two kinds of annotations, SEAL Tubes and SEAL Clips. We observe that atomic actions can be combined into many complex activities. SEAL Tubes provide both atomic action and complex activity annotations at the tubelet level, producing 49.6k atomic actions spanning 172 action categories and 17.7k complex activities spanning 200 activity categories. SEAL Clips localize atomic actions in space during two-second clips, producing 510.4k action labels with multiple labels per person. Extensive experimental results show that SEAL significantly helps to advance video understanding.

Submitted 6 April, 2022; originally announced April 2022.

Comments: 17 pages, 6 figures

arXiv:2203.16014 [pdf, other]

ESNI: Domestic Robots Design for Elderly and Disabled People

Authors: Junchi Chu, Xueyun Tang

Abstract: Our paper investigates whether speech recognition intelligent agents can assist elderly and disabled people in their daily lives and improve their quality of life by utilizing cutting-edge technologies. After researching the attitude of elderly and disabled people toward the household agent, we propose a design framework, ESNI (Exploration, Segmentation, Navigation, Instruction), that applies to a mobile agent and achieves functionalities such as processing human commands, picking up a specified object, and moving an object to another location. The agent starts the exploration in an unseen environment, stores each item's information in the grid cells in its memory, and analyzes the corresponding features for each section. We divided our indoor environment into 6 sections: Kitchen, Living room, Bedroom, Studio, Bathroom, Balcony. The agent uses algorithms to assign a section to each grid cell and then generates a navigation trajectory based on the section segmentation. When the user gives a command to the agent, feature words will be extracted and processed into a sequence of sub-tasks.

Submitted 29 March, 2022; originally announced March 2022.

arXiv:2111.14696 [pdf, other]

The Computational Drug Repositioning without Negative Sampling

Authors: Xinxing Yang, Genke Yang, Jian Chu

Abstract: Computational drug repositioning technology is an effective tool to accelerate drug development. Although this technique has been widely used and successful in recent decades, many existing models still suffer from multiple drawbacks such as the massive number of unvalidated drug-disease associations and the inner product. The limitations of these works are mainly due to the following two reasons: firstly, previous works used negative sampling techniques to treat unvalidated drug-disease associations as negative samples, which is invalid in real-world settings; secondly, the inner product cannot fully take into account the feature information contained in the latent factors of drugs and diseases. In this paper, we propose a novel PUON framework for addressing the above deficiencies, which models the risk estimator of computational drug repositioning using only validated (Positive) and unvalidated (Unlabelled) drug-disease associations without employing negative sampling techniques. PUON also proposes an Outer Neighborhood-based classifier for modeling the cross-feature information of the latent factors. For a comprehensive comparison, we considered 8 popular baselines. Extensive experiments on four real-world datasets showed that the PUON model achieved the best performance based on 6 evaluation metrics.

Submitted 31 May, 2022; v1 submitted 29 November, 2021; originally announced November 2021.

Comments: 12 pages, 10 figures

arXiv:2110.05603 [pdf, other]

Generalizing to New Domains by Mapping Natural Language to Lifted LTL

Authors: Eric Hsiung, Hiloni Mehta, Junchi Chu, Xinyu Liu, Roma Patel, Stefanie Tellex, George Konidaris

Abstract: Recent work on using natural language to specify commands to robots has grounded that language to LTL. However, mapping natural language task specifications to LTL task specifications using language models requires probability distributions over a finite vocabulary. Existing state-of-the-art methods have extended this finite vocabulary to include unseen terms from the input sequence to improve output generalization. However, novel out-of-vocabulary atomic propositions cannot be generated using these methods. To overcome this, we introduce an intermediate contextual query representation which can be learned from single positive task specification examples, associating a contextual query with an LTL template. We demonstrate that this intermediate representation allows for generalization over unseen object references, assuming accurate groundings are available. We compare our method of mapping natural language task specifications to intermediate contextual queries against state-of-the-art CopyNet models capable of translating natural language to LTL, by evaluating whether correct LTL for manipulation and navigation task specifications can be output, and show that our method outperforms the CopyNet model on unseen object references. We demonstrate that the grounded LTL our method outputs can be used for planning in a simulated OO-MDP environment. Finally, we discuss some common failure modes encountered when translating natural language task specifications to grounded LTL.

Submitted 9 March, 2022; v1 submitted 11 October, 2021; originally announced October 2021.

Comments: 7 pages (6 + 1 references page), 3 figures, 2 tables. Accepted to ICRA 2022. To appear in Proceedings of the 2022 International Conference on Robotics and Automation, May 2022

arXiv:2109.07690 [pdf, other]

The Neural Metric Factorization for Computational Drug Repositioning

Authors: Xinxing Yang, Genke Yang, Jian Chu

Abstract: Computational drug repositioning aims to discover new therapeutic diseases for marketed drugs and has the advantages of low cost, short development cycle, and high controllability compared to traditional drug development. The matrix factorization model has become the cornerstone technique for computational drug repositioning due to its ease of implementation and excellent scalability. However, the matrix factorization model uses the inner product to represent the association between drugs and diseases, which lacks expressive ability. Moreover, the degree of similarity of drugs or diseases cannot be inferred from their respective latent factor vectors, which does not accord with the common sense of drug discovery. Therefore, a neural metric factorization model (NMF) for computational drug repositioning is proposed in this work. We consider the latent factor vector of a drug or disease as a point in a high-dimensional coordinate system and propose a generalized Euclidean distance to represent the association between drugs and diseases to compensate for the shortcomings of the inner product. Furthermore, by embedding multiple drug (disease) metrics information into the encoding space of the latent factor vector, the information about the similarity between drugs (diseases) can be reflected in the distance between latent factor vectors. Finally, we conduct extensive analysis experiments on two real datasets to demonstrate the effectiveness of the above improvement points and the superiority of the NMF model.

Submitted 28 November, 2021; v1 submitted 15 September, 2021; originally announced September 2021.

Comments: 16 pages

arXiv:2106.07874 [pdf]

Towards the Objective Speech Assessment of Smoking Status based on Voice Features: A Review of the Literature

Authors: Zhizhong Ma, Chris Bullen, Joanna Ting Wai Chu, Ruili Wang, Yingchun Wang, Satwinder Singh

Abstract: In smoking cessation clinical research and practice, objective validation of self-reported smoking status is crucial for ensuring the reliability of the primary outcome, that is, smoking abstinence. Speech signals convey important information about a speaker, such as age, gender, body size, emotional state, and health state. We investigated (1) if smoking could measurably alter voice features, (2) if smoking cessation could lead to changes in voice, and therefore (3) if the voice-based smoking status assessment has the potential to be used as an objective smoking cessation validation method.

Submitted 15 June, 2021; originally announced June 2021.

arXiv:2105.10552 [pdf]

doi 10.1109/TCBB.2021.3109557

GapPredict: A Language Model for Resolving Gaps in Draft Genome Assemblies

Authors: Eric Chen, Justin Chu, Jessica Zhang, Rene L. Warren, Inanc Birol

Abstract: Short-read DNA sequencing instruments can yield over 1e+12 bases per run, typically composed of reads 150 bases long. Despite this high throughput, de novo assembly algorithms have difficulty reconstructing contiguous genome sequences using short reads due to both repetitive and difficult-to-sequence regions in these genomes. Some of the short read assembly challenges are mitigated by scaffolding assembled sequences using paired-end reads. However, unresolved sequences in these scaffolds appear as "gaps". Here, we introduce GapPredict, a tool that uses a character-level language model to predict unresolved nucleotides in scaffold gaps. We benchmarked GapPredict against the state-of-the-art gap-filling tool Sealer, and observed that the former can fill 65.6% of the sampled gaps that were left unfilled by the latter, demonstrating the practical utility of deep learning approaches to the gap-filling problem in genome sequence assembly.

Submitted 24 May, 2021; v1 submitted 21 May, 2021; originally announced May 2021.

Comments: 9 pages, 7 figures. IEEE/ACM Trans Comput Biol Bioinform (2021)

arXiv:2009.12559 [pdf, other]

doi 10.1109/TIP.2020.3018221

Affinity Space Adaptation for Semantic Segmentation Across Domains

Authors: Wei Zhou, Yukang Wang, Jiajia Chu, Jiehua Yang, Xiang Bai, Yongchao Xu

Abstract: Semantic segmentation with dense pixel-wise annotation has achieved excellent performance thanks to deep learning. However, the generalization of semantic segmentation in the wild remains challenging. In this paper, we address the problem of unsupervised domain adaptation (UDA) in semantic segmentation. Motivated by the fact that the source and target domains have invariant semantic structures, we propose to exploit such invariance across domains by leveraging co-occurring patterns between pairwise pixels in the output of structured semantic segmentation. This is different from most existing approaches, which attempt to adapt domains based on individual pixel-wise information at the image, feature, or output level. Specifically, we perform domain adaptation on the affinity relationship between adjacent pixels, termed the affinity space, of the source and target domains. To this end, we develop two affinity space adaptation strategies: affinity space cleaning and adversarial affinity space alignment. Extensive experiments demonstrate that the proposed method achieves superior performance against some state-of-the-art methods on several challenging benchmarks for semantic segmentation across domains. The code is available at https://github.com/idealwei/ASANet.

Submitted 26 September, 2020; originally announced September 2020.

Comments: Accepted by IEEE TIP

arXiv:2008.03868 [pdf, ps, other]

Robust Design for NOMA-based Multi-Beam LEO Satellite Internet of Things

Authors: Jianhang Chu, Xiaoming Chen, Caijun Zhong, Zhaoyang Zhang

Abstract: In this paper, we investigate the issue of massive access in a beyond fifth-generation (B5G) multi-beam low earth orbit (LEO) satellite internet of things (IoT) network in the presence of channel phase uncertainty due to channel state information (CSI) conveyance from the devices to the satellite via the gateway. Rather than time division multiple access (TDMA) or frequency division multiple access (FDMA) with multi-color pattern, a new non-orthogonal multiple access (NOMA) scheme is adopted to support massive IoT distributed over a very wide range. Considering the limited energy on the LEO satellite, two robust beamforming algorithms against channel phase uncertainty are proposed for minimizing the total power consumption in the scenarios of noncritical IoT applications and critical IoT applications, respectively. Both theoretical analysis and simulation results validate the effectiveness and robustness of the proposed algorithms for supporting massive access in satellite IoT.

Submitted 9 August, 2020; originally announced August 2020.

arXiv:2008.03468 [pdf, other]

TGK-Planner: An Efficient Topology Guided Kinodynamic Planner for Autonomous Quadrotors

Authors: Hongkai Ye, Xin Zhou, Zhepei Wang, Chao Xu, Jian Chu, Fei Gao

Abstract: In this paper, we propose a lightweight yet effective Topology Guided Kinodynamic planner (TGK-Planner) for quadrotor aggressive flights with limited onboard computing resources. The proposed system follows the traditional hierarchical planning workflow, with novel designs to improve the robustness and efficiency in both the pathfinding and trajectory optimization sub-modules. Firstly, we propose the topology guided graph, which roughly captures the topological structure of the environment and guides the state sampling of a sampling-based kinodynamic planner. In this way, we significantly improve the efficiency of finding a safe and dynamically feasible trajectory. Then, we refine the smoothness and continuity of the trajectory in an optimization framework, which incorporates the homotopy constraint to guarantee the safety of the trajectory. The optimization program is formulated as a sequence of quadratic programs (QPs) and can be iteratively solved in a few milliseconds. Finally, the proposed system is integrated into a fully autonomous quadrotor and validated in various simulated and real-world scenarios. Benchmark comparisons show that our method outperforms state-of-the-art methods with regard to efficiency and trajectory quality. Moreover, we will release our code as an open-source package.

Submitted 8 November, 2020; v1 submitted 8 August, 2020; originally announced August 2020.

arXiv:2003.06321 [pdf, other]

Micro-supervised Disturbance Learning: A Perspective of Representation Probability Distribution

Authors: Jielei Chu, Jing Liu, Hongjun Wang, Meng Hua, Zhiguo Gong, Tianrui Li

Abstract: Existing representation learning methods based on Euclidean distance exhibit instability under a broad set of conditions. Furthermore, the scarcity and high cost of labels prompt us to explore more expressive representation learning methods that depend on as few labels as possible. To address these issues, the small-perturbation ideology is first introduced into the representation learning model based on the representation probability distribution. The positive small-perturbation information (SPI), which depends on only two labels of each cluster, is used to stimulate the representation probability distribution, and then two variant models are proposed to fine-tune the expected representation distribution of RBM, namely, the Micro-supervised Disturbance GRBM (Micro-DGRBM) and Micro-supervised Disturbance RBM (Micro-DRBM) models. The Kullback-Leibler (KL) divergence of SPI is minimized within the same cluster to make the representation probability distributions more similar in Contrastive Divergence (CD) learning. In contrast, the KL divergence of SPI is maximized across different clusters to make the representation probability distributions more dissimilar in CD learning. To explore the representation learning capability under the continuous stimulation of the SPI, we present a deep Micro-supervised Disturbance Learning (Micro-DL) framework based on the Micro-DGRBM and Micro-DRBM models and compare it with a similar deep structure that has no external stimulation. Experimental results demonstrate that the proposed deep Micro-DL architecture shows better performance in comparison to the baseline method, the most related shallow models and deep frameworks for clustering.

Submitted 6 October, 2021; v1 submitted 13 March, 2020; originally announced March 2020.

Comments: 14 pages

arXiv:2003.06113 [pdf, ps, other]

Ultra Efficient Transfer Learning with Meta Update for Cross Subject EEG Classification

Authors: Tiehang Duan, Mihir Chauhan, Mohammad Abuzar Shaikh, Jun Chu, Sargur Srihari

Abstract: The pattern of Electroencephalogram (EEG) signals differs significantly across different subjects, and poses challenges for EEG classifiers in terms of 1) effectively adapting a learned classifier to a new subject, and 2) retaining knowledge of known subjects after the adaptation. We propose an efficient transfer learning method, named Meta UPdate Strategy (MUPS-EEG), for continuous EEG classification across different subjects. The model learns effective representations with a meta update, which accelerates adaptation to a new subject and mitigates forgetting of knowledge on previous subjects at the same time. The proposed mechanism originates from meta learning and works to 1) find a feature representation that is broadly suitable for different subjects, and 2) maximize the sensitivity of the loss function for fast adaptation to a new subject. The method can be applied to all deep learning oriented models. Extensive experiments on two public datasets demonstrate the effectiveness of the proposed model, outperforming current state-of-the-art methods by a large margin in terms of both adapting to a new subject and retaining knowledge of learned subjects.

Submitted 1 March, 2021; v1 submitted 13 March, 2020; originally announced March 2020.

arXiv:2002.10629 [pdf, other]

Alternating Minimization Based Trajectory Generation for Quadrotor Aggressive Flight

Authors: Zhepei Wang, Xin Zhou, Chao Xu, Jian Chu, Fei Gao

Abstract: Although much research has been conducted into trajectory planning for quadrotors, planning with spatially and temporally optimal trajectories in real time is still challenging. In this paper, we propose a framework for generating large-scale piecewise polynomial trajectories for aggressive autonomous flights, with highlights on its superior computational efficiency and simultaneous spatial-temporal optimality. Exploiting the implicitly decoupled structure of the planning problem, we conduct alternating minimization between boundary conditions and time durations of trajectory pieces. In each minimization phase, we leverage the algebraic convenience of the sub-problem to escape poor local minima and achieve the lowest time consumption. Theoretical analysis for the global/local convergence rate of our proposed method is provided. Moreover, based on polynomial theory, an extremely fast feasibility check method is designed for various kinds of constraints. By incorporating the method into our alternating structure, a constrained minimization algorithm is constructed to optimize trajectories on the premise of feasibility. Benchmark evaluation shows that our algorithm outperforms state-of-the-art methods regarding efficiency, optimality, and scalability. Aggressive flight experiments in a limited space with dense obstacles are presented to demonstrate the performance of the proposed algorithm. We release our implementation as an open-source ROS package.

Submitted 24 February, 2020; originally announced February 2020.

Comments: The paper is submitted to RA-L/IROS 2020

arXiv:1906.05173 [pdf, other]

Multi-local Collaborative AutoEncoder

Authors: Jielei Chu, Hongjun Wang, Jing Liu, Zhiguo Gong, Tianrui Li

Abstract: The excellent representation learning performance of autoencoders has attracted considerable interest in various applications. However, the structure and multi-local collaborative relationships of unlabeled data are ignored in their encoding procedure, which limits the capability of feature extraction. This paper presents a Multi-local Collaborative AutoEncoder (MC-AE), which consists of novel multi-local collaborative representation RBM (mcrRBM) and multi-local collaborative representation GRBM (mcrGRBM) models. Here, the Locality Sensitive Hashing (LSH) method is used to divide the input data into multi-local cross blocks, which contain multi-local collaborative relationships of the unlabeled data and features, since similar multi-local instances and features of the input data are divided into the same block. In the mcrRBM and mcrGRBM models, the structure and multi-local collaborative relationships of unlabeled data are integrated into the encoding procedure. Then, the local hidden features converge on the center of each local collaborative block. Under the collaborative joint influence of each local block, the proposed MC-AE has a powerful capability of representation learning for unsupervised clustering. However, our MC-AE model may require a long training time on large-scale and high-dimensional datasets because more local collaborative blocks are integrated into it. The five most related deep models are compared with our MC-AE. The experimental results show that the proposed MC-AE has better capabilities of collaborative representation and generalization than the compared deep models.

Submitted 8 October, 2021; v1 submitted 12 June, 2019; originally announced June 2019.

arXiv:1905.08736 [pdf]

doi 10.1002/asl.861

Identification of synoptic weather types over Taiwan area with multiple classifiers

Authors: Shih-Hao Su, Jung-Lien Chu, Ting-Shuo Yo, Lee-Yaw Lin

Abstract: In this study, a novel machine learning approach was used to classify three types of synoptic weather events in the Taiwan area from 2001 to 2010. We used reanalysis data with three machine learning algorithms to recognize weather systems and evaluated their performance. Overall, the classifiers successfully identified 52-83% of weather events (hit rate), which is higher than the performance of traditional objective methods. The results showed that the machine learning approach gave a low false alarm rate in general, while the support vector machine (SVM) with more principal components of reanalysis data had a higher hit rate on all tested weather events. The sensitivity tests of grid data resolution indicated that the differences between the high- and low-resolution datasets are limited, which implied that the proposed method can achieve reasonable performance in weather forecasting with minimal resources. By identifying daily weather systems in historical reanalysis data, this method can be used to study long-term weather changes, to monitor climatological-scale variations, and to provide a better estimate of climate projections. Furthermore, this method can also serve as an alternative to model output statistics and potentially be used for synoptic weather forecasting.

Submitted 21 May, 2019; originally announced May 2019.

Comments: journal article, open access

Journal ref: Atmos Sci Lett. 2018;e861

arXiv:1812.02621 [pdf, other]

doi 10.1109/ICFHR-2018.2018.00041

Hybrid Feature Learning for Handwriting Verification

Authors: Mohammad Abuzar Shaikh, Mihir Chauhan, Jun Chu, Sargur Srihari

Abstract: We propose an effective Hybrid Deep Learning (HDL) architecture for the task of determining the probability that a questioned handwritten word has been written by a known writer. HDL is an amalgamation of Auto-Learned Features (ALF) and Human-Engineered Features (HEF). To extract auto-learned features we use two methods: first, a Two Channel Convolutional Neural Network (TC-CNN); second, a Two Channel Autoencoder (TC-AE). Human-engineered features are likewise extracted using two methods: first, Gradient Structural Concavity (GSC); second, Scale Invariant Feature Transform (SIFT). Experiments are performed by complementing one of the HEF methods with one ALF method on 150000 pairs of samples of the word "AND" cropped from handwritten notes written by 1500 writers. Our results indicate that the HDL architecture with AE-GSC achieves 99.7% accuracy on the seen-writer dataset and 92.16% accuracy on the shuffled-writer dataset, which outperforms CEDAR-FOX; on the unseen-writer dataset, AE-SIFT performs comparably to this sophisticated handwriting comparison tool.

Submitted 18 November, 2018; originally announced December 2018.

Comments: Accepted and presented at the International Conference on Frontiers in Handwriting Recognition (ICFHR) 2018

arXiv:1812.01967 [pdf, other]

Unsupervised Feature Learning Architecture with Multi-clustering Integration RBM

Authors: Jielei Chu, Hongjun Wang, Jing Liu, Zhiguo Gong, Tianrui Li

Abstract: In this paper, we present a novel unsupervised feature learning architecture, which consists of a multi-clustering integration module and a variant of RBM termed multi-clustering integration RBM (MIRBM). In the multi-clustering integration module, we apply three unsupervised algorithms (K-means, affinity propagation, and spectral clustering) to obtain three different clustering partitions (CPs) without any background knowledge or labels. Then, a unanimous voting strategy is used to generate a local clustering partition (LCP). The novel MIRBM model is the core feature-encoding part of the proposed unsupervised feature learning architecture. Its novelty is that the LCP, as unsupervised guidance, is integrated into one-step contrastive divergence (CD1) learning to guide the distribution of the hidden layer features. For instances in the same LCP cluster, the hidden and reconstructed hidden layer features of the MIRBM model tend to constrict together during training. Meanwhile, the LCP centers tend to disperse from each other as much as possible in the hidden and reconstructed hidden layers during training. The experiments demonstrate that the proposed unsupervised feature learning architecture has more powerful feature representation and generalization capability than the state-of-the-art graph regularized RBM (GraphRBM) for clustering tasks on the Microsoft Research Asia Multimedia (MSRA-MM) 2.0 dataset.

Submitted 2 April, 2020; v1 submitted 5 December, 2018; originally announced December 2018.

Showing 1–50 of 58 results for author: Chu, J