-
Multi-Branch Auxiliary Fusion YOLO with Re-parameterization Heterogeneous Convolutional for accurate object detection
Authors:
Zhiqiang Yang,
Qiu Guan,
Keer Zhao,
Jianmin Yang,
Xinli Xu,
Haixia Long,
Ying Tang
Abstract:
Due to the effective performance of multi-scale feature fusion, Path Aggregation FPN (PAFPN) is widely employed in YOLO detectors. However, it cannot efficiently and adaptively integrate high-level semantic information with low-level spatial information simultaneously. We propose a new model named MAF-YOLO in this paper, which is a novel object detection framework with a versatile neck named Multi-Branch Auxiliary FPN (MAFPN). Within MAFPN, the Superficial Assisted Fusion (SAF) module is designed to combine the output of the backbone with the neck, preserving an optimal level of shallow information to facilitate subsequent learning. Meanwhile, the Advanced Assisted Fusion (AAF) module deeply embedded within the neck conveys a more diverse range of gradient information to the output layer.
Furthermore, our proposed Re-parameterized Heterogeneous Efficient Layer Aggregation Network (RepHELAN) module ensures that both the overall model architecture and the convolutional design use heterogeneous large convolution kernels. This preserves information related to small targets while simultaneously achieving a multi-scale receptive field. Finally, taking the nano version of MAF-YOLO as an example, it achieves 42.4% AP on COCO with only 3.76M learnable parameters and 10.51G FLOPs, outperforming YOLOv8n by about 5.1%. The source code of this work is available at: https://github.com/yang-0201/MAF-YOLO.
Submitted 5 July, 2024;
originally announced July 2024.
-
Evaluating Large Language Models for Public Health Classification and Extraction Tasks
Authors:
Joshua Harris,
Timothy Laurence,
Leo Loman,
Fan Grayson,
Toby Nonnenmacher,
Harry Long,
Loes WalsGriffith,
Amy Douglas,
Holly Fountain,
Stelios Georgiou,
Jo Hardstaff,
Kathryn Hopkins,
Y-Ling Chi,
Galena Kuyumdzhieva,
Lesley Larkin,
Samuel Collins,
Hamish Mohammed,
Thomas Finnie,
Luke Hounsome,
Steven Riley
Abstract:
Advances in Large Language Models (LLMs) have led to significant interest in their potential to support human experts across a range of domains, including public health. In this work we present automated evaluations of LLMs for public health tasks involving the classification and extraction of free text. We combine six externally annotated datasets with seven new internally annotated datasets to evaluate LLMs for processing text related to: health burden, epidemiological risk factors, and public health interventions. We initially evaluate five open-weight LLMs (7-70 billion parameters) across all tasks using zero-shot in-context learning. We find that Llama-3-70B-Instruct is the highest performing model, achieving the best results on 15/17 tasks (using micro-F1 scores). We see significant variation across tasks with all open-weight LLMs scoring below 60% micro-F1 on some challenging tasks, such as Contact Classification, while all LLMs achieve greater than 80% micro-F1 on others, such as GI Illness Classification. For a subset of 12 tasks, we also evaluate GPT-4 and find comparable results to Llama-3-70B-Instruct, which scores equally or outperforms GPT-4 on 6 of the 12 tasks. Overall, based on these initial results we find promising signs that LLMs may be useful tools for public health experts to extract information from a wide variety of free text sources, and support public health surveillance, research, and interventions.
Submitted 23 May, 2024;
originally announced May 2024.
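The abstract above reports results as micro-F1 scores. As a rough illustration of that metric only (not the authors' evaluation code), micro-F1 pools true positives, false positives, and false negatives over all documents before computing F1. The function name, the set-based label representation, and the example labels below are assumptions made for this sketch.

```python
def micro_f1(y_true, y_pred):
    """Micro-averaged F1 for multi-label predictions.

    y_true, y_pred: lists of label sets, one set per document
    (an assumed representation, not taken from the paper).
    """
    tp = fp = fn = 0
    for gold, pred in zip(y_true, y_pred):
        tp += len(gold & pred)   # labels predicted and correct
        fp += len(pred - gold)   # labels predicted but wrong
        fn += len(gold - pred)   # gold labels that were missed
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


# Hypothetical labels for three free-text records.
gold = [{"gi"}, {"resp"}, {"gi", "resp"}]
pred = [{"gi"}, {"gi"}, {"resp"}]
print(round(micro_f1(gold, pred), 4))
```

Because counts are pooled before averaging, frequent labels dominate the score, which is one reason per-task results can vary as widely as the abstract describes.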
-
DuEDL: Dual-Branch Evidential Deep Learning for Scribble-Supervised Medical Image Segmentation
Authors:
Yitong Yang,
Xinli Xu,
Haigen Hu,
Haixia Long,
Qianwei Zhou,
Qiu Guan
Abstract:
Despite the recent progress in medical image segmentation with scribble-based annotations, the segmentation results of most models are still not robust and generalizable enough in open environments. Evidential deep learning (EDL) has recently been proposed as a promising solution to model predictive uncertainty and improve the reliability of medical image segmentation. However directly applying EDL to scribble-supervised medical image segmentation faces a tradeoff between accuracy and reliability. To address the challenge, we propose a novel framework called Dual-Branch Evidential Deep Learning (DuEDL). Firstly, the decoder of the segmentation network is changed to two different branches, and the evidence of the two branches is fused to generate high-quality pseudo-labels. Then the framework applies partial evidence loss and two-branch consistent loss for joint training of the model to adapt to the scribble supervision learning. The proposed method was tested on two cardiac datasets: ACDC and MSCMRseg. The results show that our method significantly enhances the reliability and generalization ability of the model without sacrificing accuracy, outperforming state-of-the-art baselines. The code is available at https://github.com/Gardnery/DuEDL.
Submitted 23 May, 2024;
originally announced May 2024.
-
The Ninth NTIRE 2024 Efficient Super-Resolution Challenge Report
Authors:
Bin Ren,
Yawei Li,
Nancy Mehta,
Radu Timofte,
Hongyuan Yu,
Cheng Wan,
Yuxin Hong,
Bingnan Han,
Zhuoyuan Wu,
Yajun Zou,
Yuqing Liu,
Jizhe Li,
Keji He,
Chao Fan,
Heng Zhang,
Xiaolin Zhang,
Xuanwu Yin,
Kunlong Zuo,
Bohao Liao,
Peizhe Xia,
Long Peng,
Zhibo Du,
Xin Di,
Wangkai Li,
Yang Wang
, et al. (109 additional authors not shown)
Abstract:
This paper provides a comprehensive review of the NTIRE 2024 challenge, focusing on efficient single-image super-resolution (ESR) solutions and their outcomes. The task of this challenge is to super-resolve an input image with a magnification factor of x4 based on pairs of low and corresponding high-resolution images. The primary objective is to develop networks that optimize various aspects such as runtime, parameters, and FLOPs, while still maintaining a peak signal-to-noise ratio (PSNR) of approximately 26.90 dB on the DIV2K_LSDIR_valid dataset and 26.99 dB on the DIV2K_LSDIR_test dataset. In addition, this challenge has 4 tracks including the main track (overall performance), sub-track 1 (runtime), sub-track 2 (FLOPs), and sub-track 3 (parameters). In the main track, all three metrics (i.e., runtime, FLOPs, and parameter count) were considered. The ranking of the main track is calculated based on a weighted sum of the scores of all other sub-tracks. In sub-track 1, the practical runtime performance of the submissions was evaluated, and the corresponding score was used to determine the ranking. In sub-track 2, the number of FLOPs was considered. The score calculated based on the corresponding FLOPs was used to determine the ranking. In sub-track 3, the number of parameters was considered. The score calculated based on the corresponding parameters was used to determine the ranking. RLFN is set as the baseline for efficiency measurement. The challenge had 262 registered participants, and 34 teams made valid submissions. These submissions gauge the state-of-the-art in efficient single-image super-resolution. To facilitate the reproducibility of the challenge and enable other researchers to build upon these findings, the code and the pre-trained model of validated solutions are made publicly available at https://github.com/Amazingren/NTIRE2024_ESR/.
Submitted 25 June, 2024; v1 submitted 16 April, 2024;
originally announced April 2024.
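The challenge described above is constrained by a PSNR threshold. For readers unfamiliar with the metric, the following is a minimal sketch of the standard PSNR definition, 10·log10(MAX² / MSE), over flat lists of pixel values. It is not the challenge's scoring script: the function name, the 8-bit peak value of 255, and the absence of any border cropping or colour-space conversion are assumptions of this sketch.

```python
import math


def psnr(reference, estimate, max_value=255.0):
    """Peak signal-to-noise ratio in dB between two equal-length pixel lists."""
    if len(reference) != len(estimate) or not reference:
        raise ValueError("inputs must be non-empty and the same length")
    # Mean squared error between the ground truth and the super-resolved output.
    mse = sum((r - e) ** 2 for r, e in zip(reference, estimate)) / len(reference)
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_value ** 2 / mse)


# Hypothetical 4-pixel "images".
high_res = [52, 60, 61, 200]
restored = [50, 63, 61, 190]
print(f"{psnr(high_res, restored):.2f} dB")
```

Since PSNR is logarithmic in the mean squared error, a gap of a few hundredths of a dB (as between the 26.90 dB and 26.99 dB targets) corresponds to a small relative change in error.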
-
Analyzing Divergence for Nondeterministic Probabilistic Models
Authors:
Hao Wu,
Yuxi Fu,
Huan Long,
Xian Xu,
Wenbo Zhang
Abstract:
Branching and weak probabilistic bisimilarities are two well-known notions capturing behavioral equivalence between nondeterministic probabilistic systems. For probabilistic systems, divergence is of major concern. Recently several divergence-sensitive refinements of branching and weak probabilistic bisimilarities have been proposed in the literature. Both the definitions of these equivalences and the techniques to investigate them differ significantly. This paper presents a comprehensive comparative study on divergence-sensitive behavioral equivalence relations that refine the branching and weak probabilistic bisimilarities. Additionally, these equivalence relations are shown to have efficient checking algorithms. The techniques of this paper might be of independent interest in a more general setting.
Submitted 1 March, 2024;
originally announced March 2024.
-
CGGM: A conditional graph generation model with adaptive sparsity for node anomaly detection in IoT networks
Authors:
Xianshi Su,
Munan Li,
Tongbang Jiang,
Hao Long
Abstract:
Dynamic graphs are extensively employed for detecting anomalous behavior in nodes within the Internet of Things (IoT). Generative models are often used to address the issue of imbalanced node categories in dynamic graphs. Nevertheless, these models face several constraints, including the monotonicity of adjacency relationships, the difficulty in constructing multi-dimensional features for nodes, and the lack of a method for end-to-end generation of multiple categories of nodes. This paper presents a novel graph generation model, called CGGM, designed specifically to generate a larger number of nodes belonging to the minority class. The mechanism for generating an adjacency matrix, through adaptive sparsity, enhances flexibility in its structure. The feature generation module, called the multidimensional features generator (MFG), generates node features along with topological information. Labels are transformed into embedding vectors, serving as conditional constraints to control the generation of synthetic data across multiple categories. Using a multi-stage loss, the distribution of synthetic data is adjusted to closely resemble that of real data. In extensive experiments, we show that CGGM's synthetic data outperforms state-of-the-art methods across various metrics. Our results demonstrate efficient generation of diverse data categories, robustly enhancing multi-category classification model performance.
Submitted 27 February, 2024;
originally announced February 2024.
-
Minimizing Block Incentive Volatility Through Verkle Tree-Based Dynamic Transaction Storage
Authors:
Xiongfei Zhao,
Gerui Zhang,
Hou-Wan Long,
Yain-Whar Si
Abstract:
Transaction fees are a crucial revenue source for miners in public and consortium blockchains. However, while public blockchains have additional revenue streams, transaction fees serve as the primary income for miners in consortium blockchains formed by various financial institutions. These miners allocate different levels of computing resources to process transactions and earn corresponding fees. Nonetheless, relying solely on transaction fees can lead to significant volatility and encourage non-standard mining behaviors, thereby posing threats to the blockchain's security and integrity. Despite previous attempts to mitigate the impact of transaction fees on illicit mining behaviors, a comprehensive solution to this vulnerability is yet to be established. To address this gap, we introduce a novel approach that leverages Dynamic Transaction Storage (DTS) strategies to effectively minimize block incentive volatility. Our solution implements a Verkle tree-based storage mechanism to reduce bandwidth consumption. Moreover, to configure the DTS strategies, we evaluate several optimization algorithms and formulate the challenge as a Vehicle Routing Problem. Our experiments conducted using historical transactions from Bitcoin and remittance data from the Industrial and Commercial Bank of China reveal that the strategy focusing on time-based transaction incorporation priority, while excluding a designated space for small-fee transactions, as discovered by the gradient-based optimizer algorithm, proves most effective in reducing volatility. Hence, the DTS strategy can sustain stable block incentives irrespective of transaction types or user bidding behavior. Furthermore, the inclusion of higher-fee transactions, often smaller in size, can alleviate propagation delays and the occurrence of forks.
Submitted 6 February, 2024;
originally announced February 2024.
-
Dynamic Mining Interval to Improve Blockchain Throughput
Authors:
Hou-Wan Long,
Xiongfei Zhao,
Yain-Whar Si
Abstract:
Decentralized Finance (DeFi), propelled by Blockchain technology, has revolutionized traditional financial systems, improving transparency, reducing costs, and fostering financial inclusion. However, transaction activities in these systems fluctuate significantly and the throughput can be affected. To address this issue, we propose a Dynamic Mining Interval (DMI) mechanism that adjusts mining intervals in response to block size and trading volume to enhance the transaction throughput of Blockchain platforms. Besides, in the context of public Blockchains such as Bitcoin, Ethereum, and Litecoin, a shift towards the dominance of transaction fees over coin-based rewards is projected in the near future. As a result, the ecosystem continues to face threats from deviant mining activities such as Undercutting Attacks, Selfish Mining, and Pool Hopping, among others. In recent years, Dynamic Transaction Storage (DTS) strategies were proposed to allocate transactions dynamically based on fees, thereby stabilizing block incentives. However, DTS' utilization of Merkle tree leaf nodes can reduce system throughput. To alleviate this problem, in this paper, we propose an approach for combining DMI and DTS. Besides, we also discuss the DMI selection mechanism for adjusting mining intervals based on various factors.
Submitted 21 December, 2023;
originally announced December 2023.
-
STEP CATFormer: Spatial-Temporal Effective Body-Part Cross Attention Transformer for Skeleton-based Action Recognition
Authors:
Nguyen Huu Bao Long
Abstract:
Graph convolutional networks (GCNs) have been widely used and achieved remarkable results in skeleton-based action recognition. We think the key to skeleton-based action recognition is a skeleton changing across frames, so we focus on how graph convolutional networks learn different topologies and effectively aggregate joint features in the global temporal and local temporal. In this work, we propose three Channel-wise Topology Graph Convolutions based on Channel-wise Topology Refinement Graph Convolution (CTR-GCN). Combining CTR-GCN with two joint cross-attention modules can capture the upper-lower body part and hand-foot relationship skeleton features. After that, to capture features of human skeletons changing in frames we design the Temporal Attention Transformers to extract skeletons effectively. The Temporal Attention Transformers can learn the temporal features of human skeleton sequences. Finally, we fuse the temporal features output scale with MLP and classification. We develop a powerful graph convolutional network named Spatial Temporal Effective Body-part Cross Attention Transformer which achieves notably high performance on the NTU RGB+D and NTU RGB+D 120 datasets. Our code and models are available at https://github.com/maclong01/STEP-CATFormer
Submitted 5 December, 2023;
originally announced December 2023.
-
GaitASMS: Gait Recognition by Adaptive Structured Spatial Representation and Multi-Scale Temporal Aggregation
Authors:
Yan Sun,
Hu Long,
Xueling Feng,
Mark Nixon
Abstract:
Gait recognition is one of the most promising video-based biometric technologies. The edge of silhouettes and motion are the most informative features, and previous studies have explored them separately and achieved notable results. However, due to occlusions and variations in viewing angles, their gait recognition performance is often affected by the predefined spatial segmentation strategy. Moreover, traditional temporal pooling usually neglects distinctive temporal information in gait. To address the aforementioned issues, we propose a novel gait recognition framework, denoted as GaitASMS, which can effectively extract the adaptive structured spatial representations and naturally aggregate the multi-scale temporal information. The Adaptive Structured Representation Extraction Module (ASRE) separates the edge of silhouettes by using the adaptive edge mask and maximizes the representation in semantic latent space. Moreover, the Multi-Scale Temporal Aggregation Module (MSTA) achieves effective modeling of long-short-range temporal information by temporally aggregated structure. Furthermore, we propose a new data augmentation, denoted random mask, to enrich the sample space of long-term occlusion and enhance the generalization of the model. Extensive experiments conducted on two datasets demonstrate the competitive advantage of the proposed method, especially in complex scenes, i.e., BG and CL. On the CASIA-B dataset, GaitASMS achieves an average accuracy of 93.5% and outperforms the baseline on rank-1 accuracies by 3.4% and 6.3%, respectively, in BG and CL. The ablation experiments demonstrate the effectiveness of ASRE and MSTA. The source code is available at https://github.com/YanSungithub/GaitASMS.
Submitted 21 February, 2024; v1 submitted 29 July, 2023;
originally announced July 2023.
-
Dual-modality Smart Shoes for Quantitative Assessment of Hemiplegic Patients' Lower Limbs' Muscle Strength
Authors:
Huajun Long,
Jie Li,
Rui Li,
Xinfeng Liu,
Jingyuan Cheng
Abstract:
Stroke can lead to impaired motor ability of the patient's lower limbs and hemiplegia. Accurate assessment of the lower limbs' motor ability is important for diagnosis and rehabilitation. To digitalize such assessment so that each test can be traced back at any time and subjectivity can be avoided, we test how dual-modality smart shoes equipped with pressure-sensitive insoles and inertial measurement units can be used for this purpose. A 5m walking test protocol, including left and right turns, is designed. Data are collected from 23 patients and 17 healthy subjects. For the lower limbs' motor ability, the tests are observed by two physicians and assessed using the five-grade Medical Research Council scale for muscle examination. The average of the two physicians' scores for the same patient is used as the ground truth. Using the feature set we developed, 100% accuracy is achieved in classifying the patients and healthy subjects. For patients' muscle strength, a mean absolute error of 0.143 and a maximum error of 0.395 are achieved using our feature set and the regression method, closer to the ground truth than the scores from each physician (mean absolute error: 0.217, maximum error: 0.5). We thus validate the possibility of using such smart shoes to objectively and accurately evaluate the lower limbs' muscle strength of stroke patients.
Submitted 23 May, 2023;
originally announced May 2023.
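The abstract above compares a regression model against each physician using mean absolute error and maximum error, with the average of two physicians' scores as ground truth. The sketch below shows that arithmetic on made-up numbers; the scores, variable names, and model outputs are hypothetical and are not data from the study.

```python
def mean_absolute_error(truth, predicted):
    """Average of |truth - predicted| over all patients."""
    errors = [abs(t - p) for t, p in zip(truth, predicted)]
    return sum(errors) / len(errors)


def max_error(truth, predicted):
    """Largest single-patient absolute error."""
    return max(abs(t - p) for t, p in zip(truth, predicted))


# Hypothetical muscle-strength grades from two physicians for four patients.
physician_a = [4, 3, 5, 2]
physician_b = [5, 3, 4, 2]
# Ground truth as described in the abstract: the mean of the two scores.
ground_truth = [(a + b) / 2 for a, b in zip(physician_a, physician_b)]
# Hypothetical regression outputs.
model_scores = [4.4, 3.1, 4.6, 2.2]

print(mean_absolute_error(ground_truth, physician_a), max_error(ground_truth, physician_a))
print(mean_absolute_error(ground_truth, model_scores), max_error(ground_truth, model_scores))
```

When two integer grades differ by one, each physician sits exactly 0.5 away from their average, which is consistent with the per-physician maximum error of 0.5 reported in the abstract.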
-
Electricity Price Prediction for Energy Storage System Arbitrage: A Decision-focused Approach
Authors:
Linwei Sang,
Yinliang Xu,
Huan Long,
Qinran Hu,
Hongbin Sun
Abstract:
Electricity price prediction plays a vital role in energy storage system (ESS) management. Current prediction models focus on reducing prediction errors but overlook their impact on downstream decision-making. This paper therefore proposes a decision-focused electricity price prediction approach for ESS arbitrage to bridge the gap from the downstream optimization model to the prediction model. The decision-focused approach aims at utilizing the downstream arbitrage model for training prediction models. It measures the difference between actual decisions under the predicted price and oracle decisions under the true price, i.e., the decision error, by regret, transforms it into a tractable surrogate regret, and then derives the gradients with respect to the predicted price for training prediction models. Based on the prediction and decision errors, this paper proposes a hybrid loss and a corresponding stochastic gradient descent learning method to learn prediction models for both prediction and decision accuracy. The case study verifies that the proposed approach can efficiently bring more economic benefits and reduce decision errors by flattening the time distribution of prediction errors, compared to prediction models that only minimize prediction errors.
Submitted 29 April, 2023;
originally announced May 2023.
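The notion of regret in the abstract above (profit of the oracle decision under true prices minus profit of the decision made from predicted prices, both evaluated at true prices) can be illustrated with a toy arbitrage model. This is only a sketch: the single buy-then-sell decision rule, the function names, and the prices are assumptions, and the paper's actual ESS model, surrogate regret, and gradient derivation are not reproduced here.

```python
def arbitrage_decision(prices):
    """Toy decision: one (buy_hour, sell_hour) pair with buy before sell.

    Returns None when no profitable pair exists.
    """
    best_profit, best_pair = 0.0, None
    cheapest = 0  # index of the lowest price seen so far
    for t in range(1, len(prices)):
        profit = prices[t] - prices[cheapest]
        if profit > best_profit:
            best_profit, best_pair = profit, (cheapest, t)
        if prices[t] < prices[cheapest]:
            cheapest = t
    return best_pair


def realized_profit(decision, true_prices):
    """Profit of a decision when settled at the true prices."""
    if decision is None:
        return 0.0
    buy, sell = decision
    return true_prices[sell] - true_prices[buy]


def regret(predicted_prices, true_prices):
    """Oracle profit minus the profit of the prediction-driven decision."""
    oracle = realized_profit(arbitrage_decision(true_prices), true_prices)
    actual = realized_profit(arbitrage_decision(predicted_prices), true_prices)
    return oracle - actual


true_prices = [30, 20, 50, 40]        # hypothetical hourly prices
predicted_prices = [18, 25, 35, 45]   # hypothetical forecast
print(regret(predicted_prices, true_prices))
```

A forecast can have modest pointwise error and still incur regret if it misorders the cheapest and most expensive hours, which is the gap between prediction error and decision error that the abstract targets.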
-
Mutual Exclusive Modulator for Long-Tailed Recognition
Authors:
Haixu Long,
Xiaolin Zhang,
Yanbin Liu,
Zongtai Luo,
Jianbo Liu
Abstract:
Long-tailed recognition (LTR) is the task of learning high-performance classifiers given extremely imbalanced training samples between categories. Most of the existing works address the problem by either enhancing the features of tail classes or re-balancing the classifiers to reduce the inductive bias. In this paper, we try to look into the root cause of the LTR task, i.e., training samples for each class are greatly imbalanced, and propose a straightforward solution. We split the categories into three groups, i.e., many, medium and few, according to the number of training images. The three groups of categories are separately predicted to reduce the difficulty of classification. This idea naturally raises a new problem: how to assign a given sample to the right class group. We introduce a mutual exclusive modulator which can estimate the probability of an image belonging to each group. In particular, the modulator consists of a light-weight module and is learned with a mutual exclusive objective. Hence, the output probabilities of the modulator encode the data volume clues of the training dataset. They are further utilized as prior information to guide the prediction of the classifier. We conduct extensive experiments on multiple datasets, e.g., ImageNet-LT, Place-LT and iNaturalist 2018, to evaluate the proposed approach. Our method achieves competitive performance compared to the state-of-the-art benchmarks.
Submitted 11 April, 2023; v1 submitted 19 February, 2023;
originally announced February 2023.
-
Cooperation and Competition: Flocking with Evolutionary Multi-Agent Reinforcement Learning
Authors:
Yunxiao Guo,
Xinjia Xie,
Runhao Zhao,
Chenglan Zhu,
Jiangting Yin,
Han Long
Abstract:
Flocking is a very challenging problem in a multi-agent system; traditional flocking methods also require complete knowledge of the environment and a precise model for control. In this paper, we propose Evolutionary Multi-Agent Reinforcement Learning (EMARL) in flocking tasks, a hybrid algorithm that combines cooperation and competition with little prior knowledge. For cooperation, we design the agents' reward for flocking tasks according to the boids model. For competition, agents with high fitness are designated as senior agents and those with low fitness as junior agents, and junior agents inherit the parameters of senior agents stochastically. To intensify competition, we also design an evolutionary selection mechanism that shows effectiveness on credit assignment in flocking tasks. Experimental results in a range of challenging and self-contrast benchmarks demonstrate that EMARL significantly outperforms the full competition or cooperation methods.
Submitted 13 September, 2022; v1 submitted 10 September, 2022;
originally announced September 2022.
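The competition step in the abstract above, where low-fitness (junior) agents stochastically inherit the parameters of high-fitness (senior) agents, might look roughly like the following. Everything concrete here is an assumption made for illustration: the half-and-half senior/junior split, the `inherit_prob` parameter, the uniform choice of donor, and the dict-based parameter representation are not taken from the paper.

```python
import random


def evolve(population, fitness, inherit_prob=0.5, rng=None):
    """Return a new population in which junior agents may copy a senior's parameters.

    population: list of parameter dicts, one per agent (assumed representation).
    fitness: list of fitness values aligned with population.
    """
    rng = rng or random.Random()
    # Rank agents from highest to lowest fitness.
    order = sorted(range(len(population)), key=lambda i: fitness[i], reverse=True)
    half = len(order) // 2
    seniors, juniors = order[:half], order[half:]  # assumed 50/50 split
    new_population = [dict(params) for params in population]
    if not seniors:
        return new_population
    for j in juniors:
        # Each junior inherits with probability inherit_prob.
        if rng.random() < inherit_prob:
            donor = rng.choice(seniors)
            new_population[j] = dict(population[donor])
    return new_population


agents = [{"w": 1}, {"w": 2}, {"w": 3}, {"w": 4}]
scores = [0.1, 0.9, 0.8, 0.2]
print(evolve(agents, scores, inherit_prob=0.5, rng=random.Random(0)))
```

Copying parameters into fresh dicts keeps the input population unchanged, so the same generation can be re-sampled with a different random seed.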
-
CIM-PPO: Proximal Policy Optimization with Liu-Correntropy Induced Metric
Authors:
Yunxiao Guo,
Han Long,
Xiaojun Duan,
Kaiyuan Feng,
Maochu Li,
Xiaying Ma
Abstract:
As an algorithm based on deep reinforcement learning, Proximal Policy Optimization (PPO) performs well in many complex tasks and has become one of the most popular RL algorithms in recent years. According to the mechanism of penalty in the surrogate objective, PPO can be divided into PPO with KL Divergence (KL-PPO) and PPO with Clip function (Clip-PPO). Clip-PPO is widely used in a variety of practical scenarios and has attracted the attention of many researchers. Therefore, many variations have also been created, making the algorithm better and better. However, as a more theoretical algorithm, KL-PPO was neglected because its performance was not as good as Clip-PPO. In this article, we analyze the asymmetry effect of KL divergence on PPO's objective function, and give the inequality that can indicate when the asymmetry will affect the efficiency of KL-PPO. We propose the PPO with Correntropy Induced Metric algorithm (CIM-PPO), which applies the theory of correntropy (a symmetric metric method that was widely used in M-estimation to evaluate two distributions' difference) to PPO. Then, we designed experiments based on OpenAI Gym to test the effectiveness of the new algorithm and compare it with KL-PPO and Clip-PPO.
Submitted 3 March, 2022; v1 submitted 20 October, 2021;
originally announced October 2021.
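The abstract above contrasts Clip-PPO with KL-PPO and points to the asymmetry of KL divergence. The sketch below writes out the two standard per-sample surrogate terms from the original PPO formulation and shows numerically that KL(p‖q) and KL(q‖p) differ. It does not implement the correntropy induced metric or CIM-PPO itself; the function names, `eps=0.2`, `beta=1.0`, and the example distributions are illustrative assumptions.

```python
import math


def clipped_surrogate(ratio, advantage, eps=0.2):
    """Clip-PPO term: min(r * A, clip(r, 1 - eps, 1 + eps) * A)."""
    clipped_ratio = max(1.0 - eps, min(1.0 + eps, ratio))
    return min(ratio * advantage, clipped_ratio * advantage)


def kl_penalized_surrogate(ratio, advantage, kl, beta=1.0):
    """KL-PPO term: r * A - beta * KL(old policy || new policy)."""
    return ratio * advantage - beta * kl


def kl_divergence(p, q):
    """KL(p || q) for discrete distributions given as probability lists."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


# Two hypothetical action distributions over two actions.
old_policy = [0.9, 0.1]
new_policy = [0.5, 0.5]
print(kl_divergence(old_policy, new_policy))  # KL(old || new)
print(kl_divergence(new_policy, old_policy))  # KL(new || old), a different value

# The clip removes any incentive to push the ratio beyond 1 + eps when A > 0.
print(clipped_surrogate(1.5, 1.0), kl_penalized_surrogate(1.5, 1.0, kl=0.1))
```

Because the penalty depends on which policy appears first in the divergence, the same pair of policies can be penalized differently, which is the kind of asymmetry a symmetric measure such as the one proposed in the abstract is meant to avoid.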
-
Robust End-to-End Offline Chinese Handwriting Text Page Spotter with Text Kernel
Authors:
Zhihao Wang,
Yanwei Yu,
Yibo Wang,
Haixu Long,
Fazheng Wang
Abstract:
Offline Chinese handwriting text recognition is a long-standing research topic in the field of pattern recognition. In previous studies, text detection and recognition are separated, which leads to the fact that text recognition is highly dependent on the detection results. In this paper, we propose a robust end-to-end Chinese text page spotter framework. It unifies text detection and text recognition with text kernel that integrates global text feature information to optimize the recognition from multiple scales, which reduces the dependence of detection and improves the robustness of the system. Our method achieves state-of-the-art results on the CASIA-HWDB2.0-2.2 dataset and ICDAR-2013 competition dataset. Without any language model, the correct rates are 99.12% and 94.27% for line-level recognition, and 99.03% and 94.20% for page-level recognition, respectively.
Submitted 4 July, 2021;
originally announced July 2021.
-
SCC: an efficient deep reinforcement learning agent mastering the game of StarCraft II
Authors:
Xiangjun Wang,
Junxiao Song,
Penghui Qi,
Peng Peng,
Zhenkun Tang,
Wei Zhang,
Weimin Li,
Xiongjun Pi,
Jujie He,
Chao Gao,
Haitao Long,
Quan Yuan
Abstract:
AlphaStar, the AI that reaches GrandMaster level in StarCraft II, is a remarkable milestone demonstrating what deep reinforcement learning can achieve in complex Real-Time Strategy (RTS) games. However, the complexities of the game, algorithms and systems, and especially the tremendous amount of computation needed, are big obstacles for the community to conduct further research in this direction. We propose a deep reinforcement learning agent, StarCraft Commander (SCC). With an order of magnitude less computation, it demonstrates top human performance, defeating GrandMaster players in test matches and top professional players in a live event. Moreover, it shows strong robustness to various human strategies and discovers novel strategies unseen in human play. In this paper, we share the key insights and optimizations on efficient imitation learning and reinforcement learning for the StarCraft II full game.
Submitted 9 June, 2021; v1 submitted 24 December, 2020;
originally announced December 2020.
-
Implementation of Security Features in Software Development Phases
Authors:
Ariessa Davaindran Lingham,
Nelson Tang Kwong Kin,
Chen Wan Jing,
Chong Heng Loong,
Fatima-tuz-Zahra
Abstract:
Security plays an important role in software. Most people are not aware of the significance of security in software systems and tend to assume that they will be fine without it. However, the lack of security features exposes potential vulnerabilities to the public. This gives attackers opportunities to perform dangerous activities against vulnerable, insecure systems, which is why many organizations are reported as victims of system security attacks. To meet security requirements, developers must take time to study so that they truly understand the consequences and importance of security. Hence, this paper discusses how secure software development can be performed. To this end, relevant research has been reviewed. Multiple case study papers have been studied to find out how vulnerabilities are identified, how to eliminate them, when to implement security features, and why we implement them. Finally, the paper concludes with final remarks on the implementation of security features during the software development process. It is expected that this paper will contribute to the software security domain, which is often ignored in practical application.
Submitted 24 December, 2020;
originally announced December 2020.
-
Webpage Segmentation for Extracting Images and Their Surrounding Contextual Information
Authors:
F. Fauzi,
H. J. Long,
M. Belkhatir
Abstract:
Web images come hand in hand with valuable contextual information. Although this information has long been mined for various uses such as image annotation, clustering of images, inference of image semantic content, etc., insufficient attention has been given to the issues in mining this contextual information. In this paper, we propose a webpage segmentation algorithm targeting the extraction of web images and their contextual information based on their characteristics as they appear on webpages. We conducted a user study to obtain a human-labeled dataset to validate the effectiveness of our method, and experiments demonstrated that our method achieves better results than an existing segmentation algorithm.
Submitted 18 May, 2020;
originally announced May 2020.
-
On Geometric Structure of Activation Spaces in Neural Networks
Authors:
Yuting Jia,
Haiwen Wang,
Shuo Shao,
Huan Long,
Yunsong Zhou,
Xinbing Wang
Abstract:
In this paper, we investigate the geometric structure of activation spaces of fully connected layers in neural networks and then show applications of this study. We propose an efficient approximation algorithm to characterize the convex hull of massive points in high dimensional space. Based on this new algorithm, we identify four common geometric properties shared by the activation spaces, which give a rather clear description of the activation spaces. We then propose an alternative classification method grounded on the geometric structure description, which works better than neural networks alone. Surprisingly, this data classification method can be an indicator of overfitting in neural networks. We believe our work reveals several critical intrinsic properties of modern neural networks and further gives a new metric for evaluating them.
Submitted 2 April, 2019;
originally announced April 2019.
-
Time Reversal based MAC for Multi-Hop Underwater Acoustic Networks
Authors:
Ruiqin Zhao,
Hao Long,
Octavia A. Dobre,
Xiaohong Shen,
Telex M. N. Ngatched,
Haodi Mei
Abstract:
Constrained-energy underwater acoustic nodes are typically connected via a multi-hop underwater acoustic network (MHUAN) to cover a broad marine region. Recently, protocols for efficiently connecting such nodes have received considerable attention. In this paper, we show that the time reversal (TR) process plays an important role in the medium access control (MAC) because of its physical capability to exploit the multi-path energy from the richly scattering underwater environment, as well as to focus the signal energy in both spatial and temporal domains. In MHUANs, with severe multi-path propagation at the physical layer, the active TR process spatially focuses the signals to the location of the intended receiver; this significantly diminishes the interference among parallel links. We propose an active TR-based MAC protocol for MHUANs, with the aim of minimizing collision and maximizing channel utilization simultaneously. Furthermore, by considering the impact of the cross-correlation between different links on the TR-based medium access, we derive the threshold of the link cross-correlation to resolve collision caused by the high cross-correlation between realistic links. We perform simulations using the OPNET and BELLHOP environments, and show that the proposed TR-based MAC results in significantly improved throughput, decreased delay, and reduced data drop ratio in MHUANs.
Submitted 14 March, 2019;
originally announced March 2019.
-
Knowledge-based Fully Convolutional Network and Its Application in Segmentation of Lung CT Images
Authors:
Tao Yu,
Yu Qiao,
Huan Long
Abstract:
A variety of deep neural networks have been applied in medical image segmentation and achieve good performance. Unlike natural images, medical images of the same imaging modality are characterized by the same pattern, which indicates that the same normal organs or tissues are located at similar positions in the images. Thus, in this paper we try to incorporate the prior knowledge of medical images into the structure of neural networks such that the prior knowledge can be utilized for accurate segmentation. Based on this idea, we propose a novel deep network called knowledge-based fully convolutional network (KFCN) for medical image segmentation. The segmentation function and the corresponding error are analyzed. We show the existence of an asymptotically stable region for KFCN which a traditional FCN does not possess. Experiments validate our knowledge assumption about the incorporation of prior knowledge into the convolution kernels of KFCN and show that KFCN can achieve a reasonable segmentation and a satisfactory accuracy.
Submitted 22 May, 2018;
originally announced May 2018.
-
Curvature-based Comparison of Two Neural Networks
Authors:
Tao Yu,
Huan Long,
John E. Hopcroft
Abstract:
In this paper we show the similarities and differences of two deep neural networks by comparing the manifolds composed of activation vectors in each of their fully connected layers. The main contributions of this paper include 1) a new data generating algorithm which is crucial for determining the dimension of manifolds; 2) a systematic strategy to compare manifolds. In particular, we take Riemann curvature and sectional curvature as part of the criterion, which can reflect the intrinsic geometric properties of manifolds. Some interesting results and phenomena are given, which help in specifying the similarities and differences between the features extracted by two networks and demystifying the intrinsic mechanism of deep neural networks.
Submitted 21 January, 2018;
originally announced January 2018.
-
An Enhanced LMMSE Channel Estimation under High Speed Railway Scenarios
Authors:
Qing Tang,
Hang Long,
Haojun Yang,
Yuli Li
Abstract:
With the rapid deployment of the high speed railway (HSR), the wireless communication in HSR has been one of the indispensable scenarios in the fifth generation (5G) communications. In order to improve the performance of the orthogonal frequency division multiplexing (OFDM) system in the HSR scenarios, we propose an enhanced linear minimum mean square error channel estimation scheme based on multi-path Doppler frequency offset (DFO) estimation in this paper. The proposed scheme can estimate DFO of each path, and generate the frequency and time channel correlation more accurately, which can improve the accuracy of channel estimation in the HSR scenarios. Simulation results show that the proposed scheme can reduce the channel estimation error and achieve attractive gain in the HSR scenarios.
Submitted 1 December, 2017;
originally announced December 2017.
-
The Local Dimension of Deep Manifold
Authors:
Mengxiao Zhang,
Wangquan Wu,
Yanren Zhang,
Kun He,
Tao Yu,
Huan Long,
John E. Hopcroft
Abstract:
Based on our observation that there exists a dramatic drop in the singular values of the fully connected layers or a single feature map of the convolutional layer, and that the dimension of the concatenated feature vector almost equals the summation of the dimension on each feature map, we propose a singular value decomposition (SVD) based approach to estimate the dimension of the deep manifolds for a typical convolutional neural network, VGG19. We choose three categories from ImageNet, namely Persian Cat, Container Ship and Volcano, and determine the local dimension of the deep manifolds of the deep layers through the tangent space of a target image. Among several augmentation methods, we found that the Gaussian noise method gives an estimate closer to the intrinsic dimension: by adding random noise to an image we are moving in an arbitrary dimension, and when the rank of the feature matrix of the augmented images does not increase we are very close to the local dimension of the manifold. We also estimate the dimension of the deep manifold based on the tangent space for each of the maxpooling layers. Our results show that the dimensions of different categories are close to each other and decline quickly along the convolutional layers and fully connected layers. Furthermore, we show that the dimensions decline quickly inside the Conv5 layer. Our work provides new insights into the intrinsic structure of deep neural networks and helps unveil the inner organization of the black box of deep neural networks.
Submitted 5 November, 2017;
originally announced November 2017.
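The SVD-based dimension estimate described in the abstract above can be illustrated on synthetic data. This is a rough sketch under stated assumptions, not the paper's procedure: instead of VGG19 features of noise-augmented images, it uses a hypothetical feature matrix whose rows lie near a 3-dimensional subspace of a 50-dimensional space, and it counts singular values above a relative threshold `rel_tol` (an arbitrary choice made here) to locate the drop in the spectrum. It assumes NumPy is installed.

```python
import numpy as np


def estimate_dimension(features, rel_tol=1e-2):
    """Count singular values larger than rel_tol times the largest one.

    features: (n_samples, n_features) matrix, one feature vector per row.
    rel_tol is a hypothetical threshold marking the drop in the spectrum.
    """
    # Center the rows so the estimate reflects directions of variation only.
    centered = features - features.mean(axis=0, keepdims=True)
    s = np.linalg.svd(centered, compute_uv=False)  # singular values, descending
    if s[0] == 0:
        return 0
    return int(np.sum(s > rel_tol * s[0]))


# Synthetic stand-in for the feature vectors of augmented images:
# 200 samples near a 3-dimensional subspace of R^50, plus tiny noise.
rng = np.random.default_rng(0)
basis = rng.standard_normal((3, 50))
coeffs = rng.standard_normal((200, 3))
features = coeffs @ basis + 1e-6 * rng.standard_normal((200, 50))

dim = estimate_dimension(features)
print("estimated local dimension:", dim)
```

The estimate depends on the threshold: a sharp drop in the singular values, as reported in the abstract, is what makes the count insensitive to the exact value of `rel_tol`.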
-
Multiagent Bidirectionally-Coordinated Nets: Emergence of Human-level Coordination in Learning to Play StarCraft Combat Games
Authors:
Peng Peng,
Ying Wen,
Yaodong Yang,
Quan Yuan,
Zhenkun Tang,
Haitao Long,
Jun Wang
Abstract:
Many artificial intelligence (AI) applications require multiple intelligent agents to work in a collaborative effort. Efficient learning for intra-agent communication and coordination is an indispensable step towards general AI. In this paper, we take the StarCraft combat game as a case study, where the task is to coordinate multiple agents as a team to defeat their enemies. To maintain a scalable yet effective communication protocol, we introduce a Multiagent Bidirectionally-Coordinated Network (BiCNet ['bIknet]) with a vectorised extension of the actor-critic formulation. We show that BiCNet can handle different types of combat with arbitrary numbers of AI agents on both sides. Our analysis demonstrates that without any supervision such as human demonstrations or labelled data, BiCNet can learn various types of advanced coordination strategies that are commonly used by experienced game players. In our experiments, we evaluate our approach against multiple baselines under different scenarios; it shows state-of-the-art performance and has potential value for large-scale real-world applications.
Submitted 14 September, 2017; v1 submitted 29 March, 2017;
originally announced March 2017.
-
On the Computation Power of Name Parameterization in Higher-order Processes
Authors:
Xian Xu,
Qiang Yin,
Huan Long
Abstract:
Parameterization extends higher-order processes with the capability of abstraction (akin to that in lambda-calculus), and is known to be able to enhance the expressiveness. This paper focuses on the parameterization of names, i.e. a construct that maps a name to a process, in the higher-order setting. We provide two results concerning its computation capacity. First, name parameterization brings up a complete model, in the sense that it can express an elementary interactive model with built-in recursive functions. Second, we compare name parameterization with the well-known pi-calculus, and provide two encodings between them.
Submitted 19 August, 2015;
originally announced August 2015.
-
Adaptive Spectrum Sharing of LTE Co-existing with WLAN in Unlicensed Frequency Bands
Authors:
Minyao Xing,
Yuexing Peng,
Teng Xia,
Hang Long,
Kan Zheng
Abstract:
With the increase in wireless communication demands, the licensed spectrum for long term evolution (LTE) is no longer sufficient. In recent years, research effort has focused on deploying LTE in unlicensed frequency bands, which unavoidably raises the problem of LTE co-existence with other systems on the same band. This paper proposes an adaptive co-existence mechanism for LTE and wireless local area networks (WLAN) that preserves significant WLAN system performance without LTE losing much. LTE realizes the co-existence by allocating time resources dynamically according to the traffic load of the WLAN system.
Submitted 26 March, 2015;
originally announced March 2015.
-
A Closed-Loop UL Power Control Scheme for Interference Mitigation in Dynamic TD-LTE Systems
Authors:
Qinqin Chen,
Hui Zhao,
Lin Li,
Hang Long,
Jianquan Wang,
Xiaoyue Hou
Abstract:
The TD-LTE system is envisaged to adopt dynamic time division duplexing (TDD) transmissions for small cells to adapt their communication service to the fast variation of downlink (DL) and uplink (UL) traffic demands. However, different DL/UL directions for the same subframe in adjacent cells will result in new destructive interference components, i.e., eNB-to-eNB and UE-to-UE, with levels that can significantly differ from one subframe to another. In this paper, a feasible UL power control mechanism is proposed to manage eNB-to-eNB interference, where different UL power control parameters are set based on different interference levels. We consider the geometric location information and the subframe set selection process of adjacent eNBs when the interference level is estimated. The performance of the proposed scheme is evaluated through system level simulations, which show that the scheme improves UL average and 5%-ile packet throughputs compared with the original scheme without power control. Also, the UE-to-UE interference does not worsen when the UE transmit power becomes higher.
Submitted 26 March, 2015;
originally announced March 2015.
-
Divisible Load Scheduling in Mobile Grid based on Stackelberg Pricing Game
Authors:
Jiadi Chen,
Qiang Zheng,
Hang Long,
Wenbo Wang
Abstract:
Nowadays, it has become feasible to use mobile nodes as contributing entities in computing systems. In this paper, we consider a computational grid in which the mobile devices can share their idle resources to realize parallel processing. The overall computing task can be arbitrarily partitioned into multiple subtasks to be distributed to mobile resource providers (RPs). In this process, the computation load scheduling problem is highlighted. Based on the optimization objective, i.e., minimizing the task makespan, a buyer-seller model is proposed in which the task sponsor can incentivize the RPs to share their computing resources by paying certain profits. The Stackelberg Pricing Game (SPG) is employed to obtain the optimal price and shared resource amount of each RP. Finally, we evaluate the performance of the proposed algorithm by system simulation, and the results indicate that the SPG-based load scheduling algorithm can significantly improve the time gain in mobile grid systems.
Submitted 17 March, 2015;
originally announced March 2015.
-
An SMDP-based Resource Management Scheme for Distributed Cloud Systems
Authors:
Jiadi Chen,
Hang Long,
Qiang Zheng,
Minyao Xing,
Wenbo Wang
Abstract:
In this paper, the resource management problem in geographically distributed cloud systems is considered. The Follow Me Cloud concept, which enables service migration across federated data centers (DCs), is adopted. Therefore, there are two types of service requests to the DC, i.e., new requests (NRs) initiated in the local service area and migration requests (MRs) generated when mobile users move across service areas. A novel resource management scheme is proposed to help the resource manager decide whether to accept the service requests (NRs or MRs) and determine how many resources should be allocated to each accepted service. The optimization objective is to maximize the average system reward and keep the rejection probability of service requests under a certain threshold. Numerical results indicate that the proposed scheme can significantly improve the overall system utility as well as the user experience compared with other resource management schemes.
Submitted 17 March, 2015;
originally announced March 2015.