-
GTA: A Benchmark for General Tool Agents
Authors:
Jize Wang,
Zerun Ma,
Yining Li,
Songyang Zhang,
Cailian Chen,
Kai Chen,
Xinyi Le
Abstract:
Significant focus has been placed on integrating large language models (LLMs) with various tools in developing general-purpose agents. This poses a challenge to LLMs' tool-use capabilities. However, there are evident gaps between existing tool-use evaluations and real-world scenarios. Current evaluations often use AI-generated queries, single-step tasks, dummy tools, and text-only interactions, fa…
▽ More
Significant focus has been placed on integrating large language models (LLMs) with various tools in developing general-purpose agents. This poses a challenge to LLMs' tool-use capabilities. However, there are evident gaps between existing tool-use evaluations and real-world scenarios. Current evaluations often use AI-generated queries, single-step tasks, dummy tools, and text-only interactions, failing to reveal the agents' real-world problem-solving abilities effectively. To address this, we propose GTA, a benchmark for General Tool Agents, featuring three main aspects: (i) Real user queries: human-written queries with simple real-world objectives but implicit tool-use, requiring the LLM to reason the suitable tools and plan the solution steps. (ii) Real deployed tools: an evaluation platform equipped with tools across perception, operation, logic, and creativity categories to evaluate the agents' actual task execution performance. (iii) Real multimodal inputs: authentic image files, such as spatial scenes, web page screenshots, tables, code snippets, and printed/handwritten materials, used as the query contexts to align with real-world scenarios closely. We design 229 real-world tasks and executable tool chains to evaluate mainstream LLMs. Our findings show that real-world user queries are challenging for existing LLMs, with GPT-4 completing less than 50% of the tasks and most LLMs achieving below 25%. This evaluation reveals the bottlenecks in the tool-use capabilities of current LLMs in real-world scenarios, which provides future direction for advancing general-purpose tool agents. The code and dataset are available at https://github.com/open-compass/GTA.
△ Less
Submitted 11 July, 2024;
originally announced July 2024.
-
ShapeFormer: Shapelet Transformer for Multivariate Time Series Classification
Authors:
Xuan-May Le,
Ling Luo,
Uwe Aickelin,
Minh-Tuan Tran
Abstract:
Multivariate time series classification (MTSC) has attracted significant research attention due to its diverse real-world applications. Recently, exploiting transformers for MTSC has achieved state-of-the-art performance. However, existing methods focus on generic features, providing a comprehensive understanding of data, but they ignore class-specific features crucial for learning the representat…
▽ More
Multivariate time series classification (MTSC) has attracted significant research attention due to its diverse real-world applications. Recently, exploiting transformers for MTSC has achieved state-of-the-art performance. However, existing methods focus on generic features, providing a comprehensive understanding of data, but they ignore class-specific features crucial for learning the representative characteristics of each class. This leads to poor performance in the case of imbalanced datasets or datasets with similar overall patterns but differing in minor class-specific details. In this paper, we propose a novel Shapelet Transformer (ShapeFormer), which comprises class-specific and generic transformer modules to capture both of these features. In the class-specific module, we introduce the discovery method to extract the discriminative subsequences of each class (i.e. shapelets) from the training set. We then propose a Shapelet Filter to learn the difference features between these shapelets and the input time series. We found that the difference feature for each shapelet contains important class-specific features, as it shows a significant distinction between its class and others. In the generic module, convolution filters are used to extract generic features that contain information to distinguish among all classes. For each module, we employ the transformer encoder to capture the correlation between their features. As a result, the combination of two transformer modules allows our model to exploit the power of both types of features, thereby enhancing the classification performance. Our experiments on 30 UEA MTSC datasets demonstrate that ShapeFormer has achieved the highest accuracy ranking compared to state-of-the-art methods. The code is available at https://github.com/xuanmay2701/shapeformer.
△ Less
Submitted 23 May, 2024;
originally announced May 2024.
-
Text-Enhanced Data-free Approach for Federated Class-Incremental Learning
Authors:
Minh-Tuan Tran,
Trung Le,
Xuan-May Le,
Mehrtash Harandi,
Dinh Phung
Abstract:
Federated Class-Incremental Learning (FCIL) is an underexplored yet pivotal issue, involving the dynamic addition of new classes in the context of federated learning. In this field, Data-Free Knowledge Transfer (DFKT) plays a crucial role in addressing catastrophic forgetting and data privacy problems. However, prior approaches lack the crucial synergy between DFKT and the model training phases, c…
▽ More
Federated Class-Incremental Learning (FCIL) is an underexplored yet pivotal issue, involving the dynamic addition of new classes in the context of federated learning. In this field, Data-Free Knowledge Transfer (DFKT) plays a crucial role in addressing catastrophic forgetting and data privacy problems. However, prior approaches lack the crucial synergy between DFKT and the model training phases, causing DFKT to encounter difficulties in generating high-quality data from a non-anchored latent space of the old task model. In this paper, we introduce LANDER (Label Text Centered Data-Free Knowledge Transfer) to address this issue by utilizing label text embeddings (LTE) produced by pretrained language models. Specifically, during the model training phase, our approach treats LTE as anchor points and constrains the feature embeddings of corresponding training samples around them, enriching the surrounding area with more meaningful information. In the DFKT phase, by using these LTE anchors, LANDER can synthesize more meaningful samples, thereby effectively addressing the forgetting problem. Additionally, instead of tightly constraining embeddings toward the anchor, the Bounding Loss is introduced to encourage sample embeddings to remain flexible within a defined radius. This approach preserves the natural differences in sample embeddings and mitigates the embedding overlap caused by heterogeneous federated settings. Extensive experiments conducted on CIFAR100, Tiny-ImageNet, and ImageNet demonstrate that LANDER significantly outperforms previous methods and achieves state-of-the-art performance in FCIL. The code is available at https://github.com/tmtuan1307/lander.
△ Less
Submitted 20 March, 2024;
originally announced March 2024.
-
RflyMAD: A Dataset for Multicopter Fault Detection and Health Assessment
Authors:
Xiangli Le,
Bo Jin,
Gen Cui,
Xunhua Dai,
Quan Quan
Abstract:
This paper presents an open-source dataset RflyMAD, a Multicopter Abnomal Dataset developed by Reliable Flight Control (Rfly) Group aiming to promote the development of research fields like fault detection and isolation (FDI) or health assessment (HA). The entire 114 GB dataset includes 11 types of faults under 6 flight statuses which are adapted from ADS-33 file to cover more occasions in which t…
▽ More
This paper presents an open-source dataset RflyMAD, a Multicopter Abnomal Dataset developed by Reliable Flight Control (Rfly) Group aiming to promote the development of research fields like fault detection and isolation (FDI) or health assessment (HA). The entire 114 GB dataset includes 11 types of faults under 6 flight statuses which are adapted from ADS-33 file to cover more occasions in which the multicopters have different mobility levels when faults occur. In the total 5629 flight cases, the fault time is up to 3283 minutes, and there are 2566 cases for software-in-the-loop (SIL) simulation, 2566 cases for hardware-in-the-loop (HIL) simulation and 497 cases for real flight. As it contains simulation data based on RflySim and real flight data, it is possible to improve the quantity while increasing the data quality. In each case, there are ULog, Telemetry log, Flight information and processed files for researchers to use and check. The RflyMAD dataset could be used as a benchmark for fault diagnosis methods and the support relationship between simulation data and real flight is verified through transfer learning methods. More methods as a baseline will be presented in the future, and RflyMAD will be updated with more data and types. In addition, the dataset and related toolkit can be accessed through https://rfly-openha.github.io/documents/4_resources/dataset.html.
△ Less
Submitted 11 January, 2024; v1 submitted 19 November, 2023;
originally announced November 2023.
-
NAYER: Noisy Layer Data Generation for Efficient and Effective Data-free Knowledge Distillation
Authors:
Minh-Tuan Tran,
Trung Le,
Xuan-May Le,
Mehrtash Harandi,
Quan Hung Tran,
Dinh Phung
Abstract:
Data-Free Knowledge Distillation (DFKD) has made significant recent strides by transferring knowledge from a teacher neural network to a student neural network without accessing the original data. Nonetheless, existing approaches encounter a significant challenge when attempting to generate samples from random noise inputs, which inherently lack meaningful information. Consequently, these models s…
▽ More
Data-Free Knowledge Distillation (DFKD) has made significant recent strides by transferring knowledge from a teacher neural network to a student neural network without accessing the original data. Nonetheless, existing approaches encounter a significant challenge when attempting to generate samples from random noise inputs, which inherently lack meaningful information. Consequently, these models struggle to effectively map this noise to the ground-truth sample distribution, resulting in prolonging training times and low-quality outputs. In this paper, we propose a novel Noisy Layer Generation method (NAYER) which relocates the random source from the input to a noisy layer and utilizes the meaningful constant label-text embedding (LTE) as the input. LTE is generated by using the language model once, and then it is stored in memory for all subsequent training processes. The significance of LTE lies in its ability to contain substantial meaningful inter-class information, enabling the generation of high-quality samples with only a few training steps. Simultaneously, the noisy layer plays a key role in addressing the issue of diversity in sample generation by preventing the model from overemphasizing the constrained label information. By reinitializing the noisy layer in each iteration, we aim to facilitate the generation of diverse samples while still retaining the method's efficiency, thanks to the ease of learning provided by LTE. Experiments carried out on multiple datasets demonstrate that our NAYER not only outperforms the state-of-the-art methods but also achieves speeds 5 to 15 times faster than previous approaches. The code is available at https://github.com/tmtuan1307/nayer.
△ Less
Submitted 21 March, 2024; v1 submitted 30 September, 2023;
originally announced October 2023.
-
Adversarial Attacks on Code Models with Discriminative Graph Patterns
Authors:
Thanh-Dat Nguyen,
Yang Zhou,
Xuan Bach D. Le,
Patanamon,
Thongtanunam,
David Lo
Abstract:
Pre-trained language models of code are now widely used in various software engineering tasks such as code generation, code completion, vulnerability detection, etc. This, in turn, poses security and reliability risks to these models. One of the important threats is \textit{adversarial attacks}, which can lead to erroneous predictions and largely affect model performance on downstream tasks. Curre…
▽ More
Pre-trained language models of code are now widely used in various software engineering tasks such as code generation, code completion, vulnerability detection, etc. This, in turn, poses security and reliability risks to these models. One of the important threats is \textit{adversarial attacks}, which can lead to erroneous predictions and largely affect model performance on downstream tasks. Current adversarial attacks on code models usually adopt fixed sets of program transformations, such as variable renaming and dead code insertion, leading to limited attack effectiveness. To address the aforementioned challenges, we propose a novel adversarial attack framework, GraphCodeAttack, to better evaluate the robustness of code models. Given a target code model, GraphCodeAttack automatically mines important code patterns, which can influence the model's decisions, to perturb the structure of input code to the model. To do so, GraphCodeAttack uses a set of input source codes to probe the model's outputs and identifies the \textit{discriminative} ASTs patterns that can influence the model decisions. GraphCodeAttack then selects appropriate AST patterns, concretizes the selected patterns as attacks, and inserts them as dead code into the model's input program. To effectively synthesize attacks from AST patterns, GraphCodeAttack uses a separate pre-trained code model to fill in the ASTs with concrete code snippets. We evaluate the robustness of two popular code models (e.g., CodeBERT and GraphCodeBERT) against our proposed approach on three tasks: Authorship Attribution, Vulnerability Prediction, and Clone Detection. The experimental results suggest that our proposed approach significantly outperforms state-of-the-art approaches in attacking code models such as CARROT and ALERT.
△ Less
Submitted 21 August, 2023;
originally announced August 2023.
-
Refining ChatGPT-Generated Code: Characterizing and Mitigating Code Quality Issues
Authors:
Yue Liu,
Thanh Le-Cong,
Ratnadira Widyasari,
Chakkrit Tantithamthavorn,
Li Li,
Xuan-Bach D. Le,
David Lo
Abstract:
We systematically study the quality of 4,066 ChatGPT-generated code implemented in two popular programming languages, i.e., Java and Python, for 2,033 programming tasks. The goal of this work is three folds. First, we analyze the correctness of ChatGPT on code generation tasks and uncover the factors that influence its effectiveness, including task difficulty, programming language, time that tasks…
▽ More
We systematically study the quality of 4,066 ChatGPT-generated code implemented in two popular programming languages, i.e., Java and Python, for 2,033 programming tasks. The goal of this work is three folds. First, we analyze the correctness of ChatGPT on code generation tasks and uncover the factors that influence its effectiveness, including task difficulty, programming language, time that tasks are introduced, and program size. Second, we identify and characterize potential issues with the quality of ChatGPT-generated code. Last, we provide insights into how these issues can be mitigated. Experiments highlight that out of 4,066 programs generated by ChatGPT, 2,756 programs are deemed correct, 1,082 programs provide wrong outputs, and 177 programs contain compilation or runtime errors. Additionally, we further analyze other characteristics of the generated code through static analysis tools, such as code style and maintainability, and find that 1,930 ChatGPT-generated code snippets suffer from maintainability issues. Subsequently, we investigate ChatGPT's self-repairing ability and its interaction with static analysis tools to fix the errors uncovered in the previous step. Experiments suggest that ChatGPT can partially address these challenges, improving code quality by more than 20%, but there are still limitations and opportunities for improvement. Overall, our study provides valuable insights into the current limitations of ChatGPT and offers a roadmap for future research and development efforts to enhance the code generation capabilities of AI models like ChatGPT.
△ Less
Submitted 14 December, 2023; v1 submitted 24 July, 2023;
originally announced July 2023.
-
Harmonic enhancement using learnable comb filter for light-weight full-band speech enhancement model
Authors:
Xiaohuai Le,
Tong Lei,
Li Chen,
Yiqing Guo,
Chao He,
Cheng Chen,
Xianjun Xia,
Hua Gao,
Yijian Xiao,
Piao Ding,
Shenyi Song,
Jing Lu
Abstract:
With fewer feature dimensions, filter banks are often used in light-weight full-band speech enhancement models. In order to further enhance the coarse speech in the sub-band domain, it is necessary to apply a post-filtering for harmonic retrieval. The signal processing-based comb filters used in RNNoise and PercepNet have limited performance and may cause speech quality degradation due to inaccura…
▽ More
With fewer feature dimensions, filter banks are often used in light-weight full-band speech enhancement models. In order to further enhance the coarse speech in the sub-band domain, it is necessary to apply a post-filtering for harmonic retrieval. The signal processing-based comb filters used in RNNoise and PercepNet have limited performance and may cause speech quality degradation due to inaccurate fundamental frequency estimation. To tackle this problem, we propose a learnable comb filter to enhance harmonics. Based on the sub-band model, we design a DNN-based fundamental frequency estimator to estimate the discrete fundamental frequencies and a comb filter for harmonic enhancement, which are trained via an end-to-end pattern. The experiments show the advantages of our proposed method over PecepNet and DeepFilterNet.
△ Less
Submitted 1 June, 2023;
originally announced June 2023.
-
Multi-Granularity Detector for Vulnerability Fixes
Authors:
Truong Giang Nguyen,
Thanh Le-Cong,
Hong Jin Kang,
Ratnadira Widyasari,
Chengran Yang,
Zhipeng Zhao,
Bowen Xu,
Jiayuan Zhou,
Xin Xia,
Ahmed E. Hassan,
Xuan-Bach D. Le,
David Lo
Abstract:
With the increasing reliance on Open Source Software, users are exposed to third-party library vulnerabilities. Software Composition Analysis (SCA) tools have been created to alert users of such vulnerabilities. SCA requires the identification of vulnerability-fixing commits. Prior works have proposed methods that can automatically identify such vulnerability-fixing commits. However, identifying s…
▽ More
With the increasing reliance on Open Source Software, users are exposed to third-party library vulnerabilities. Software Composition Analysis (SCA) tools have been created to alert users of such vulnerabilities. SCA requires the identification of vulnerability-fixing commits. Prior works have proposed methods that can automatically identify such vulnerability-fixing commits. However, identifying such commits is highly challenging, as only a very small minority of commits are vulnerability fixing. Moreover, code changes can be noisy and difficult to analyze. We observe that noise can occur at different levels of detail, making it challenging to detect vulnerability fixes accurately.
To address these challenges and boost the effectiveness of prior works, we propose MiDas (Multi-Granularity Detector for Vulnerability Fixes). Unique from prior works, Midas constructs different neural networks for each level of code change granularity, corresponding to commit-level, file-level, hunk-level, and line-level, following their natural organization. It then utilizes an ensemble model that combines all base models to generate the final prediction. This design allows MiDas to better handle the noisy and highly imbalanced nature of vulnerability-fixing commit data. Additionally, to reduce the human effort required to inspect code changes, we have designed an effort-aware adjustment for Midas's outputs based on commit length. The evaluation results demonstrate that MiDas outperforms the current state-of-the-art baseline in terms of AUC by 4.9% and 13.7% on Java and Python-based datasets, respectively. Furthermore, in terms of two effort-aware metrics, EffortCost@L and Popt@L, MiDas also outperforms the state-of-the-art baseline, achieving improvements of up to 28.2% and 15.9% on Java, and 60% and 51.4% on Python, respectively.
△ Less
Submitted 23 May, 2023;
originally announced May 2023.
-
Personalized speech enhancement combining band-split RNN and speaker attentive module
Authors:
Xiaohuai Le,
Li Chen,
Chao He,
Yiqing Guo,
Cheng Chen,
Xianjun Xia,
Jing Lu
Abstract:
Target speaker information can be utilized in speech enhancement (SE) models to more effectively extract the desired speech. Previous works introduce the speaker embedding into speech enhancement models by means of concatenation or affine transformation. In this paper, we propose a speaker attentive module to calculate the attention scores between the speaker embedding and the intermediate feature…
▽ More
Target speaker information can be utilized in speech enhancement (SE) models to more effectively extract the desired speech. Previous works introduce the speaker embedding into speech enhancement models by means of concatenation or affine transformation. In this paper, we propose a speaker attentive module to calculate the attention scores between the speaker embedding and the intermediate features, which are used to rescale the features. By merging this module in the state-of-the-art SE model, we construct the personalized SE model for ICASSP Signal Processing Grand Challenge: DNS Challenge 5 (2023). Our system achieves a final score of 0.529 on the blind test set of track1 and 0.549 on track2.
△ Less
Submitted 16 March, 2023; v1 submitted 20 February, 2023;
originally announced February 2023.
-
CHRONOS: Time-Aware Zero-Shot Identification of Libraries from Vulnerability Reports
Authors:
Yunbo Lyu,
Thanh Le-Cong,
Hong Jin Kang,
Ratnadira Widyasari,
Zhipeng Zhao,
Xuan-Bach D. Le,
Ming Li,
David Lo
Abstract:
Tools that alert developers about library vulnerabilities depend on accurate, up-to-date vulnerability databases which are maintained by security researchers. These databases record the libraries related to each vulnerability. However, the vulnerability reports may not explicitly list every library and human analysis is required to determine all the relevant libraries. Human analysis may be slow a…
▽ More
Tools that alert developers about library vulnerabilities depend on accurate, up-to-date vulnerability databases which are maintained by security researchers. These databases record the libraries related to each vulnerability. However, the vulnerability reports may not explicitly list every library and human analysis is required to determine all the relevant libraries. Human analysis may be slow and expensive, which motivates the need for automated approaches. Researchers and practitioners have proposed to automatically identify libraries from vulnerability reports using extreme multi-label learning (XML).
While state-of-the-art XML techniques showed promising performance, their experiment settings do not practically fit what happens in reality. Previous studies randomly split the vulnerability reports data for training and testing their models without considering the chronological order of the reports. This may unduly train the models on chronologically newer reports while testing the models on chronologically older ones. However, in practice, one often receives chronologically new reports, which may be related to previously unseen libraries. Under this practical setting, we observe that the performance of current XML techniques declines substantially, e.g., F1 decreased from 0.7 to 0.28 under experiments without and with consideration of chronological order of vulnerability reports.
We propose a practical library identification approach, namely CHRONOS, based on zero-shot learning. The novelty of CHRONOS is three-fold. First, CHRONOS fits into the practical pipeline by considering the chronological order of vulnerability reports. Second, CHRONOS enriches the data of the vulnerability descriptions and labels using a carefully designed data enhancement step. Third, CHRONOS exploits the temporal ordering of the vulnerability reports using a cache to prioritize prediction of...
△ Less
Submitted 29 July, 2023; v1 submitted 10 January, 2023;
originally announced January 2023.
-
Invalidator: Automated Patch Correctness Assessment via Semantic and Syntactic Reasoning
Authors:
Thanh Le-Cong,
Duc-Minh Luong,
Xuan Bach D. Le,
David Lo,
Nhat-Hoa Tran,
Bui Quang-Huy,
Quyet-Thang Huynh
Abstract:
Automated program repair (APR) faces the challenge of test overfitting, where generated patches pass validation tests but fail to generalize. Existing methods for patch assessment involve generating new tests or manual inspection, which can be time-consuming or biased. In this paper, we propose a novel technique, INVALIDATOR, to automatically assess the correctness of APR-generated patches via sem…
▽ More
Automated program repair (APR) faces the challenge of test overfitting, where generated patches pass validation tests but fail to generalize. Existing methods for patch assessment involve generating new tests or manual inspection, which can be time-consuming or biased. In this paper, we propose a novel technique, INVALIDATOR, to automatically assess the correctness of APR-generated patches via semantic and syntactic reasoning. INVALIDATOR leverages program invariants to reason about program semantics while also capturing program syntax through language semantics learned from a large code corpus using a pre-trained language model. Given a buggy program and the developer-patched program, INVALIDATOR infers likely invariants on both programs. Then, INVALIDATOR determines that an APR-generated patch overfits if: (1) it violates correct specifications or (2) maintains erroneous behaviors from the original buggy program. In case our approach fails to determine an overfitting patch based on invariants, INVALIDATOR utilizes a trained model from labeled patches to assess patch correctness based on program syntax. The benefit of INVALIDATOR is threefold. First, INVALIDATOR leverages both semantic and syntactic reasoning to enhance its discriminative capability. Second, INVALIDATOR does not require new test cases to be generated, but instead only relies on the current test suite and uses invariant inference to generalize program behaviors. Third, INVALIDATOR is fully automated. Experimental results demonstrate that INVALIDATOR outperforms existing methods in terms of Accuracy and F-measure, correctly identifying 79% of overfitting patches and detecting 23% more overfitting patches than the best baseline.
△ Less
Submitted 17 March, 2023; v1 submitted 3 January, 2023;
originally announced January 2023.
-
An Efficient Cyclic Entailment Procedure in a Fragment of Separation Logic
Authors:
Quang Loc Le,
Xuan-Bach D. Le
Abstract:
An efficient entailment proof system is essential to compositional verification using separation logic. Unfortunately, existing decision procedures are either inexpressive or inefficient. For example, Smallfoot is an efficient procedure but only works with hardwired lists and trees. Other procedures that can support general inductive predicates run exponentially in time as their proof search requi…
▽ More
An efficient entailment proof system is essential to compositional verification using separation logic. Unfortunately, existing decision procedures are either inexpressive or inefficient. For example, Smallfoot is an efficient procedure but only works with hardwired lists and trees. Other procedures that can support general inductive predicates run exponentially in time as their proof search requires back-tracking to deal with a disjunction in the consequent.
This paper presents a decision procedure to derive cyclic entailment proofs for general inductive predicates in polynomial time. Our procedure is efficient and does not require back-tracking; it uses normalisation rules that help avoid the introduction of disjunction in the consequent. Moreover, our decidable fragment is sufficiently expressive: It is based on compositional predicates and can capture a wide range of data structures, including sorted and nested list segments, skip lists with fast forward pointers, and binary search trees. We have implemented the proposal in a prototype tool and evaluated it over challenging problems taken from a recent separation logic competition. The experimental results confirm the efficiency of the proposed system.
△ Less
Submitted 2 October, 2022;
originally announced October 2022.
-
VulCurator: A Vulnerability-Fixing Commit Detector
Authors:
Truong Giang Nguyen,
Thanh Le-Cong,
Hong Jin Kang,
Xuan-Bach D. Le,
David Lo
Abstract:
Open-source software (OSS) vulnerability management process is important nowadays, as the number of discovered OSS vulnerabilities is increasing over time. Monitoring vulnerability-fixing commits is a part of the standard process to prevent vulnerability exploitation. Manually detecting vulnerability-fixing commits is, however, time consuming due to the possibly large number of commits to review.…
▽ More
Open-source software (OSS) vulnerability management process is important nowadays, as the number of discovered OSS vulnerabilities is increasing over time. Monitoring vulnerability-fixing commits is a part of the standard process to prevent vulnerability exploitation. Manually detecting vulnerability-fixing commits is, however, time consuming due to the possibly large number of commits to review. Recently, many techniques have been proposed to automatically detect vulnerability-fixing commits using machine learning. These solutions either: (1) did not use deep learning, or (2) use deep learning on only limited sources of information. This paper proposes VulCurator, a tool that leverages deep learning on richer sources of information, including commit messages, code changes and issue reports for vulnerability-fixing commit classifica- tion. Our experimental results show that VulCurator outperforms the state-of-the-art baselines up to 16.1% in terms of F1-score. VulCurator tool is publicly available at https://github.com/ntgiang71096/VFDetector and https://zenodo.org/record/7034132#.Yw3MN-xBzDI, with a demo video at https://youtu.be/uMlFmWSJYOE.
△ Less
Submitted 7 September, 2022;
originally announced September 2022.
-
AutoPruner: Transformer-Based Call Graph Pruning
Authors:
Thanh Le-Cong,
Hong Jin Kang,
Truong Giang Nguyen,
Stefanus Agus Haryono,
David Lo,
Xuan-Bach D. Le,
Huynh Quyet Thang
Abstract:
Constructing a static call graph requires trade-offs between soundness and precision. Program analysis techniques for constructing call graphs are unfortunately usually imprecise. To address this problem, researchers have recently proposed call graph pruning empowered by machine learning to post-process call graphs constructed by static analysis. A machine learning model is built to capture inform…
▽ More
Constructing a static call graph requires trade-offs between soundness and precision. Program analysis techniques for constructing call graphs are unfortunately usually imprecise. To address this problem, researchers have recently proposed call graph pruning empowered by machine learning to post-process call graphs constructed by static analysis. A machine learning model is built to capture information from the call graph by extracting structural features for use in a random forest classifier. It then removes edges that are predicted to be false positives. Despite the improvements shown by machine learning models, they are still limited as they do not consider the source code semantics and thus often are not able to effectively distinguish true and false positives. In this paper, we present a novel call graph pruning technique, AutoPruner, for eliminating false positives in call graphs via both statistical semantic and structural analysis. Given a call graph constructed by traditional static analysis tools, AutoPruner takes a Transformer-based approach to capture the semantic relationships between the caller and callee functions associated with each edge in the call graph. To do so, AutoPruner fine-tunes a model of code that was pre-trained on a large corpus to represent source code based on descriptions of its semantics. Next, the model is used to extract semantic features from the functions related to each edge in the call graph. AutoPruner uses these semantic features together with the structural features extracted from the call graph to classify each edge via a feed-forward neural network. Our empirical evaluation on a benchmark dataset of real-world programs shows that AutoPruner outperforms the state-of-the-art baselines, improving on F-measure by up to 13% in identifying false-positive edges in a static call graph.
△ Less
Submitted 7 September, 2022;
originally announced September 2022.
-
ADTR: Anomaly Detection Transformer with Feature Reconstruction
Authors:
Zhiyuan You,
Kai Yang,
Wenhan Luo,
Lei Cui,
Yu Zheng,
Xinyi Le
Abstract:
Anomaly detection with only prior knowledge from normal samples attracts more attention because of the lack of anomaly samples. Existing CNN-based pixel reconstruction approaches suffer from two concerns. First, the reconstruction source and target are raw pixel values that contain indistinguishable semantic information. Second, CNN tends to reconstruct both normal samples and anomalies well, maki…
▽ More
Anomaly detection with only prior knowledge from normal samples attracts more attention because of the lack of anomaly samples. Existing CNN-based pixel reconstruction approaches suffer from two concerns. First, the reconstruction source and target are raw pixel values that contain indistinguishable semantic information. Second, CNN tends to reconstruct both normal samples and anomalies well, making them still hard to distinguish. In this paper, we propose Anomaly Detection TRansformer (ADTR) to apply a transformer to reconstruct pre-trained features. The pre-trained features contain distinguishable semantic information. Also, the adoption of transformer limits to reconstruct anomalies well such that anomalies could be detected easily once the reconstruction fails. Moreover, we propose novel loss functions to make our approach compatible with the normal-sample-only case and the anomaly-available case with both image-level and pixel-level labeled anomalies. The performance could be further improved by adding simple synthetic or external irrelevant anomalies. Extensive experiments are conducted on anomaly detection datasets including MVTec-AD and CIFAR-10. Our method achieves superior performance compared with all baselines.
△ Less
Submitted 9 December, 2022; v1 submitted 5 September, 2022;
originally announced September 2022.
-
Inference skipping for more efficient real-time speech enhancement with parallel RNNs
Authors:
Xiaohuai Le,
Tong Lei,
Kai Chen,
Jing Lu
Abstract:
Deep neural network (DNN) based speech enhancement models have attracted extensive attention due to their promising performance. However, it is difficult to deploy a powerful DNN in real-time applications because of its high computational cost. Typical compression methods such as pruning and quantization do not make good use of the data characteristics. In this paper, we introduce the Skip-RNN str…
▽ More
Deep neural network (DNN) based speech enhancement models have attracted extensive attention due to their promising performance. However, it is difficult to deploy a powerful DNN in real-time applications because of its high computational cost. Typical compression methods such as pruning and quantization do not make good use of the data characteristics. In this paper, we introduce the Skip-RNN strategy into speech enhancement models with parallel RNNs. The states of the RNNs update intermittently without interrupting the update of the output mask, which leads to significant reduction of computational load without evident audio artifacts. To better leverage the difference between the voice and the noise, we further regularize the skipping strategy with voice activity detection (VAD) guidance, saving more computational load. Experiments on a high-performance speech enhancement model, dual-path convolutional recurrent network (DPCRN), show the superiority of our strategy over strategies like network pruning or directly training a smaller model. We also validate the generalization of the proposed strategy on two other competitive speech enhancement models.
△ Less
Submitted 22 July, 2022;
originally announced July 2022.
-
A light-weight full-band speech enhancement model
Authors:
Qinwen Hu,
Zhongshu Hou,
Xiaohuai Le,
Jing Lu
Abstract:
Deep neural network based full-band speech enhancement systems face challenges of high demand of computational resources and imbalanced frequency distribution. In this paper, a light-weight full-band model is proposed with two dedicated strategies, i.e., a learnable spectral compression mapping for more effective high-band spectral information compression, and the utilization of the multi-head att…
▽ More
Deep neural network based full-band speech enhancement systems face challenges of high demand of computational resources and imbalanced frequency distribution. In this paper, a light-weight full-band model is proposed with two dedicated strategies, i.e., a learnable spectral compression mapping for more effective high-band spectral information compression, and the utilization of the multi-head attention mechanism for more effective modeling of the global spectral pattern. Experiments validate the efficacy of the proposed strategies and show that the proposed model achieves competitive performance with only 0.89M parameters.
△ Less
Submitted 3 July, 2022; v1 submitted 29 June, 2022;
originally announced June 2022.
-
A Unified Model for Multi-class Anomaly Detection
Authors:
Zhiyuan You,
Lei Cui,
Yujun Shen,
Kai Yang,
Xin Lu,
Yu Zheng,
Xinyi Le
Abstract:
Despite the rapid advance of unsupervised anomaly detection, existing methods require to train separate models for different objects. In this work, we present UniAD that accomplishes anomaly detection for multiple classes with a unified framework. Under such a challenging setting, popular reconstruction networks may fall into an "identical shortcut", where both normal and anomalous samples can be…
▽ More
Despite the rapid advance of unsupervised anomaly detection, existing methods require to train separate models for different objects. In this work, we present UniAD that accomplishes anomaly detection for multiple classes with a unified framework. Under such a challenging setting, popular reconstruction networks may fall into an "identical shortcut", where both normal and anomalous samples can be well recovered, and hence fail to spot outliers. To tackle this obstacle, we make three improvements. First, we revisit the formulations of fully-connected layer, convolutional layer, as well as attention layer, and confirm the important role of query embedding (i.e., within attention layer) in preventing the network from learning the shortcut. We therefore come up with a layer-wise query decoder to help model the multi-class distribution. Second, we employ a neighbor masked attention module to further avoid the information leak from the input feature to the reconstructed output feature. Third, we propose a feature jittering strategy that urges the model to recover the correct message even with noisy inputs. We evaluate our algorithm on MVTec-AD and CIFAR-10 datasets, where we surpass the state-of-the-art alternatives by a sufficiently large margin. For example, when learning a unified model for 15 categories in MVTec-AD, we surpass the second competitor on the tasks of both anomaly detection (from 88.1% to 96.5%) and anomaly localization (from 89.5% to 96.8%). Code is available at https://github.com/zhiyuanyou/UniAD.
△ Less
Submitted 25 October, 2022; v1 submitted 8 June, 2022;
originally announced June 2022.
-
The Z-axis, X-axis, Weight and Disambiguation Methods for Constructing Local Reference Frame in 3D Registration: An Evaluation
Authors:
Bao Zhao,
Xianyong Fang,
Jiahui Yue,
Xiaobo Chen,
Xinyi Le,
Chanjuan Zhao
Abstract:
The local reference frame (LRF), as an independent coordinate system generated on a local 3D surface, is widely used in 3D local feature descriptor construction and 3D transformation estimation which are two key steps in the local method-based surface matching. There are numerous LRF methods have been proposed in literatures. In these methods, the x- and z-axis are commonly generated by different…
▽ More
The local reference frame (LRF), as an independent coordinate system generated on a local 3D surface, is widely used in 3D local feature descriptor construction and 3D transformation estimation which are two key steps in the local method-based surface matching. There are numerous LRF methods have been proposed in literatures. In these methods, the x- and z-axis are commonly generated by different methods or strategies, and some x-axis methods are implemented on the basis of a z-axis being given. In addition, the weight and disambiguation methods are commonly used in these LRF methods. In existing evaluations of LRF, each LRF method is evaluated with a complete form. However, the merits and demerits of the z-axis, x-axis, weight and disambiguation methods in LRF construction are unclear. In this paper, we comprehensively analyze the z-axis, x-axis, weight and disambiguation methods in existing LRFs, and obtain six z-axis and eight x-axis, five weight and two disambiguation methods. The performance of these methods are comprehensively evaluated on six standard datasets with different application scenarios and nuisances. Considering the evaluation outcomes, the merits and demerits of different weight, disambiguation, z- and x-axis methods are analyzed and summarized. The experimental result also shows that some new designed LRF axes present superior performance compared with the state-of-the-art ones.
△ Less
Submitted 17 April, 2022;
originally announced April 2022.
-
Semi-Supervised Semantic Segmentation Using Unreliable Pseudo-Labels
Authors:
Yuchao Wang,
Haochen Wang,
Yujun Shen,
Jingjing Fei,
Wei Li,
Guoqiang Jin,
Liwei Wu,
Rui Zhao,
Xinyi Le
Abstract:
The crux of semi-supervised semantic segmentation is to assign adequate pseudo-labels to the pixels of unlabeled images. A common practice is to select the highly confident predictions as the pseudo ground-truth, but it leads to a problem that most pixels may be left unused due to their unreliability. We argue that every pixel matters to the model training, even its prediction is ambiguous. Intuit…
▽ More
The crux of semi-supervised semantic segmentation is to assign adequate pseudo-labels to the pixels of unlabeled images. A common practice is to select the highly confident predictions as the pseudo ground-truth, but it leads to a problem that most pixels may be left unused due to their unreliability. We argue that every pixel matters to the model training, even its prediction is ambiguous. Intuitively, an unreliable prediction may get confused among the top classes (i.e., those with the highest probabilities), however, it should be confident about the pixel not belonging to the remaining classes. Hence, such a pixel can be convincingly treated as a negative sample to those most unlikely categories. Based on this insight, we develop an effective pipeline to make sufficient use of unlabeled data. Concretely, we separate reliable and unreliable pixels via the entropy of predictions, push each unreliable pixel to a category-wise queue that consists of negative samples, and manage to train the model with all candidate pixels. Considering the training evolution, where the prediction becomes more and more accurate, we adaptively adjust the threshold for the reliable-unreliable partition. Experimental results on various benchmarks and training settings demonstrate the superiority of our approach over the state-of-the-art alternatives.
△ Less
Submitted 14 March, 2022; v1 submitted 8 March, 2022;
originally announced March 2022.
-
Few-shot Object Counting with Similarity-Aware Feature Enhancement
Authors:
Zhiyuan You,
Kai Yang,
Wenhan Luo,
Xin Lu,
Lei Cui,
Xinyi Le
Abstract:
This work studies the problem of few-shot object counting, which counts the number of exemplar objects (i.e., described by one or several support images) occurring in the query image. The major challenge lies in that the target objects can be densely packed in the query image, making it hard to recognize every single one. To tackle the obstacle, we propose a novel learning block, equipped with a s…
▽ More
This work studies the problem of few-shot object counting, which counts the number of exemplar objects (i.e., described by one or several support images) occurring in the query image. The major challenge lies in that the target objects can be densely packed in the query image, making it hard to recognize every single one. To tackle the obstacle, we propose a novel learning block, equipped with a similarity comparison module and a feature enhancement module. Concretely, given a support image and a query image, we first derive a score map by comparing their projected features at every spatial position. The score maps regarding all support images are collected together and normalized across both the exemplar dimension and the spatial dimensions, producing a reliable similarity map. We then enhance the query feature with the support features by employing the developed point-wise similarities as the weighting coefficients. Such a design encourages the model to inspect the query image by focusing more on the regions akin to the support images, leading to much clearer boundaries between different objects. Extensive experiments on various benchmarks and training setups suggest that we surpass the state-of-the-art methods by a sufficiently large margin. For instance, on a recent large-scale FSC-147 dataset, we surpass the state-of-the-art method by improving the mean absolute error from 22.08 to 14.32 (35%$\uparrow$). Code has been released in https://github.com/zhiyuanyou/SAFECount.
△ Less
Submitted 10 September, 2022; v1 submitted 21 January, 2022;
originally announced January 2022.
-
Usability and Aesthetics: Better Together for Automated Repair of Web Pages
Authors:
Thanh Le-Cong,
Xuan Bach D. Le,
Quyet-Thang Huynh,
Phi-Le Nguyen
Abstract:
With the recent explosive growth of mobile devices such as smartphones or tablets, guaranteeing consistent web appearance across all environments has become a significant problem. This happens simply because it is hard to keep track of the web appearance on different sizes and types of devices that render the web pages. Therefore, fixing the inconsistent appearance of web pages can be difficult, a…
▽ More
With the recent explosive growth of mobile devices such as smartphones or tablets, guaranteeing consistent web appearance across all environments has become a significant problem. This happens simply because it is hard to keep track of the web appearance on different sizes and types of devices that render the web pages. Therefore, fixing the inconsistent appearance of web pages can be difficult, and the cost incurred can be huge, e.g., poor user experience and financial loss due to it. Recently, automated web repair techniques have been proposed to automatically resolve inconsistent web page appearance, focusing on improving usability. However, generated patches tend to disrupt the webpage's layout, rendering the repaired webpage aesthetically unpleasing, e.g., distorted images or misalignment of components.
In this paper, we propose an automated repair approach for web pages based on meta-heuristic algorithms that can assure both usability and aesthetics. The key novelty that empowers our approach is a novel fitness function that allows us to optimistically evolve buggy web pages to find the best solution that optimizes both usability and aesthetics at the same time. Empirical evaluations show that our approach is able to successfully resolve mobile-friendly problems in 94% of the evaluation subjects, significantly outperforming state-of-the-art baseline techniques in terms of both usability and aesthetics.
△ Less
Submitted 1 January, 2022;
originally announced January 2022.
-
Toward the Analysis of Graph Neural Networks
Authors:
Thanh-Dat Nguyen,
Thanh Le-Cong,
ThanhVu H. Nguyen,
Xuan-Bach D. Le,
Quyet-Thang Huynh
Abstract:
Graph Neural Networks (GNNs) have recently emerged as a robust framework for graph-structured data. They have been applied to many problems such as knowledge graph analysis, social networks recommendation, and even Covid19 detection and vaccine developments. However, unlike other deep neural networks such as Feed Forward Neural Networks (FFNNs), few analyses such as verification and property infer…
▽ More
Graph Neural Networks (GNNs) have recently emerged as a robust framework for graph-structured data. They have been applied to many problems such as knowledge graph analysis, social networks recommendation, and even Covid19 detection and vaccine developments. However, unlike other deep neural networks such as Feed Forward Neural Networks (FFNNs), few analyses such as verification and property inferences exist, potentially due to dynamic behaviors of GNNs, which can take arbitrary graphs as input, whereas FFNNs which only take fixed size numerical vectors as inputs.
This paper proposes an approach to analyze GNNs by converting them into FFNNs and reusing existing FFNNs analyses. We discuss various designs to ensure the scalability and accuracy of the conversions. We illustrate our method on a study case of node classification. We believe that our approach opens new research directions for understanding and analyzing GNNs.
△ Less
Submitted 31 December, 2021;
originally announced January 2022.
-
DPCRN: Dual-Path Convolution Recurrent Network for Single Channel Speech Enhancement
Authors:
Xiaohuai Le,
Hongsheng Chen,
Kai Chen,
Jing Lu
Abstract:
The dual-path RNN (DPRNN) was proposed to more effectively model extremely long sequences for speech separation in the time domain. By splitting long sequences to smaller chunks and applying intra-chunk and inter-chunk RNNs, the DPRNN reached promising performance in speech separation with a limited model size. In this paper, we combine the DPRNN module with Convolution Recurrent Network (CRN) and…
▽ More
The dual-path RNN (DPRNN) was proposed to more effectively model extremely long sequences for speech separation in the time domain. By splitting long sequences to smaller chunks and applying intra-chunk and inter-chunk RNNs, the DPRNN reached promising performance in speech separation with a limited model size. In this paper, we combine the DPRNN module with Convolution Recurrent Network (CRN) and design a model called Dual-Path Convolution Recurrent Network (DPCRN) for speech enhancement in the time-frequency domain. We replace the RNNs in the CRN with DPRNN modules, where the intra-chunk RNNs are used to model the spectrum pattern in a single frame and the inter-chunk RNNs are used to model the dependence between consecutive frames. With only 0.8M parameters, the submitted DPCRN model achieves an overall mean opinion score (MOS) of 3.57 in the wide band scenario track of the Interspeech 2021 Deep Noise Suppression (DNS) challenge. Evaluations on some other test sets also show the efficacy of our model.
△ Less
Submitted 12 July, 2021;
originally announced July 2021.
-
Complexity Analysis of Tree Share Structure
Authors:
Xuan-Bach Le,
Aquinas Hobor,
Anthony W. Lin
Abstract:
The tree share structure proposed by Dockins et al. is an elegant model for tracking disjoint ownership in concurrent separation logic, but decision procedures for tree shares are hard to implement due to a lack of a systematic theoretical study. We show that the first-order theory of the full Boolean algebra of tree shares (that is, with all tree-share constants) is decidable and has the same com…
▽ More
The tree share structure proposed by Dockins et al. is an elegant model for tracking disjoint ownership in concurrent separation logic, but decision procedures for tree shares are hard to implement due to a lack of a systematic theoretical study. We show that the first-order theory of the full Boolean algebra of tree shares (that is, with all tree-share constants) is decidable and has the same complexity as of the first-order theory of Countable Atomless Boolean Algebras. We prove that combining this additive structure with a constant-restricted unary multiplicative "relativization" operator has a non-elementary lower bound. We examine the consequences of this lower bound and prove that it comes from the combination of both theories by proving an upper bound on a generalization of the restricted multiplicative theory in isolation.
△ Less
Submitted 5 October, 2020;
originally announced October 2020.
-
Transfer learning with class-weighted and focal loss function for automatic skin cancer classification
Authors:
Duyen N. T. Le,
Hieu X. Le,
Lua T. Ngo,
Hoan T. Ngo
Abstract:
Skin cancer is by far in top-3 of the world's most common cancer. Among different skin cancer types, melanoma is particularly dangerous because of its ability to metastasize. Early detection is the key to success in skin cancer treatment. However, skin cancer diagnosis is still a challenge, even for experienced dermatologists, due to strong resemblances between benign and malignant lesions. To aid…
▽ More
Skin cancer is by far in top-3 of the world's most common cancer. Among different skin cancer types, melanoma is particularly dangerous because of its ability to metastasize. Early detection is the key to success in skin cancer treatment. However, skin cancer diagnosis is still a challenge, even for experienced dermatologists, due to strong resemblances between benign and malignant lesions. To aid dermatologists in skin cancer diagnosis, we developed a deep learning system that can effectively and automatically classify skin lesions into one of the seven classes: (1) Actinic Keratoses, (2) Basal Cell Carcinoma, (3) Benign Keratosis, (4) Dermatofibroma, (5) Melanocytic nevi, (6) Melanoma, (7) Vascular Skin Lesion. The HAM10000 dataset was used to train the system. An end-to-end deep learning process, transfer learning technique, utilizing multiple pre-trained models, combining with class-weighted and focal loss were applied for the classification process. The result was that our ensemble of modified ResNet50 models can classify skin lesions into one of the seven classes with top-1, top-2 and top-3 accuracy 93%, 97% and 99%, respectively. This deep learning system can potentially be integrated into computer-aided diagnosis systems that support dermatologists in skin cancer diagnosis.
△ Less
Submitted 13 September, 2020;
originally announced September 2020.
-
Interpretation of smartphone-captured radiographs utilizing a deep learning-based approach
Authors:
Hieu X. Le,
Phuong D. Nguyen,
Thang H. Nguyen,
Khanh N. Q. Le,
Thanh T. Nguyen
Abstract:
Recently, computer-aided diagnostic systems (CADs) that could automatically interpret medical images effectively have been the emerging subject of recent academic attention. For radiographs, several deep learning-based systems or models have been developed to study the multi-label diseases recognition tasks. However, none of them have been trained to work on smartphone-captured chest radiographs.…
▽ More
Recently, computer-aided diagnostic systems (CADs) that could automatically interpret medical images effectively have been the emerging subject of recent academic attention. For radiographs, several deep learning-based systems or models have been developed to study the multi-label diseases recognition tasks. However, none of them have been trained to work on smartphone-captured chest radiographs. In this study, we proposed a system that comprises a sequence of deep learning-based neural networks trained on the newly released CheXphoto dataset to tackle this issue. The proposed approach achieved promising results of 0.684 in AUC and 0.699 in average F1 score. To the best of our knowledge, this is the first published study that showed to be capable of processing smartphone-captured radiographs.
△ Less
Submitted 13 September, 2020;
originally announced September 2020.
-
A novel approach to remove foreign objects from chest X-ray images
Authors:
Hieu X. Le,
Phuong D. Nguyen,
Thang H. Nguyen,
Khanh N. Q. Le,
Thanh T. Nguyen
Abstract:
We initially proposed a deep learning approach for foreign objects inpainting in smartphone-camera captured chest radiographs utilizing the cheXphoto dataset. Foreign objects which can significantly affect the quality of a computer-aided diagnostic prediction are captured under various settings. In this paper, we used multi-method to tackle both removal and inpainting chest radiographs. Firstly, a…
▽ More
We initially proposed a deep learning approach for foreign objects inpainting in smartphone-camera captured chest radiographs utilizing the cheXphoto dataset. Foreign objects which can significantly affect the quality of a computer-aided diagnostic prediction are captured under various settings. In this paper, we used multi-method to tackle both removal and inpainting chest radiographs. Firstly, an object detection model is trained to separate the foreign objects from the given image. Subsequently, the binary mask of each object is extracted utilizing a segmentation model. Each pair of the binary mask and the extracted object are then used for inpainting purposes. Finally, the in-painted regions are now merged back to the original image, resulting in a clean and non-foreign-object-existing output. To conclude, we achieved state-of-the-art accuracy. The experimental results showed a new approach to the possible applications of this method for chest X-ray images detection.
△ Less
Submitted 15 August, 2020;
originally announced August 2020.
-
PF-Net: Point Fractal Network for 3D Point Cloud Completion
Authors:
Zitian Huang,
Yikuan Yu,
Jiawen Xu,
Feng Ni,
Xinyi Le
Abstract:
In this paper, we propose a Point Fractal Network (PF-Net), a novel learning-based approach for precise and high-fidelity point cloud completion. Unlike existing point cloud completion networks, which generate the overall shape of the point cloud from the incomplete point cloud and always change existing points and encounter noise and geometrical loss, PF-Net preserves the spatial arrangements of…
▽ More
In this paper, we propose a Point Fractal Network (PF-Net), a novel learning-based approach for precise and high-fidelity point cloud completion. Unlike existing point cloud completion networks, which generate the overall shape of the point cloud from the incomplete point cloud and always change existing points and encounter noise and geometrical loss, PF-Net preserves the spatial arrangements of the incomplete point cloud and can figure out the detailed geometrical structure of the missing region(s) in the prediction. To succeed at this task, PF-Net estimates the missing point cloud hierarchically by utilizing a feature-points-based multi-scale generating network. Further, we add up multi-stage completion loss and adversarial loss to generate more realistic missing region(s). The adversarial loss can better tackle multiple modes in the prediction. Our experiments demonstrate the effectiveness of our method for several challenging point cloud completion tasks.
△ Less
Submitted 1 March, 2020;
originally announced March 2020.
-
Depth-wise Decomposition for Accelerating Separable Convolutions in Efficient Convolutional Neural Networks
Authors:
Yihui He,
Jianing Qian,
Jianren Wang,
Cindy X. Le,
Congrui Hetang,
Qi Lyu,
Wenping Wang,
Tianwei Yue
Abstract:
Very deep convolutional neural networks (CNNs) have been firmly established as the primary methods for many computer vision tasks. However, most state-of-the-art CNNs are large, which results in high inference latency. Recently, depth-wise separable convolution has been proposed for image recognition tasks on computationally limited platforms such as robotics and self-driving cars. Though it is mu…
▽ More
Very deep convolutional neural networks (CNNs) have been firmly established as the primary methods for many computer vision tasks. However, most state-of-the-art CNNs are large, which results in high inference latency. Recently, depth-wise separable convolution has been proposed for image recognition tasks on computationally limited platforms such as robotics and self-driving cars. Though it is much faster than its counterpart, regular convolution, accuracy is sacrificed. In this paper, we propose a novel decomposition approach based on SVD, namely depth-wise decomposition, for expanding regular convolutions into depthwise separable convolutions while maintaining high accuracy. We show our approach can be further generalized to the multi-channel and multi-layer cases, based on Generalized Singular Value Decomposition (GSVD) [59]. We conduct thorough experiments with the latest ShuffleNet V2 model [47] on both random synthesized dataset and a large-scale image recognition dataset: ImageNet [10]. Our approach outperforms channel decomposition [73] on all datasets. More importantly, our approach improves the Top-1 accuracy of ShuffleNet V2 by ~2%.
△ Less
Submitted 23 September, 2023; v1 submitted 21 October, 2019;
originally announced October 2019.
-
Adaptive neural network based dynamic surface control for uncertain dual arm robots
Authors:
Dung Tien Pham,
Thai Van Nguyen,
Hai Xuan Le,
Linh Nguyen,
Nguyen Huu Thai,
Tuan Anh Phan,
Hai Tuan Pham,
Anh Hoai Duong
Abstract:
The paper discusses an adaptive strategy to effectively control nonlinear manipulation motions of a dual arm robot (DAR) under system uncertainties including parameter variations, actuator nonlinearities and external disturbances. It is proposed that the control scheme is first derived from the dynamic surface control (DSC) method, which allows the robot's end-effectors to robustly track the desir…
▽ More
The paper discusses an adaptive strategy to effectively control nonlinear manipulation motions of a dual arm robot (DAR) under system uncertainties including parameter variations, actuator nonlinearities and external disturbances. It is proposed that the control scheme is first derived from the dynamic surface control (DSC) method, which allows the robot's end-effectors to robustly track the desired trajectories. Moreover, since exactly determining the DAR system's dynamics is impractical due to the system uncertainties, the uncertain system parameters are then proposed to be adaptively estimated by the use of the radial basis function network (RBFN). The adaptation mechanism is derived from the Lyapunov theory, which theoretically guarantees stability of the closed-loop control system. The effectiveness of the proposed RBFN-DSC approach is demonstrated by implementing the algorithm in a synthetic environment with realistic parameters, where the obtained results are highly promising.
△ Less
Submitted 8 May, 2019;
originally announced May 2019.
-
Monadic Decomposability of Regular Relations
Authors:
Pablo Barcelo,
Chih-Duo Hong,
Xuan-Bach Le,
Anthony W. Lin,
Reino Niskanen
Abstract:
Monadic decomposibility --- the ability to determine whether a formula in a given logical theory can be decomposed into a boolean combination of monadic formulas --- is a powerful tool for devising a decision procedure for a given logical theory. In this paper, we revisit a classical decision problem in automata theory: given a regular (a.k.a. synchronized rational) relation, determine whether it…
▽ More
Monadic decomposibility --- the ability to determine whether a formula in a given logical theory can be decomposed into a boolean combination of monadic formulas --- is a powerful tool for devising a decision procedure for a given logical theory. In this paper, we revisit a classical decision problem in automata theory: given a regular (a.k.a. synchronized rational) relation, determine whether it is recognizable, i.e., it has a monadic decomposition (that is, a representation as a boolean combination of cartesian products of regular languages). Regular relations are expressive formalisms which, using an appropriate string encoding, can capture relations definable in Presburger Arithmetic. In fact, their expressive power coincide with relations definable in a universal automatic structure; equivalently, those definable by finite set interpretations in WS1S (Weak Second Order Theory of One Successor). Determining whether a regular relation admits a recognizable relation was known to be decidable (and in exponential time for binary relations), but its precise complexity still hitherto remains open. Our main contribution is to fully settle the complexity of this decision problem by developing new techniques employing infinite Ramsey theory. The complexity for DFA (resp. NFA) representations of regular relations is shown to be NLOGSPACE-complete (resp. \PSPACE-complete).
△ Less
Submitted 8 May, 2019; v1 submitted 2 March, 2019;
originally announced March 2019.
-
A Comprehensive Performance Evaluation for 3D Transformation Estimation Techniques
Authors:
Bao Zhao,
Xiaobo Chen,
Xinyi Le,
Juntong Xi
Abstract:
3D local feature extraction and matching is the basis for solving many tasks in the area of computer vision, such as 3D registration, modeling, recognition and retrieval. However, this process commonly draws into false correspondences, due to noise, limited features, occlusion, incomplete surface and etc. In order to estimate accurate transformation based on these corrupted correspondences, numero…
▽ More
3D local feature extraction and matching is the basis for solving many tasks in the area of computer vision, such as 3D registration, modeling, recognition and retrieval. However, this process commonly draws into false correspondences, due to noise, limited features, occlusion, incomplete surface and etc. In order to estimate accurate transformation based on these corrupted correspondences, numerous transformation estimation techniques have been proposed. However, the merits, demerits and appropriate application for these methods are unclear owing to that no comprehensive evaluation for the performance of these methods has been conducted. This paper evaluates eleven state-of-the-art transformation estimation proposals on both descriptor based and synthetic correspondences. On descriptor based correspondences, several evaluation items (including the performance on different datasets, robustness to different overlap ratios and the performance of these technique combined with Iterative Closest Point (ICP), different local features and LRF/A techniques) of these methods are tested on four popular datasets acquired with different devices. On synthetic correspondences, the robustness of these methods to varying percentages of correct correspondences (PCC) is evaluated. In addition, we also evaluate the efficiencies of these methods. Finally, the merits, demerits and application guidance of these tested transformation estimation methods are summarized.
△ Less
Submitted 15 January, 2019;
originally announced January 2019.
-
On Reliability of Patch Correctness Assessment
Authors:
Xuan Bach D. Le,
Lingfeng Bao,
David Lo,
Xin Xia,
Shanping Li
Abstract:
Current state-of-the-art automatic software repair (ASR) techniques rely heavily on incomplete specifications, e.g., test suites, to generate repairs. This, however, may render ASR tools to generate incorrect repairs that do not generalize. To assess patch correctness, researchers have been following two typical ways separately: (1) Automated annotation, wherein patches are automatically labeled b…
▽ More
Current state-of-the-art automatic software repair (ASR) techniques rely heavily on incomplete specifications, e.g., test suites, to generate repairs. This, however, may render ASR tools to generate incorrect repairs that do not generalize. To assess patch correctness, researchers have been following two typical ways separately: (1) Automated annotation, wherein patches are automatically labeled by an independent test suite (ITS) - a patch passing the ITS is regarded as correct or generalizable, and incorrect otherwise, (2) Author annotation, wherein authors of ASR techniques annotate correctness labels of patches generated by their and competing tools by themselves. While automated annotation fails to prove that a patch is actually correct, author annotation is prone to subjectivity. This concern has caused an on-going debate on appropriate ways to assess the effectiveness of numerous ASR techniques proposed recently. To address this concern, we propose to assess reliability of author and automated annotations on patch correctness assessment. We do this by first constructing a gold set of correctness labels for 189 randomly selected patches generated by 8 state-of-the-art ASR techniques through a user study involving 35 professional developers as independent annotators. By measuring inter-rater agreement as a proxy for annotation quality - as commonly done in the literature - we demonstrate that our constructed gold set is on par with other high-quality gold sets. We then compare labels generated by author and automated annotations with this gold set to assess reliability of the patch assessment methodologies. We subsequently report several findings and highlight implications for future studies.
△ Less
Submitted 27 June, 2018; v1 submitted 15 May, 2018;
originally announced May 2018.
-
A Novel SDASS Descriptor for Fully Encoding the Information of 3D Local Surface
Authors:
Bao Zhao,
Xinyi Le,
Juntong Xi
Abstract:
Local feature description is a fundamental yet challenging task in 3D computer vision. This paper proposes a novel descriptor, named Statistic of Deviation Angles on Subdivided Space (SDASS), of encoding geometrical and spatial information of local surface on Local Reference Axis (LRA). In terms of encoding geometrical information, considering that surface normals, which are usually used for encod…
▽ More
Local feature description is a fundamental yet challenging task in 3D computer vision. This paper proposes a novel descriptor, named Statistic of Deviation Angles on Subdivided Space (SDASS), of encoding geometrical and spatial information of local surface on Local Reference Axis (LRA). In terms of encoding geometrical information, considering that surface normals, which are usually used for encoding geometrical information of local surface, are vulnerable to various nuisances (e.g., noise, varying mesh resolutions etc.), we propose a robust geometrical attribute, called Local Minimum Axis (LMA), to replace the normals for generating the geometrical feature in our SDASS descriptor. For encoding spatial information, we use two spatial features for fully encoding the spatial information of a local surface based on LRA which usually presents high overall repeatability than Local Reference Axis (LRF). Besides, an improved LRA is proposed for increasing the robustness of our SDASS to noise and varying mesh resolutions. The performance of the SDASS descriptor is rigorously tested on four popular datasets. The results show that our descriptor has a high descriptiveness and strong robustness, and its performance outperform existing algorithms by a large margin. Finally, the proposed descriptor is applied to 3D registration. The accurate result further confirms the effectiveness of our SDASS method.
△ Less
Submitted 26 June, 2018; v1 submitted 14 November, 2017;
originally announced November 2017.
-
Path planning and Obstacle avoidance approaches for Mobile robot
Authors:
Hoc Thai Nguyen,
Hai Xuan Le
Abstract:
A new path planning method for Mobile Robots (MR) has been developed and implemented. On the one hand, based on the shortest path from the start point to the goal point, this path planner can choose the best moving directions of the MR, which helps to reach the target point as soon as possible. On the other hand, with an intelligent obstacle avoidance, our method can find the target point with the…
▽ More
A new path planning method for Mobile Robots (MR) has been developed and implemented. On the one hand, based on the shortest path from the start point to the goal point, this path planner can choose the best moving directions of the MR, which helps to reach the target point as soon as possible. On the other hand, with an intelligent obstacle avoidance, our method can find the target point with the near-shortest path length while avoiding some infinite loop traps of several obstacles in unknown environments. The combination of two approaches helps the MR to reach the target point with a very reliable algorithm. Moreover, by continuous updates of the on-board sensors information, this approach can generate the MRs trajectory both in static and dynamic environments. A large number of simulations in some similar studies environments demonstrate the power of the proposed path planning algorithm.
△ Less
Submitted 7 September, 2016;
originally announced September 2016.