subscribe to arXiv mailings

arXiv:2403.19374 [pdf, other]

A noise-tolerant, resource-saving probabilistic binary neural network implemented by the SOT-MRAM compute-in-memory system

Authors: Yu Gu, Puyang Huang, Tianhao Chen, Chenyi Fu, Aitian Chen, Shouzhong Peng, Xixiang Zhang, Xufeng Kou

Abstract: We report a spin-orbit torque(SOT) magnetoresistive random-access memory(MRAM)-based probabilistic binary neural network(PBNN) for resource-saving and hardware noise-tolerant computing applications. With the presence of thermal fluctuation, the non-destructive SOT-driven magnetization switching characteristics lead to a random weight matrix with controllable probability distribution. In the meanwh… ▽ More We report a spin-orbit torque(SOT) magnetoresistive random-access memory(MRAM)-based probabilistic binary neural network(PBNN) for resource-saving and hardware noise-tolerant computing applications. With the presence of thermal fluctuation, the non-destructive SOT-driven magnetization switching characteristics lead to a random weight matrix with controllable probability distribution. In the meanwhile, the proposed CIM architecture allows for the concurrent execution of the probabilistic vector-matrix multiplication (PVMM) and binarization. Furthermore, leveraging the effectiveness of random binary cells to propagate multi-bit probabilistic information, our SOT-MRAM-based PBNN system achieves a 97.78\% classification accuracy under a 7.01\% weight variation on the MNIST database through 10 sampling cycles, and the number of bit-level computation operations is reduced by a factor of 6.9 compared to that of the full-precision LeNet-5 network. Our work provides a compelling framework for the design of reliable neural networks tailored to the applications with low power consumption and limited computational resources. △ Less

Submitted 28 March, 2024; originally announced March 2024.

Comments: 5 pages, 10 figures

MSC Class: 94C60 ACM Class: B.2.4; B.3.0

arXiv:2311.11572 [pdf, other]

Cryogenic quasi-static embedded DRAM for energy-efficient compute-in-memory applications

Authors: Yuhao Shu, Hongtu Zhang, Hao Sun, Mengru Zhang, Wenfeng Zhao, Qi Deng, Zhidong Tang, Yumeng Yuan, Yongqi Hu, Yu Gu, Xufeng Kou, Yajun Ha

Abstract: Compute-in-memory (CIM) presents an attractive approach for energy-efficient computing in data-intensive applications. However, the development of suitable memory designs to achieve high-performance CIM remains a challenging task. Here, we propose a cryogenic quasi-static embedded DRAM to address the logic-memory mismatch of CIM. Guided by the re-calibrated cryogenic device model, the designed fou… ▽ More Compute-in-memory (CIM) presents an attractive approach for energy-efficient computing in data-intensive applications. However, the development of suitable memory designs to achieve high-performance CIM remains a challenging task. Here, we propose a cryogenic quasi-static embedded DRAM to address the logic-memory mismatch of CIM. Guided by the re-calibrated cryogenic device model, the designed four-transistor bit-cell achieves full-swing data storage, low power consumption, and extended retention time at cryogenic temperatures. Combined with the adoption of cryogenic write bitline biasing technique and readout circuitry optimization, our 4Kb cryogenic eDRAM chip demonstrates a 1.37$\times$10$^6$ times improvement in retention time, while achieving a 75 times improvement in retention variability, compared to room-temperature operation. Moreover, it also achieves outstanding power performance with a retention power of 112 fW and a dynamic power of 108 $μ$W at 4.2 K, which can be further decreased by 7.1% and 13.6% using the dynamic voltage scaling technique. This work reveals the great potential of cryogenic CMOS for high-density data storage and lays a solid foundation for energy-efficient CIM implementations. △ Less

Submitted 20 November, 2023; originally announced November 2023.

arXiv:2111.10078 [pdf, other]

Defeating Catastrophic Forgetting via Enhanced Orthogonal Weights Modification

Authors: Yanni Li, Bing Liu, Kaicheng Yao, Xiaoli Kou, Pengfan Lv, Yueshen Xu, Jiangtao Cui

Abstract: The ability of neural networks (NNs) to learn and remember multiple tasks sequentially is facing tough challenges in achieving general artificial intelligence due to their catastrophic forgetting (CF) issues. Fortunately, the latest OWM Orthogonal Weights Modification) and other several continual learning (CL) methods suggest some promising ways to overcome the CF issue. However, none of existing… ▽ More The ability of neural networks (NNs) to learn and remember multiple tasks sequentially is facing tough challenges in achieving general artificial intelligence due to their catastrophic forgetting (CF) issues. Fortunately, the latest OWM Orthogonal Weights Modification) and other several continual learning (CL) methods suggest some promising ways to overcome the CF issue. However, none of existing CL methods explores the following three crucial questions for effectively overcoming the CF issue: that is, what knowledge does it contribute to the effective weights modification of the NN during its sequential tasks learning? When the data distribution of a new learning task changes corresponding to the previous learned tasks, should a uniform/specific weight modification strategy be adopted or not? what is the upper bound of the learningable tasks sequentially for a given CL method? ect. To achieve this, in this paper, we first reveals the fact that of the weight gradient of a new learning task is determined by both the input space of the new task and the weight space of the previous learned tasks sequentially. On this observation and the recursive least square optimal method, we propose a new efficient and effective continual learning method EOWM via enhanced OWM. And we have theoretically and definitively given the upper bound of the learningable tasks sequentially of our EOWM. Extensive experiments conducted on the benchmarks demonstrate that our EOWM is effectiveness and outperform all of the state-of-the-art CL baselines. △ Less

Submitted 19 November, 2021; originally announced November 2021.

arXiv:2111.10075 [pdf, other]

Enhanced countering adversarial attacks via input denoising and feature restoring

Authors: Yanni Li, Wenhui Zhang, Jiawei Liu, Xiaoli Kou, Hui Li, Jiangtao Cui

Abstract: Despite the fact that deep neural networks (DNNs) have achieved prominent performance in various applications, it is well known that DNNs are vulnerable to adversarial examples/samples (AEs) with imperceptible perturbations in clean/original samples. To overcome the weakness of the existing defense methods against adversarial attacks, which damages the information on the original samples, leading… ▽ More Despite the fact that deep neural networks (DNNs) have achieved prominent performance in various applications, it is well known that DNNs are vulnerable to adversarial examples/samples (AEs) with imperceptible perturbations in clean/original samples. To overcome the weakness of the existing defense methods against adversarial attacks, which damages the information on the original samples, leading to the decrease of the target classifier accuracy, this paper presents an enhanced countering adversarial attack method IDFR (via Input Denoising and Feature Restoring). The proposed IDFR is made up of an enhanced input denoiser (ID) and a hidden lossy feature restorer (FR) based on the convex hull optimization. Extensive experiments conducted on benchmark datasets show that the proposed IDFR outperforms the various state-of-the-art defense methods, and is highly effective for protecting target models against various adversarial black-box or white-box attacks. \footnote{Souce code is released at: \href{https://github.com/ID-FR/IDFR}{https://github.com/ID-FR/IDFR}} △ Less

Submitted 19 November, 2021; originally announced November 2021.

arXiv:2109.05922 [pdf, other]

r-GAT: Relational Graph Attention Network for Multi-Relational Graphs

Authors: Meiqi Chen, Yuan Zhang, Xiaoyu Kou, Yuntao Li, Yan Zhang

Abstract: Graph Attention Network (GAT) focuses on modelling simple undirected and single relational graph data only. This limits its ability to deal with more general and complex multi-relational graphs that contain entities with directed links of different labels (e.g., knowledge graphs). Therefore, directly applying GAT on multi-relational graphs leads to sub-optimal solutions. To tackle this issue, we p… ▽ More Graph Attention Network (GAT) focuses on modelling simple undirected and single relational graph data only. This limits its ability to deal with more general and complex multi-relational graphs that contain entities with directed links of different labels (e.g., knowledge graphs). Therefore, directly applying GAT on multi-relational graphs leads to sub-optimal solutions. To tackle this issue, we propose r-GAT, a relational graph attention network to learn multi-channel entity representations. Specifically, each channel corresponds to a latent semantic aspect of an entity. This enables us to aggregate neighborhood information for the current aspect using relation features. We further propose a query-aware attention mechanism for subsequent tasks to select useful aspects. Extensive experiments on link prediction and entity classification tasks show that our r-GAT can model multi-relational graphs effectively. Also, we show the interpretability of our approach by case study. △ Less

Submitted 13 September, 2021; originally announced September 2021.

arXiv:2102.11033 [pdf]

IFoodCloud: A Platform for Real-time Sentiment Analysis of Public Opinion about Food Safety in China

Authors: Dachuan Zhang, Haoyang Zhang, Zhisheng Wei, Yan Li, Zhiheng Mao, Chunmeng He, Haorui Ma, Xin Zeng, Xiaoling Xie, Xingran Kou, Bingwen Zhang

Abstract: The Internet contains a wealth of public opinion on food safety, including views on food adulteration, food-borne diseases, agricultural pollution, irregular food distribution, and food production issues. In order to systematically collect and analyse public opinion on food safety, we developed IFoodCloud, a platform for the real-time sentiment analysis of public opinion on food safety in China. I… ▽ More The Internet contains a wealth of public opinion on food safety, including views on food adulteration, food-borne diseases, agricultural pollution, irregular food distribution, and food production issues. In order to systematically collect and analyse public opinion on food safety, we developed IFoodCloud, a platform for the real-time sentiment analysis of public opinion on food safety in China. It collects data from more than 3,100 public sources that can be used to explore public opinion trends, public sentiment, and regional attention differences of food safety incidents. At the same time, we constructed a sentiment classification model using multiple lexicon-based and deep learning-based algorithms integrated with IFoodCloud that provide an unprecedented rapid means of understanding the public sentiment toward specific food safety incidents. Our best model's F1-score achieved 0.9737. Further, three real-world cases are presented to demonstrate the application and robustness. IFoodCloud could be considered a valuable tool for promote scientisation of food safety supervision and risk communication. △ Less

Submitted 16 February, 2021; originally announced February 2021.

arXiv:2010.14730

DisenE: Disentangling Knowledge Graph Embeddings

Authors: Xiaoyu Kou, Yankai Lin, Yuntao Li, Jiahao Xu, Peng Li, Jie Zhou, Yan Zhang

Abstract: Knowledge graph embedding (KGE), aiming to embed entities and relations into low-dimensional vectors, has attracted wide attention recently. However, the existing research is mainly based on the black-box neural models, which makes it difficult to interpret the learned representation. In this paper, we introduce DisenE, an end-to-end framework to learn disentangled knowledge graph embeddings. Spec… ▽ More Knowledge graph embedding (KGE), aiming to embed entities and relations into low-dimensional vectors, has attracted wide attention recently. However, the existing research is mainly based on the black-box neural models, which makes it difficult to interpret the learned representation. In this paper, we introduce DisenE, an end-to-end framework to learn disentangled knowledge graph embeddings. Specially, we introduce an attention-based mechanism that enables the model to explicitly focus on relevant components of entity embeddings according to a given relation. Furthermore, we introduce two novel regularizers to encourage each component of the entity representation to independently reflect an isolated semantic aspect. Experimental results demonstrate that our proposed DisenE investigates a perspective to address the interpretability of KGE and is proved to be an effective way to improve the performance of link prediction tasks. △ Less

Submitted 12 November, 2020; v1 submitted 27 October, 2020; originally announced October 2020.

Comments: There are some mistakes in the paper

arXiv:2010.02565 [pdf, other]

Disentangle-based Continual Graph Representation Learning

Authors: Xiaoyu Kou, Yankai Lin, Shaobo Liu, Peng Li, Jie Zhou, Yan Zhang

Abstract: Graph embedding (GE) methods embed nodes (and/or edges) in graph into a low-dimensional semantic space, and have shown its effectiveness in modeling multi-relational data. However, existing GE models are not practical in real-world applications since it overlooked the streaming nature of incoming data. To address this issue, we study the problem of continual graph representation learning which aim… ▽ More Graph embedding (GE) methods embed nodes (and/or edges) in graph into a low-dimensional semantic space, and have shown its effectiveness in modeling multi-relational data. However, existing GE models are not practical in real-world applications since it overlooked the streaming nature of incoming data. To address this issue, we study the problem of continual graph representation learning which aims to continually train a GE model on new data to learn incessantly emerging multi-relational data while avoiding catastrophically forgetting old learned knowledge. Moreover, we propose a disentangle-based continual graph representation learning (DiCGRL) framework inspired by the human's ability to learn procedural knowledge. The experimental results show that DiCGRL could effectively alleviate the catastrophic forgetting problem and outperform state-of-the-art continual learning models. △ Less

Submitted 24 November, 2020; v1 submitted 6 October, 2020; originally announced October 2020.

arXiv:2008.07723 [pdf, other]

NASE: Learning Knowledge Graph Embedding for Link Prediction via Neural Architecture Search

Authors: Xiaoyu Kou, Bingfeng Luo, Huang Hu, Yan Zhang

Abstract: Link prediction is the task of predicting missing connections between entities in the knowledge graph (KG). While various forms of models are proposed for the link prediction task, most of them are designed based on a few known relation patterns in several well-known datasets. Due to the diversity and complexity nature of the real-world KGs, it is inherently difficult to design a model that fits a… ▽ More Link prediction is the task of predicting missing connections between entities in the knowledge graph (KG). While various forms of models are proposed for the link prediction task, most of them are designed based on a few known relation patterns in several well-known datasets. Due to the diversity and complexity nature of the real-world KGs, it is inherently difficult to design a model that fits all datasets well. To address this issue, previous work has tried to use Automated Machine Learning (AutoML) to search for the best model for a given dataset. However, their search space is limited only to bilinear model families. In this paper, we propose a novel Neural Architecture Search (NAS) framework for the link prediction task. First, the embeddings of the input triplet are refined by the Representation Search Module. Then, the prediction score is searched within the Score Function Search Module. This framework entails a more general search space, which enables us to take advantage of several mainstream model families, and thus it can potentially achieve better performance. We relax the search space to be continuous so that the architecture can be optimized efficiently using gradient-based search strategies. Experimental results on several benchmark datasets demonstrate the effectiveness of our method compared with several state-of-the-art approaches. △ Less

Submitted 17 August, 2020; originally announced August 2020.

Comments: Accepted by CIKM 2020, short paper

arXiv:2004.03808 [pdf, other]

doi 10.1109/ACCESS.2021.3122273

Improving BERT with Self-Supervised Attention

Authors: Yiren Chen, Xiaoyu Kou, Jiangang Bai, Yunhai Tong

Abstract: One of the most popular paradigms of applying large pre-trained NLP models such as BERT is to fine-tune it on a smaller dataset. However, one challenge remains as the fine-tuned model often overfits on smaller datasets. A symptom of this phenomenon is that irrelevant or misleading words in the sentence, which are easy to understand for human beings, can substantially degrade the performance of the… ▽ More One of the most popular paradigms of applying large pre-trained NLP models such as BERT is to fine-tune it on a smaller dataset. However, one challenge remains as the fine-tuned model often overfits on smaller datasets. A symptom of this phenomenon is that irrelevant or misleading words in the sentence, which are easy to understand for human beings, can substantially degrade the performance of these finetuned BERT models. In this paper, we propose a novel technique, called Self-Supervised Attention (SSA) to help facilitate this generalization challenge. Specifically, SSA automatically generates weak, token-level attention labels iteratively by probing the fine-tuned model from the previous iteration. We investigate two different ways of integrating SSA into BERT and propose a hybrid approach to combine their benefits. Empirically, through a variety of public datasets, we illustrate significant performance improvement using our SSA-enhanced BERT model. △ Less

Submitted 22 October, 2021; v1 submitted 8 April, 2020; originally announced April 2020.

arXiv:1912.10729 [pdf, other]

TextNAS: A Neural Architecture Search Space tailored for Text Representation

Authors: Yujing Wang, Yaming Yang, Yiren Chen, Jing Bai, Ce Zhang, Guinan Su, Xiaoyu Kou, Yunhai Tong, Mao Yang, Lidong Zhou

Abstract: Learning text representation is crucial for text classification and other language related tasks. There are a diverse set of text representation networks in the literature, and how to find the optimal one is a non-trivial problem. Recently, the emerging Neural Architecture Search (NAS) techniques have demonstrated good potential to solve the problem. Nevertheless, most of the existing works of NAS… ▽ More Learning text representation is crucial for text classification and other language related tasks. There are a diverse set of text representation networks in the literature, and how to find the optimal one is a non-trivial problem. Recently, the emerging Neural Architecture Search (NAS) techniques have demonstrated good potential to solve the problem. Nevertheless, most of the existing works of NAS focus on the search algorithms and pay little attention to the search space. In this paper, we argue that the search space is also an important human prior to the success of NAS in different applications. Thus, we propose a novel search space tailored for text representation. Through automatic search, the discovered network architecture outperforms state-of-the-art models on various public datasets on text classification and natural language inference tasks. Furthermore, some of the design principles found in the automatic network agree well with human intuition. △ Less

Submitted 23 December, 2019; originally announced December 2019.

arXiv:1906.03821 [pdf, other]

doi 10.1145/3292500.3330680

Time-Series Anomaly Detection Service at Microsoft

Authors: Hansheng Ren, Bixiong Xu, Yujing Wang, Chao Yi, Congrui Huang, Xiaoyu Kou, Tony Xing, Mao Yang, Jie Tong, Qi Zhang

Abstract: Large companies need to monitor various metrics (for example, Page Views and Revenue) of their applications and services in real time. At Microsoft, we develop a time-series anomaly detection service which helps customers to monitor the time-series continuously and alert for potential incidents on time. In this paper, we introduce the pipeline and algorithm of our anomaly detection service, which… ▽ More Large companies need to monitor various metrics (for example, Page Views and Revenue) of their applications and services in real time. At Microsoft, we develop a time-series anomaly detection service which helps customers to monitor the time-series continuously and alert for potential incidents on time. In this paper, we introduce the pipeline and algorithm of our anomaly detection service, which is designed to be accurate, efficient and general. The pipeline consists of three major modules, including data ingestion, experimentation platform and online compute. To tackle the problem of time-series anomaly detection, we propose a novel algorithm based on Spectral Residual (SR) and Convolutional Neural Network (CNN). Our work is the first attempt to borrow the SR model from visual saliency detection domain to time-series anomaly detection. Moreover, we innovatively combine SR and CNN together to improve the performance of SR model. Our approach achieves superior experimental results compared with state-of-the-art baselines on both public datasets and Microsoft production data. △ Less

Submitted 10 June, 2019; originally announced June 2019.

Comments: KDD 2019

Showing 1–12 of 12 results for author: Kou, X