Showing 1–33 of 33 results for author: Tu, M

  1. arXiv:2407.04675  [pdf, other]

    eess.AS cs.SD

    Seed-ASR: Understanding Diverse Speech and Contexts with LLM-based Speech Recognition

    Authors: Ye Bai, Jingping Chen, Jitong Chen, Wei Chen, Zhuo Chen, Chen Ding, Linhao Dong, Qianqian Dong, Yujiao Du, Kepan Gao, Lu Gao, Yi Guo, Minglun Han, Ting Han, Wenchao Hu, Xinying Hu, Yuxiang Hu, Deyu Hua, Lu Huang, Mingkun Huang, Youjia Huang, Jishuo Jin, Fanliu Kong, Zongwei Lan, Tianyu Li, et al. (30 additional authors not shown)

    Abstract: Modern automatic speech recognition (ASR) model is required to accurately transcribe diverse speech signals (from different domains, languages, accents, etc) given the specific contextual information in various application scenarios. Classic end-to-end models fused with extra language models perform well, but mainly in data matching scenarios and are gradually approaching a bottleneck. In this wor…

    Submitted 5 July, 2024; originally announced July 2024.

  2. arXiv:2406.07349  [pdf, other]

    cs.CR

    Erasing Radio Frequency Fingerprints via Active Adversarial Perturbation

    Authors: Zhaoyi Lu, Wenchao Xu, Ming Tu, Xin Xie, Cunqing Hua, Nan Cheng

    Abstract: Radio Frequency (RF) fingerprinting is to identify a wireless device from its uniqueness of the analog circuitry or hardware imperfections. However, unlike the MAC address which can be modified, such hardware feature is inevitable for the signal emitted to air, which can possibly reveal device whereabouts, e.g., a sniffer can use a pre-trained model to identify a nearby device when receiving its s…

    Submitted 12 June, 2024; v1 submitted 11 June, 2024; originally announced June 2024.

  3. arXiv:2404.06674  [pdf, other]

    cs.SD cs.AI eess.AS

    VoiceShop: A Unified Speech-to-Speech Framework for Identity-Preserving Zero-Shot Voice Editing

    Authors: Philip Anastassiou, Zhenyu Tang, Kainan Peng, Dongya Jia, Jiaxin Li, Ming Tu, Yuping Wang, Yuxuan Wang, Mingbo Ma

    Abstract: We present VoiceShop, a novel speech-to-speech framework that can modify multiple attributes of speech, such as age, gender, accent, and speech style, in a single forward pass while preserving the input speaker's timbre. Previous works have been constrained to specialized models that can only edit these attributes individually and suffer from the following pitfalls: the magnitude of the conversion…

    Submitted 11 April, 2024; v1 submitted 9 April, 2024; originally announced April 2024.

  4. Object Detection in Thermal Images Using Deep Learning for Unmanned Aerial Vehicles

    Authors: Minh Dang Tu, Kieu Trang Le, Manh Duong Phung

    Abstract: This work presents a neural network model capable of recognizing small and tiny objects in thermal images collected by unmanned aerial vehicles. Our model consists of three parts, the backbone, the neck, and the prediction head. The backbone is developed based on the structure of YOLOv5 combined with the use of a transformer encoder at the end. The neck includes a BI-FPN block combined with the us…

    Submitted 13 February, 2024; originally announced February 2024.

    Comments: Published in: 2024 IEEE/SICE International Symposium on System Integration (SII)

  5. arXiv:2310.13028  [pdf, other]

    cs.CL cs.AI

    Reliable Academic Conference Question Answering: A Study Based on Large Language Model

    Authors: Zhiwei Huang, Long Jin, Junjie Wang, Mingchen Tu, Yin Hua, Zhiqiang Liu, Jiawei Meng, Huajun Chen, Wen Zhang

    Abstract: The rapid growth of computer science has led to a proliferation of research presented at academic conferences, fostering global scholarly communication. Researchers consistently seek accurate, current information about these events at all stages. This data surge necessitates an intelligent question-answering system to efficiently address researchers' queries and ensure awareness of the latest adva…

    Submitted 19 October, 2023; originally announced October 2023.

    Comments: 10 pages, 4 figures, 2 tables

  6. arXiv:2309.00597  [pdf, other]

    cs.CE cs.DC cs.ET q-bio.NC quant-ph

    The QUATRO Application Suite: Quantum Computing for Models of Human Cognition

    Authors: Raghavendra Pradyumna Pothukuchi, Leon Lufkin, Yu Jun Shen, Alejandro Simon, Rome Thorstenson, Bernardo Eilert Trevisan, Michael Tu, Mudi Yang, Ben Foxman, Viswanatha Srinivas Pothukuchi, Gunnar Epping, Thi Ha Kyaw, Bryant J Jongkees, Yongshan Ding, Jerome R Busemeyer, Jonathan D Cohen, Abhishek Bhattacharjee

    Abstract: Research progress in quantum computing has, thus far, focused on a narrow set of application domains. Expanding the suite of quantum application domains is vital for the discovery of new software toolchains and architectural abstractions. In this work, we unlock a new class of applications ripe for quantum computing research -- computational cognitive modeling. Cognitive models are critical to und…

    Submitted 8 December, 2023; v1 submitted 1 September, 2023; originally announced September 2023.

  7. arXiv:2308.10173  [pdf, other]

    cs.CL

    FoodGPT: A Large Language Model in Food Testing Domain with Incremental Pre-training and Knowledge Graph Prompt

    Authors: Zhixiao Qi, Yijiong Yu, Meiqi Tu, Junyi Tan, Yongfeng Huang

    Abstract: Currently, the construction of large language models in specific domains is done by fine-tuning on a base model. Some models also incorporate knowledge bases without the need for pre-training. This is because the base model already contains domain-specific knowledge during the pre-training process. We build a large language model for food testing. Unlike the above approach, a significant amount of…

    Submitted 20 August, 2023; originally announced August 2023.

  8. arXiv:2305.18551  [pdf]

    astro-ph.IM cs.SD eess.AS

    Multi-Band Acoustic Monitoring of Aerial Signatures

    Authors: Andrew Mead, Sarah Little, Paul Sail, Michelle Tu, Wesley Andrés Watters, Abigail White, Richard Cloete

    Abstract: The Galileo Project's acoustic monitoring, omni-directional system (AMOS) aids in the detection and characterization of aerial phenomena. It uses a multi-band microphone suite spanning infrasonic to ultrasonic frequencies, providing an independent signal modality for validation and characterization of detected objects. The system utilizes infrasonic, audible, and ultrasonic systems to cover a wide…

    Submitted 29 May, 2023; originally announced May 2023.

    Journal ref: Journal of Astronomical Instrumentation, 12(1), 2340005 (2023)

  9. arXiv:2305.15719  [pdf, other]

    cs.SD cs.AI cs.LG eess.AS

    Efficient Neural Music Generation

    Authors: Max W. Y. Lam, Qiao Tian, Tang Li, Zongyu Yin, Siyuan Feng, Ming Tu, Yuliang Ji, Rui Xia, Mingbo Ma, Xuchen Song, Jitong Chen, Yuping Wang, Yuxuan Wang

    Abstract: Recent progress in music generation has been remarkably advanced by the state-of-the-art MusicLM, which comprises a hierarchy of three LMs, respectively, for semantic, coarse acoustic, and fine acoustic modelings. Yet, sampling with the MusicLM requires processing through these LMs one by one to obtain the fine-grained acoustic tokens, making it computationally expensive and prohibitive for a real…

    Submitted 25 May, 2023; originally announced May 2023.

  10. arXiv:2305.11576  [pdf, other]

    eess.AS cs.CL cs.SD

    Language-universal phonetic encoder for low-resource speech recognition

    Authors: Siyuan Feng, Ming Tu, Rui Xia, Chuanzeng Huang, Yuxuan Wang

    Abstract: Multilingual training is effective in improving low-resource ASR, which may partially be explained by phonetic representation sharing between languages. In end-to-end (E2E) ASR systems, graphemes are often used as basic modeling units, however graphemes may not be ideal for multilingual phonetic sharing. In this paper, we leverage International Phonetic Alphabet (IPA) based language-universal phon…

    Submitted 19 May, 2023; originally announced May 2023.

    Comments: Accepted for publication in INTERSPEECH 2023

  11. arXiv:2305.11569  [pdf, ps, other]

    eess.AS cs.CL cs.SD

    Language-Universal Phonetic Representation in Multilingual Speech Pretraining for Low-Resource Speech Recognition

    Authors: Siyuan Feng, Ming Tu, Rui Xia, Chuanzeng Huang, Yuxuan Wang

    Abstract: We improve low-resource ASR by integrating the ideas of multilingual training and self-supervised learning. Concretely, we leverage an International Phonetic Alphabet (IPA) multilingual model to create frame-level pseudo labels for unlabeled speech, and use these pseudo labels to guide hidden-unit BERT (HuBERT) based speech pretraining in a phonetically-informed manner. The experiments on the Mult…

    Submitted 19 May, 2023; originally announced May 2023.

    Comments: Accepted for publication in INTERSPEECH 2023

  12. arXiv:2305.05226  [pdf, other]

    cs.CL

    Multi-Teacher Knowledge Distillation For Text Image Machine Translation

    Authors: Cong Ma, Yaping Zhang, Mei Tu, Yang Zhao, Yu Zhou, Chengqing Zong

    Abstract: Text image machine translation (TIMT) has been widely used in various real-world applications, which translates source language texts in images into another target language sentence. Existing methods on TIMT are mainly divided into two categories: the recognition-then-translation pipeline model and the end-to-end model. However, how to transfer knowledge from the pipeline model into the end-to-end…

    Submitted 9 May, 2023; v1 submitted 9 May, 2023; originally announced May 2023.

    Comments: Accepted at The 17th International Conference on Document Analysis and Recognition (ICDAR 2023)

  13. arXiv:2305.05166  [pdf, other]

    cs.CL

    E2TIMT: Efficient and Effective Modal Adapter for Text Image Machine Translation

    Authors: Cong Ma, Yaping Zhang, Mei Tu, Yang Zhao, Yu Zhou, Chengqing Zong

    Abstract: Text image machine translation (TIMT) aims to translate texts embedded in images from one source language to another target language. Existing methods, both two-stage cascade and one-stage end-to-end architectures, suffer from different issues. The cascade models can benefit from the large-scale optical character recognition (OCR) and MT datasets but the two-stage architecture is redundant. The en…

    Submitted 9 May, 2023; v1 submitted 9 May, 2023; originally announced May 2023.

    Comments: Accepted at The 17th International Conference on Document Analysis and Recognition (ICDAR 2023)

  14. arXiv:2305.03949  [pdf, other]

    cs.CL

    Label-Free Multi-Domain Machine Translation with Stage-wise Training

    Authors: Fan Zhang, Mei Tu, Sangha Kim, Song Liu, Jinyao Yan

    Abstract: Most multi-domain machine translation models rely on domain-annotated data. Unfortunately, domain labels are usually unavailable in both training processes and real translation scenarios. In this work, we propose a label-free multi-domain machine translation model which requires only a few or no domain-annotated data in training and no domain labels in inference. Our model is composed of three par…

    Submitted 6 May, 2023; originally announced May 2023.

  15. arXiv:2303.09279  [pdf, other]

    cs.CR cs.MM

    Privacy-Preserving Video Conferencing via Thermal-Generative Images

    Authors: Sheng-Yang Chiu, Yu-Ting Huang, Chieh-Ting Lin, Yu-Chee Tseng, Jen-Jee Chen, Meng-Hsuan Tu, Bo-Chen Tung, YuJou Nieh

    Abstract: Due to the COVID-19 epidemic, video conferencing has evolved as a new paradigm of communication and teamwork. However, private and personal information can be easily leaked through cameras during video conferencing. This includes leakage of a person's appearance as well as the contents in the background. This paper proposes a novel way of using online low-resolution thermal images as conditions to…

    Submitted 28 March, 2023; v1 submitted 15 March, 2023; originally announced March 2023.

    Comments: Accepted for publication at IEEE International Conference on Robotics and Automation (ICRA) 2023

  16. arXiv:2301.00066  [pdf, other]

    cs.CL eess.AS

    Memory Augmented Lookup Dictionary based Language Modeling for Automatic Speech Recognition

    Authors: Yukun Feng, Ming Tu, Rui Xia, Chuanzeng Huang, Yuxuan Wang

    Abstract: Recent studies have shown that using an external Language Model (LM) benefits the end-to-end Automatic Speech Recognition (ASR). However, predicting tokens that appear less frequently in the training set is still quite challenging. The long-tail prediction problems have been widely studied in many applications, but only been addressed by a few studies for ASR and LMs. In this paper, we propose a n…

    Submitted 30 December, 2022; originally announced January 2023.

    Comments: Submitted to ICASSP 2023

  17. Attentive Deep Neural Networks for Legal Document Retrieval

    Authors: Ha-Thanh Nguyen, Manh-Kien Phi, Xuan-Bach Ngo, Vu Tran, Le-Minh Nguyen, Minh-Phuong Tu

    Abstract: Legal text retrieval serves as a key component in a wide range of legal text processing tasks such as legal question answering, legal case entailment, and statute law retrieval. The performance of legal text retrieval depends, to a large extent, on the representation of text, both query and legal documents. Based on good representations, a legal text retrieval model can effectively match the query…

    Submitted 12 December, 2022; originally announced December 2022.

    Comments: Preprint version. The official version will be published in Artificial Intelligence and Law journal

  18. arXiv:2210.15158  [pdf, other]

    eess.AS cs.SD

    Streaming Voice Conversion Via Intermediate Bottleneck Features And Non-streaming Teacher Guidance

    Authors: Yuanzhe Chen, Ming Tu, Tang Li, Xin Li, Qiuqiang Kong, Jiaxin Li, Zhichao Wang, Qiao Tian, Yuping Wang, Yuxuan Wang

    Abstract: Streaming voice conversion (VC) is the task of converting the voice of one person to another in real-time. Previous streaming VC methods use phonetic posteriorgrams (PPGs) extracted from automatic speech recognition (ASR) systems to represent speaker-independent information. However, PPGs lack the prosody and vocalization information of the source speaker, and streaming PPGs contain undesired leak…

    Submitted 26 October, 2022; originally announced October 2022.

    Comments: The paper has been submitted to ICASSP2023

  19. arXiv:2210.03887  [pdf, other]

    cs.CL

    Improving End-to-End Text Image Translation From the Auxiliary Text Translation Task

    Authors: Cong Ma, Yaping Zhang, Mei Tu, Xu Han, Linghui Wu, Yang Zhao, Yu Zhou

    Abstract: End-to-end text image translation (TIT), which aims at translating the source language embedded in images to the target language, has attracted intensive attention in recent research. However, data sparsity limits the performance of end-to-end text image translation. Multi-task learning is a non-trivial way to alleviate this problem via exploring knowledge from complementary related tasks. In this…

    Submitted 7 October, 2022; originally announced October 2022.

    Comments: Accepted at the 26TH International Conference on Pattern Recognition (ICPR 2022)

  20. arXiv:2209.10475  [pdf, other]

    cs.DB

    Designing PIDs for Reproducible Science Using Time-Series Data

    Authors: Wen Ting Maria Tu, Stephen Makonin

    Abstract: As part of the investigation done by the IEEE Standards Association P2957 Working Group, called Big Data Governance and Metadata Management, the use of persistent identifiers (PIDs) is looked at for tackling the problem of reproducible research and science. This short paper proposes a preliminary method using PIDs to reproduce research results using time-series data. Furthermore, we feel it is pos…

    Submitted 21 September, 2022; originally announced September 2022.

    Comments: Submitted to MTSR 2022 - 16th International Conference on Metadata and Semantics Research

  21. arXiv:2207.08525  [pdf, other]

    cs.CV

    Angular Gap: Reducing the Uncertainty of Image Difficulty through Model Calibration

    Authors: Bohua Peng, Mobarakol Islam, Mei Tu

    Abstract: Curriculum learning needs example difficulty to proceed from easy to hard. However, the credibility of image difficulty is rarely investigated, which can seriously affect the effectiveness of curricula. In this work, we propose Angular Gap, a measure of difficulty based on the difference in angular distance between feature embeddings and class-weight embeddings built by hyperspherical learning. To…

    Submitted 18 July, 2022; originally announced July 2022.

    Comments: 13 pages

  22. arXiv:2110.03347  [pdf, ps, other]

    eess.AS cs.HC cs.SD

    Cloning one's voice using very limited data in the wild

    Authors: Dongyang Dai, Yuanzhe Chen, Li Chen, Ming Tu, Lu Liu, Rui Xia, Qiao Tian, Yuping Wang, Yuxuan Wang

    Abstract: With the increasing popularity of speech synthesis products, the industry has put forward more requirements for personalized speech synthesis: (1) How to use low-resource, easily accessible data to clone a person's voice. (2) How to clone a person's voice while controlling the style and prosody. To solve the above two problems, we proposed the Hieratron model framework in which the prosody and tim…

    Submitted 8 October, 2021; v1 submitted 7 October, 2021; originally announced October 2021.

  23. arXiv:2004.02001  [pdf, other]

    cs.CL cs.AI cs.LG

    Graph Sequential Network for Reasoning over Sequences

    Authors: Ming Tu, Jing Huang, Xiaodong He, Bowen Zhou

    Abstract: Recently Graph Neural Network (GNN) has been applied successfully to various NLP tasks that require reasoning, such as multi-hop machine reading comprehension. In this paper, we consider a novel case where reasoning is needed over graphs built from sequences, i.e. graph nodes with sequence data. Existing GNN models fulfill this goal by first summarizing the node sequences into fixed-dimensional ve…

    Submitted 4 April, 2020; originally announced April 2020.

    Comments: Part of this paper was presented at NeurIPS 2019 Workshop on Graph Representation Learning

  24. arXiv:1911.01533  [pdf, other]

    eess.AS cs.LG cs.SD

    Speaker-invariant Affective Representation Learning via Adversarial Training

    Authors: Haoqi Li, Ming Tu, Jing Huang, Shrikanth Narayanan, Panayiotis Georgiou

    Abstract: Representation learning for speech emotion recognition is challenging due to labeled data sparsity issue and lack of gold standard references. In addition, there is much variability from input speech signals, human subjective perception of the signals and emotion label ambiguity. In this paper, we propose a machine learning framework to obtain speech emotion representations by limiting the effect…

    Submitted 12 August, 2021; v1 submitted 4 November, 2019; originally announced November 2019.

    Comments: Accepted by ICASSP 2020; 5 pages

  25. arXiv:1911.00484  [pdf, other]

    cs.CL

    Select, Answer and Explain: Interpretable Multi-hop Reading Comprehension over Multiple Documents

    Authors: Ming Tu, Kevin Huang, Guangtao Wang, Jing Huang, Xiaodong He, Bowen Zhou

    Abstract: Interpretable multi-hop reading comprehension (RC) over multiple documents is a challenging problem because it demands reasoning over multiple information sources and explaining the answer prediction by providing supporting evidences. In this paper, we propose an effective and interpretable Select, Answer and Explain (SAE) system to solve the multi-document RC problem. Our system first filters out…

    Submitted 10 February, 2020; v1 submitted 1 November, 2019; originally announced November 2019.

    Comments: Accepted to AAAI 2020

  26. arXiv:1906.04881  [pdf, other]

    cs.LG stat.ML

    Multiple instance learning with graph neural networks

    Authors: Ming Tu, Jing Huang, Xiaodong He, Bowen Zhou

    Abstract: Multiple instance learning (MIL) aims to learn the mapping between a bag of instances and the bag-level label. In this paper, we propose a new end-to-end graph neural network (GNN) based algorithm for MIL: we treat each bag as a graph and use GNN to learn the bag embedding, in order to explore the useful structural information among instances in bags. The final graph representation is fed into a c…

    Submitted 11 June, 2019; originally announced June 2019.

    Comments: Accepted to ICML 2019 Workshop on Learning and Reasoning with Graph-Structured Representations

  27. arXiv:1905.07374  [pdf, other]

    cs.CL

    Multi-hop Reading Comprehension across Multiple Documents by Reasoning over Heterogeneous Graphs

    Authors: Ming Tu, Guangtao Wang, Jing Huang, Yun Tang, Xiaodong He, Bowen Zhou

    Abstract: Multi-hop reading comprehension (RC) across documents poses new challenge over single-document RC because it requires reasoning over multiple documents to reach the final answer. In this paper, we propose a new model to tackle the multi-hop RC problem. We introduce a heterogeneous graph with different types of nodes and edges, which is named as Heterogeneous Document-Entity (HDE) graph. The advant…

    Submitted 4 June, 2019; v1 submitted 17 May, 2019; originally announced May 2019.

    Comments: To appear in ACL 2019

  28. arXiv:1904.07386  [pdf, other]

    eess.AS cs.CL cs.SD

    I4U Submission to NIST SRE 2018: Leveraging from a Decade of Shared Experiences

    Authors: Kong Aik Lee, Ville Hautamaki, Tomi Kinnunen, Hitoshi Yamamoto, Koji Okabe, Ville Vestman, Jing Huang, Guohong Ding, Hanwu Sun, Anthony Larcher, Rohan Kumar Das, Haizhou Li, Mickael Rouvier, Pierre-Michel Bousquet, Wei Rao, Qing Wang, Chunlei Zhang, Fahimeh Bahmaninezhad, Hector Delgado, Jose Patino, Qiongqiong Wang, Ling Guo, Takafumi Koshinaka, Jiacen Zhang, Koichi Shinoda, et al. (21 additional authors not shown)

    Abstract: The I4U consortium was established to facilitate a joint entry to NIST speaker recognition evaluations (SRE). The latest edition of such joint submission was in SRE 2018, in which the I4U submission was among the best-performing systems. SRE'18 also marks the 10-year anniversary of I4U consortium into NIST SRE series of evaluation. The primary objective of the current paper is to summarize the res…

    Submitted 15 April, 2019; originally announced April 2019.

    Comments: 5 pages

  29. arXiv:1903.09606  [pdf, other]

    eess.AS cs.SD

    Towards adversarial learning of speaker-invariant representation for speech emotion recognition

    Authors: Ming Tu, Yun Tang, Jing Huang, Xiaodong He, Bowen Zhou

    Abstract: Speech emotion recognition (SER) has attracted great attention in recent years due to the high demand for emotionally intelligent speech interfaces. Deriving speaker-invariant representations for speech emotion recognition is crucial. In this paper, we propose to apply adversarial training to SER to learn speaker-invariant representations. Our model consists of three parts: a representation learni…

    Submitted 22 March, 2019; originally announced March 2019.

  30. arXiv:1812.03594   

    cs.IT

    New Perfect Nonlinear Functions over Finite Fields

    Authors: Jinquan Luo, Junru Ma, Min Tu

    Abstract: In this paper we present a new class of perfect nonlinear %Dembowski-Ostrom polynomials over $\mathbb{F}_{p^{2k}}$ for any odd prime $p$. In addition, we show that the new perfect nonlinear functions are CCZ-inequivalent to all the previously known perfect nonlinear functions in general.

    Submitted 3 May, 2019; v1 submitted 9 December, 2018; originally announced December 2018.

    Comments: This result is not new. It has been found by other researchers many years ago

  31. arXiv:1807.01738  [pdf, other]

    eess.AS cs.SD

    Investigating the role of L1 in automatic pronunciation evaluation of L2 speech

    Authors: Ming Tu, Anna Grabek, Julie Liss, Visar Berisha

    Abstract: Automatic pronunciation evaluation plays an important role in pronunciation training and second language education. This field draws heavily on concepts from automatic speech recognition (ASR) to quantify how close the pronunciation of non-native speech is to native-like pronunciation. However, it is known that the formation of accent is related to pronunciation patterns of both the target languag…

    Submitted 4 July, 2018; originally announced July 2018.

    Comments: To appear in Interspeech 2018

  32. arXiv:1804.08663  [pdf, other]

    eess.AS cs.SD

    A Discriminative Acoustic-Prosodic Approach for Measuring Local Entrainment

    Authors: Megan M. Willi, Stephanie A. Borrie, Tyson S. Barrett, Ming Tu, Visar Berisha

    Abstract: Acoustic-prosodic entrainment describes the tendency of humans to align or adapt their speech acoustics to each other in conversation. This alignment of spoken behavior has important implications for conversational success. However, modeling the subtle nature of entrainment in spoken dialogue continues to pose a challenge. In this paper, we propose a straightforward definition for local entrainmen…

    Submitted 12 July, 2018; v1 submitted 23 April, 2018; originally announced April 2018.

  33. arXiv:1605.04859  [pdf, other]

    cs.LG cs.NE

    Reducing the Model Order of Deep Neural Networks Using Information Theory

    Authors: Ming Tu, Visar Berisha, Yu Cao, Jae-sun Seo

    Abstract: Deep neural networks are typically represented by a much larger number of parameters than shallow models, making them prohibitive for small footprint devices. Recent research shows that there is considerable redundancy in the parameter space of deep neural networks. In this paper, we propose a method to compress deep neural networks by using the Fisher Information metric, which we estimate through…

    Submitted 16 May, 2016; originally announced May 2016.

    Comments: To appear in ISVLSI 2016 special session