Showing 1–50 of 99 results for author: Tu, M

  1. arXiv:2407.04675  [pdf, other]

    eess.AS cs.SD

    Seed-ASR: Understanding Diverse Speech and Contexts with LLM-based Speech Recognition

    Authors: Ye Bai, Jingping Chen, Jitong Chen, Wei Chen, Zhuo Chen, Chuang Ding, Linhao Dong, Qianqian Dong, Yujiao Du, Kepan Gao, Lu Gao, Yi Guo, Minglun Han, Ting Han, Wenchao Hu, Xinying Hu, Yuxiang Hu, Deyu Hua, Lu Huang, Mingkun Huang, Youjia Huang, Jishuo Jin, Fanliu Kong, Zongwei Lan, Tianyu Li, et al. (30 additional authors not shown)

    Abstract: Modern automatic speech recognition (ASR) model is required to accurately transcribe diverse speech signals (from different domains, languages, accents, etc) given the specific contextual information in various application scenarios. Classic end-to-end models fused with extra language models perform well, but mainly in data matching scenarios and are gradually approaching a bottleneck. In this wor…

    Submitted 10 July, 2024; v1 submitted 5 July, 2024; originally announced July 2024.

  2. arXiv:2406.07349  [pdf, other]

    cs.CR

    Erasing Radio Frequency Fingerprints via Active Adversarial Perturbation

    Authors: Zhaoyi Lu, Wenchao Xu, Ming Tu, Xin Xie, Cunqing Hua, Nan Cheng

    Abstract: Radio Frequency (RF) fingerprinting is to identify a wireless device from its uniqueness of the analog circuitry or hardware imperfections. However, unlike the MAC address which can be modified, such hardware feature is inevitable for the signal emitted to air, which can possibly reveal device whereabouts, e.g., a sniffer can use a pre-trained model to identify a nearby device when receiving its s…

    Submitted 12 June, 2024; v1 submitted 11 June, 2024; originally announced June 2024.

  3. arXiv:2404.06674  [pdf, other]

    cs.SD cs.AI eess.AS

    VoiceShop: A Unified Speech-to-Speech Framework for Identity-Preserving Zero-Shot Voice Editing

    Authors: Philip Anastassiou, Zhenyu Tang, Kainan Peng, Dongya Jia, Jiaxin Li, Ming Tu, Yuping Wang, Yuxuan Wang, Mingbo Ma

    Abstract: We present VoiceShop, a novel speech-to-speech framework that can modify multiple attributes of speech, such as age, gender, accent, and speech style, in a single forward pass while preserving the input speaker's timbre. Previous works have been constrained to specialized models that can only edit these attributes individually and suffer from the following pitfalls: the magnitude of the conversion…

    Submitted 11 April, 2024; v1 submitted 9 April, 2024; originally announced April 2024.

  4. Object Detection in Thermal Images Using Deep Learning for Unmanned Aerial Vehicles

    Authors: Minh Dang Tu, Kieu Trang Le, Manh Duong Phung

    Abstract: This work presents a neural network model capable of recognizing small and tiny objects in thermal images collected by unmanned aerial vehicles. Our model consists of three parts, the backbone, the neck, and the prediction head. The backbone is developed based on the structure of YOLOv5 combined with the use of a transformer encoder at the end. The neck includes a BI-FPN block combined with the us…

    Submitted 13 February, 2024; originally announced February 2024.

    Comments: Published in: 2024 IEEE/SICE International Symposium on System Integration (SII)

  5. arXiv:2310.13028  [pdf, other]

    cs.CL cs.AI

    Reliable Academic Conference Question Answering: A Study Based on Large Language Model

    Authors: Zhiwei Huang, Long Jin, Junjie Wang, Mingchen Tu, Yin Hua, Zhiqiang Liu, Jiawei Meng, Huajun Chen, Wen Zhang

    Abstract: The rapid growth of computer science has led to a proliferation of research presented at academic conferences, fostering global scholarly communication. Researchers consistently seek accurate, current information about these events at all stages. This data surge necessitates an intelligent question-answering system to efficiently address researchers' queries and ensure awareness of the latest adva…

    Submitted 19 October, 2023; originally announced October 2023.

    Comments: 10 pages, 4 figures, 2 tables

  6. arXiv:2309.00597  [pdf, other]

    cs.CE cs.DC cs.ET q-bio.NC quant-ph

    The QUATRO Application Suite: Quantum Computing for Models of Human Cognition

    Authors: Raghavendra Pradyumna Pothukuchi, Leon Lufkin, Yu Jun Shen, Alejandro Simon, Rome Thorstenson, Bernardo Eilert Trevisan, Michael Tu, Mudi Yang, Ben Foxman, Viswanatha Srinivas Pothukuchi, Gunnar Epping, Thi Ha Kyaw, Bryant J Jongkees, Yongshan Ding, Jerome R Busemeyer, Jonathan D Cohen, Abhishek Bhattacharjee

    Abstract: Research progress in quantum computing has, thus far, focused on a narrow set of application domains. Expanding the suite of quantum application domains is vital for the discovery of new software toolchains and architectural abstractions. In this work, we unlock a new class of applications ripe for quantum computing research -- computational cognitive modeling. Cognitive models are critical to und…

    Submitted 8 December, 2023; v1 submitted 1 September, 2023; originally announced September 2023.

  7. arXiv:2308.10173  [pdf, other]

    cs.CL

    FoodGPT: A Large Language Model in Food Testing Domain with Incremental Pre-training and Knowledge Graph Prompt

    Authors: Zhixiao Qi, Yijiong Yu, Meiqi Tu, Junyi Tan, Yongfeng Huang

    Abstract: Currently, the construction of large language models in specific domains is done by fine-tuning on a base model. Some models also incorporate knowledge bases without the need for pre-training. This is because the base model already contains domain-specific knowledge during the pre-training process. We build a large language model for food testing. Unlike the above approach, a significant amount of…

    Submitted 20 August, 2023; originally announced August 2023.

  8. arXiv:2305.18566  [pdf]

    astro-ph.IM eess.SP

    The Scientific Investigation of Unidentified Aerial Phenomena (UAP) Using Multimodal Ground-Based Observatories

    Authors: Wesley Andrés Watters, Abraham Loeb, Frank Laukien, Richard Cloete, Alex Delacroix, Sergei Dobroshinsky, Benjamin Horvath, Ezra Kelderman, Sarah Little, Eric Masson, Andrew Mead, Mitch Randall, Forrest Schultz, Matthew Szenher, Foteini Vervelidou, Abigail White, Angelique Ahlström, Carol Cleland, Spencer Dockal, Natasha Donahue, Mark Elowitz, Carson Ezell, Alex Gersznowicz, Nicholas Gold, Michael G. Hercz, et al. (13 additional authors not shown)

    Abstract: (Abridged) Unidentified Aerial Phenomena (UAP) have resisted explanation and have received little formal scientific attention for 75 years. A primary objective of the Galileo Project is to build an integrated software and instrumentation system designed to conduct a multimodal census of aerial phenomena and to recognize anomalies. Here we present key motivations for the study of UAP and address hi…

    Submitted 31 May, 2023; v1 submitted 29 May, 2023; originally announced May 2023.

    Comments: This paper is published in the Journal of Astronomical Instrumentation, 12(1), 2340006 (2023) https://doi.org/10.1142/S2251171723400068

    Journal ref: Journal of Astronomical Instrumentation, 12(1), 2340006 (2023)

  9. arXiv:2305.18551  [pdf]

    astro-ph.IM cs.SD eess.AS

    Multi-Band Acoustic Monitoring of Aerial Signatures

    Authors: Andrew Mead, Sarah Little, Paul Sail, Michelle Tu, Wesley Andrés Watters, Abigail White, Richard Cloete

    Abstract: The Galileo Project's acoustic monitoring, omni-directional system (AMOS) aids in the detection and characterization of aerial phenomena. It uses a multi-band microphone suite spanning infrasonic to ultrasonic frequencies, providing an independent signal modality for validation and characterization of detected objects. The system utilizes infrasonic, audible, and ultrasonic systems to cover a wide…

    Submitted 29 May, 2023; originally announced May 2023.

    Journal ref: Journal of Astronomical Instrumentation, 12(1), 2340005 (2023)

  10. arXiv:2305.15719  [pdf, other]

    cs.SD cs.AI cs.LG eess.AS

    Efficient Neural Music Generation

    Authors: Max W. Y. Lam, Qiao Tian, Tang Li, Zongyu Yin, Siyuan Feng, Ming Tu, Yuliang Ji, Rui Xia, Mingbo Ma, Xuchen Song, Jitong Chen, Yuping Wang, Yuxuan Wang

    Abstract: Recent progress in music generation has been remarkably advanced by the state-of-the-art MusicLM, which comprises a hierarchy of three LMs, respectively, for semantic, coarse acoustic, and fine acoustic modelings. Yet, sampling with the MusicLM requires processing through these LMs one by one to obtain the fine-grained acoustic tokens, making it computationally expensive and prohibitive for a real…

    Submitted 25 May, 2023; originally announced May 2023.

  11. arXiv:2305.11576  [pdf, other]

    eess.AS cs.CL cs.SD

    Language-universal phonetic encoder for low-resource speech recognition

    Authors: Siyuan Feng, Ming Tu, Rui Xia, Chuanzeng Huang, Yuxuan Wang

    Abstract: Multilingual training is effective in improving low-resource ASR, which may partially be explained by phonetic representation sharing between languages. In end-to-end (E2E) ASR systems, graphemes are often used as basic modeling units, however graphemes may not be ideal for multilingual phonetic sharing. In this paper, we leverage International Phonetic Alphabet (IPA) based language-universal phon…

    Submitted 19 May, 2023; originally announced May 2023.

    Comments: Accepted for publication in INTERSPEECH 2023

  12. arXiv:2305.11569  [pdf, ps, other]

    eess.AS cs.CL cs.SD

    Language-Universal Phonetic Representation in Multilingual Speech Pretraining for Low-Resource Speech Recognition

    Authors: Siyuan Feng, Ming Tu, Rui Xia, Chuanzeng Huang, Yuxuan Wang

    Abstract: We improve low-resource ASR by integrating the ideas of multilingual training and self-supervised learning. Concretely, we leverage an International Phonetic Alphabet (IPA) multilingual model to create frame-level pseudo labels for unlabeled speech, and use these pseudo labels to guide hidden-unit BERT (HuBERT) based speech pretraining in a phonetically-informed manner. The experiments on the Mult…

    Submitted 19 May, 2023; originally announced May 2023.

    Comments: Accepted for publication in INTERSPEECH 2023

  13. arXiv:2305.05226  [pdf, other]

    cs.CL

    Multi-Teacher Knowledge Distillation For Text Image Machine Translation

    Authors: Cong Ma, Yaping Zhang, Mei Tu, Yang Zhao, Yu Zhou, Chengqing Zong

    Abstract: Text image machine translation (TIMT) has been widely used in various real-world applications, which translates source language texts in images into another target language sentence. Existing methods on TIMT are mainly divided into two categories: the recognition-then-translation pipeline model and the end-to-end model. However, how to transfer knowledge from the pipeline model into the end-to-end…

    Submitted 9 May, 2023; v1 submitted 9 May, 2023; originally announced May 2023.

    Comments: Accepted at The 17th International Conference on Document Analysis and Recognition (ICDAR 2023)

  14. arXiv:2305.05166  [pdf, other]

    cs.CL

    E2TIMT: Efficient and Effective Modal Adapter for Text Image Machine Translation

    Authors: Cong Ma, Yaping Zhang, Mei Tu, Yang Zhao, Yu Zhou, Chengqing Zong

    Abstract: Text image machine translation (TIMT) aims to translate texts embedded in images from one source language to another target language. Existing methods, both two-stage cascade and one-stage end-to-end architectures, suffer from different issues. The cascade models can benefit from the large-scale optical character recognition (OCR) and MT datasets but the two-stage architecture is redundant. The en…

    Submitted 9 May, 2023; v1 submitted 9 May, 2023; originally announced May 2023.

    Comments: Accepted at The 17th International Conference on Document Analysis and Recognition (ICDAR 2023)

  15. arXiv:2305.03949  [pdf, other]

    cs.CL

    Label-Free Multi-Domain Machine Translation with Stage-wise Training

    Authors: Fan Zhang, Mei Tu, Sangha Kim, Song Liu, Jinyao Yan

    Abstract: Most multi-domain machine translation models rely on domain-annotated data. Unfortunately, domain labels are usually unavailable in both training processes and real translation scenarios. In this work, we propose a label-free multi-domain machine translation model which requires only a few or no domain-annotated data in training and no domain labels in inference. Our model is composed of three par…

    Submitted 6 May, 2023; originally announced May 2023.

  16. arXiv:2303.09279  [pdf, other]

    cs.CR cs.MM

    Privacy-Preserving Video Conferencing via Thermal-Generative Images

    Authors: Sheng-Yang Chiu, Yu-Ting Huang, Chieh-Ting Lin, Yu-Chee Tseng, Jen-Jee Chen, Meng-Hsuan Tu, Bo-Chen Tung, YuJou Nieh

    Abstract: Due to the COVID-19 epidemic, video conferencing has evolved as a new paradigm of communication and teamwork. However, private and personal information can be easily leaked through cameras during video conferencing. This includes leakage of a person's appearance as well as the contents in the background. This paper proposes a novel way of using online low-resolution thermal images as conditions to…

    Submitted 28 March, 2023; v1 submitted 15 March, 2023; originally announced March 2023.

    Comments: Accepted for publication at IEEE International Conference on Robotics and Automation (ICRA) 2023

  17. arXiv:2301.00066  [pdf, other]

    cs.CL eess.AS

    Memory Augmented Lookup Dictionary based Language Modeling for Automatic Speech Recognition

    Authors: Yukun Feng, Ming Tu, Rui Xia, Chuanzeng Huang, Yuxuan Wang

    Abstract: Recent studies have shown that using an external Language Model (LM) benefits the end-to-end Automatic Speech Recognition (ASR). However, predicting tokens that appear less frequently in the training set is still quite challenging. The long-tail prediction problems have been widely studied in many applications, but only been addressed by a few studies for ASR and LMs. In this paper, we propose a n…

    Submitted 30 December, 2022; originally announced January 2023.

    Comments: Submitted to ICASSP 2023

  18. Attentive Deep Neural Networks for Legal Document Retrieval

    Authors: Ha-Thanh Nguyen, Manh-Kien Phi, Xuan-Bach Ngo, Vu Tran, Le-Minh Nguyen, Minh-Phuong Tu

    Abstract: Legal text retrieval serves as a key component in a wide range of legal text processing tasks such as legal question answering, legal case entailment, and statute law retrieval. The performance of legal text retrieval depends, to a large extent, on the representation of text, both query and legal documents. Based on good representations, a legal text retrieval model can effectively match the query…

    Submitted 12 December, 2022; originally announced December 2022.

    Comments: Preprint version. The official version will be published in Artificial Intelligence and Law journal

  19. arXiv:2210.15158  [pdf, other]

    eess.AS cs.SD

    Streaming Voice Conversion Via Intermediate Bottleneck Features And Non-streaming Teacher Guidance

    Authors: Yuanzhe Chen, Ming Tu, Tang Li, Xin Li, Qiuqiang Kong, Jiaxin Li, Zhichao Wang, Qiao Tian, Yuping Wang, Yuxuan Wang

    Abstract: Streaming voice conversion (VC) is the task of converting the voice of one person to another in real-time. Previous streaming VC methods use phonetic posteriorgrams (PPGs) extracted from automatic speech recognition (ASR) systems to represent speaker-independent information. However, PPGs lack the prosody and vocalization information of the source speaker, and streaming PPGs contain undesired leak…

    Submitted 26 October, 2022; originally announced October 2022.

    Comments: The paper has been submitted to ICASSP2023

  20. arXiv:2210.03887  [pdf, other]

    cs.CL

    Improving End-to-End Text Image Translation From the Auxiliary Text Translation Task

    Authors: Cong Ma, Yaping Zhang, Mei Tu, Xu Han, Linghui Wu, Yang Zhao, Yu Zhou

    Abstract: End-to-end text image translation (TIT), which aims at translating the source language embedded in images to the target language, has attracted intensive attention in recent research. However, data sparsity limits the performance of end-to-end text image translation. Multi-task learning is a non-trivial way to alleviate this problem via exploring knowledge from complementary related tasks. In this…

    Submitted 7 October, 2022; originally announced October 2022.

    Comments: Accepted at the 26TH International Conference on Pattern Recognition (ICPR 2022)

  21. arXiv:2209.10475  [pdf, other]

    cs.DB

    Designing PIDs for Reproducible Science Using Time-Series Data

    Authors: Wen Ting Maria Tu, Stephen Makonin

    Abstract: As part of the investigation done by the IEEE Standards Association P2957 Working Group, called Big Data Governance and Metadata Management, the use of persistent identifiers (PIDs) is looked at for tackling the problem of reproducible research and science. This short paper proposes a preliminary method using PIDs to reproduce research results using time-series data. Furthermore, we feel it is pos…

    Submitted 21 September, 2022; originally announced September 2022.

    Comments: Submitted to MTSR 2022 - 16th International Conference on Metadata and Semantics Research

  22. arXiv:2208.04805  [pdf, other]

    cond-mat.mes-hall quant-ph

    Exploiting anisotropic Rashba effects on real-time photocurrents and spin polarization for transient symmetry breaking

    Authors: Matisse Wei-Yuan Tu, Jyh-Pin Chou, Chih-Wei Luo

    Abstract: We theoretically investigate the real-time transient responses of a two-dimensional (2D) electron gas with anisotropic Rashba spin-orbit coupling (SOC) to laser pulses. Through explicitly monitoring the time-dependent photocurrents and spin polarization under different linear polarizations of the laser pulse, we find that the transient breaking of the mirror symmetry in combination with the anisot…

    Submitted 9 August, 2022; originally announced August 2022.

  23. arXiv:2207.08525  [pdf, other]

    cs.CV

    Angular Gap: Reducing the Uncertainty of Image Difficulty through Model Calibration

    Authors: Bohua Peng, Mobarakol Islam, Mei Tu

    Abstract: Curriculum learning needs example difficulty to proceed from easy to hard. However, the credibility of image difficulty is rarely investigated, which can seriously affect the effectiveness of curricula. In this work, we propose Angular Gap, a measure of difficulty based on the difference in angular distance between feature embeddings and class-weight embeddings built by hyperspherical learning. To…

    Submitted 18 July, 2022; originally announced July 2022.

    Comments: 13 pages

  24. arXiv:2205.08036  [pdf, ps, other]

    stat.ME math.ST

    On Semiparametric Efficiency of an Emerging Class of Regression Models for Between-subject Attributes

    Authors: Jinyuan Liu, Tuo Lin, Tian Chen, Xinlian Zhang, Xin M. Tu

    Abstract: The semiparametric regression models have attracted increasing attention owing to their robustness compared to their parametric counterparts. This paper discusses the efficiency bound for functional response models (FRM), an emerging class of semiparametric regression that serves as a timely solution for research questions involving pairwise observations. This new paradigm is especially appealing…

    Submitted 16 May, 2022; originally announced May 2022.

  25. arXiv:2110.03347  [pdf, ps, other]

    eess.AS cs.HC cs.SD

    Cloning one's voice using very limited data in the wild

    Authors: Dongyang Dai, Yuanzhe Chen, Li Chen, Ming Tu, Lu Liu, Rui Xia, Qiao Tian, Yuping Wang, Yuxuan Wang

    Abstract: With the increasing popularity of speech synthesis products, the industry has put forward more requirements for personalized speech synthesis: (1) How to use low-resource, easily accessible data to clone a person's voice. (2) How to clone a person's voice while controlling the style and prosody. To solve the above two problems, we proposed the Hieratron model framework in which the prosody and tim…

    Submitted 8 October, 2021; v1 submitted 7 October, 2021; originally announced October 2021.

  26. arXiv:2104.14147  [pdf, other]

    cond-mat.mes-hall quant-ph

    Revealing the non-adiabatic and non-Abelian multiple-band effects via anisotropic valley Hall conduction in bilayer graphene

    Authors: Ci Li, Matisse Wei-Yuan Tu, Wang Yao

    Abstract: Many quantum materials of interest, ex., bilayer graphene, possess a number of closely spaced but not fully degenerate bands near the Fermi level, where the coupling to the far detuned remote bands can induce Berry curvatures of the non-Abelian character in this active multiple-band manifold for transport effects. Under finite electric fields, non-adiabatic interband transition processes are expec…

    Submitted 14 July, 2021; v1 submitted 29 April, 2021; originally announced April 2021.

    Comments: 9 pages, 5 figures

    Journal ref: Ci Li, Matisse Wei-Yuan Tu, and Wang Yao, 2D Mater. 8 045012 (2021)

  27. arXiv:2103.02575  [pdf]

    cond-mat.mtrl-sci

    Quantifying Photoinduced Polaronic Distortions in Inorganic Lead Halide Perovskites Nanocrystals

    Authors: Oliviero Cannelli, Nicola Colonna, Michele Puppin, Thomas Rossi, Dominik Kinschel, Ludmila Leroy, Janina Loeffler, Anne Marie March, Gilles Doumy, Andre Al Haddad, Ming-Feng Tu, Yoshiaki Kumagai, Donald Walko, Grigory Smolentsev, Franziska Krieg, Simon C. Boehme, Maksym V. Kovalenko, Majed Chergui, Giulia F. Mancini

    Abstract: The development of next generation perovskite-based optoelectronic devices relies critically on the understanding of the interaction between charge carriers and the polar lattice in out-of-equilibrium conditions. While it has become increasingly evident for CsPbBr3 perovskites that the Pb-Br framework flexibility plays a key role in their light-activated functionality, the corresponding local stru…

    Submitted 3 March, 2021; originally announced March 2021.

    Comments: Main: 27 pages, 4 figures SI: 16 pages, 8 figures

  28. arXiv:2102.10818  [pdf]

    cond-mat.mes-hall cond-mat.mtrl-sci

    Spin Photovoltaic Effect in Magnetic van der Waals Heterostructures

    Authors: Tiancheng Song, Eric Anderson, Matisse Wei-Yuan Tu, Kyle Seyler, Takashi Taniguchi, Kenji Watanabe, Michael A. McGuire, Xiaosong Li, Ting Cao, Di Xiao, Wang Yao, Xiaodong Xu

    Abstract: The development of van der Waals (vdW) crystals and their heterostructures has created a fascinating platform for exploring optoelectronic properties in the two-dimensional (2D) limit. With the recent discovery of 2D magnets, the control of the spin degree of freedom can be integrated to realize 2D spin-optoelectronics with spontaneous time-reversal symmetry breaking. Here, we report spin photovol…

    Submitted 22 February, 2021; originally announced February 2021.

  29. Giant spin transfer torque in atomically thin magnetic bilayers

    Authors: Weihao Cao, Matisse Wei-Yuan Tu, Jiang Xiao, Wang Yao

    Abstract: In cavity quantum electrodynamics, the multiple reflections of a photon between two mirrors defining a cavity is exploited to enhance the light-coupling of an intra-cavity atom. We show that this paradigm for enhancing the interaction of a flying particle with a localized object can be generalized to spintronics based on van der Waals 2D magnets. Upon tunneling through a magnetic bilayer, we find…

    Submitted 27 September, 2020; v1 submitted 14 September, 2020; originally announced September 2020.

    Comments: Published as an Express Letter on Chinese Physics Letters

    Journal ref: Chin. Phys. Lett. 37, 107201 (2020)

  30. arXiv:2005.14286  [pdf, other]

    q-bio.BM q-bio.QM

    Generative network complex for the automated generation of druglike molecules

    Authors: Kaifu Gao, Duc D Nguyen, Meihua Tu, Guo-Wei Wei

    Abstract: Current drug discovery is expensive and time-consuming. It remains a challenging task to create a wide variety of novel compounds with desirable pharmacological properties and cheaply available to low-income people. In this work, we develop a generative network complex (GNC) to generate new drug-like molecules based on the multi-property optimization via the gradient descent in the latent space of…

    Submitted 28 May, 2020; originally announced May 2020.

    Comments: 27 pages, 2 tables and 19 figures

  31. arXiv:2004.06279  [pdf, other]

    cond-mat.mes-hall physics.chem-ph

    Theory of wavepacket transport under narrow gaps and spatial textures: non-adiabaticity and semiclassicality

    Authors: Matisse Wei-Yuan Tu, Ci Li, Wang Yao

    Abstract: We generalise the celebrated semiclassical wavepacket approach from the adiabatic to the non-adiabatic regime. A unified description covering both of these regimes is particularly desired for systems with spatially varying band structures where band gaps of various sizes are simultaneously present, e.g. in moiré patterns. For a single wavepacket, alternative to the previous derivation by Lagrangia…

    Submitted 13 April, 2020; originally announced April 2020.

    Comments: 16 pages, 1 figure

    Journal ref: Phys. Rev. B 102, 045423 (2020)

  32. arXiv:2004.02001  [pdf, other]

    cs.CL cs.AI cs.LG

    Graph Sequential Network for Reasoning over Sequences

    Authors: Ming Tu, Jing Huang, Xiaodong He, Bowen Zhou

    Abstract: Recently Graph Neural Network (GNN) has been applied successfully to various NLP tasks that require reasoning, such as multi-hop machine reading comprehension. In this paper, we consider a novel case where reasoning is needed over graphs built from sequences, i.e. graph nodes with sequence data. Existing GNN models fulfill this goal by first summarizing the node sequences into fixed-dimensional ve…

    Submitted 4 April, 2020; originally announced April 2020.

    Comments: Part of this paper was presented at NeurIPS 2019 Workshop on Graph Representation Learning

  33. arXiv:2004.01326  [pdf, other]

    cond-mat.mes-hall

    Non-adiabatic Hall effect at Berry curvature hot spot

    Authors: Matisse Wei-Yuan Tu, Ci Li, Hongyi Yu, Wang Yao

    Abstract: Hot spot of Berry curvature is usually found at Bloch band anti-crossings, where the Hall effect due to the Berry phase can be most pronounced. With small gaps there, the adiabatic limit for the existing formulations of Hall current can be exceeded in a moderate electric field. Here we present a theory of non-adiabatic Hall effect, capturing non-perturbatively the across gap electron-hole excitati…

    Submitted 2 April, 2020; originally announced April 2020.

    Comments: submitted, 5 pages, 2 figures

    Journal ref: 2D Mater. 2020

  34. RIXS Reveals Hidden Local Transitions of the Aqueous OH Radical

    Authors: L. Kjellsson, K. Nanda, J. -E. Rubensson, G. Doumy, S. H. Southworth, P. J. Ho, A. M. March, A. Al Haddad, Y. Kumagai, M. -F. Tu, R. Schaller, T. Debnath, M. S. Bin Mohd Yusof, C. Arnold, W. F. Schlotter, S. Moeller, G. Coslovich, J. D. Koralek, M. P. Minitti, M. L. Vidal, M. Simon, R. Santra, Z. -H. Loh, S. Coriani, A. I. Krylov, et al. (1 additional author not shown)

    Abstract: Resonant inelastic x-ray scattering (RIXS) provides remarkable opportunities to interrogate ultrafast dynamics in liquids. Here we use RIXS to study the fundamentally and practically important hydroxyl radical in liquid water, OH(aq). Impulsive ionization of pure liquid water produced a short-lived population of OH(aq), which was probed using femtosecond x-rays from an x-ray free-electron laser. W…

    Submitted 8 March, 2020; originally announced March 2020.

    Comments: 40 pages, 10 figures

    Journal ref: Phys. Rev. Lett. 124, 236001 (2020)

  35. arXiv:1911.01533  [pdf, other]

    eess.AS cs.LG cs.SD

    Speaker-invariant Affective Representation Learning via Adversarial Training

    Authors: Haoqi Li, Ming Tu, Jing Huang, Shrikanth Narayanan, Panayiotis Georgiou

    Abstract: Representation learning for speech emotion recognition is challenging due to labeled data sparsity issue and lack of gold standard references. In addition, there is much variability from input speech signals, human subjective perception of the signals and emotion label ambiguity. In this paper, we propose a machine learning framework to obtain speech emotion representations by limiting the effect…

    Submitted 12 August, 2021; v1 submitted 4 November, 2019; originally announced November 2019.

    Comments: Accepted by ICASSP 2020; 5 pages

  36. Are 2D fingerprints still valuable for drug discovery?

    Authors: Kaifu Gao, Duc Duy Nguyen, Vishnu Sresht, Alan M. Mathiowetz, Meihua Tu, Guo-Wei Wei

    Abstract: Recently, molecular fingerprints extracted from three-dimensional (3D) structures using advanced mathematics, such as algebraic topology, differential geometry, and graph theory have been paired with efficient machine learning, especially deep learning algorithms to outperform other methods in drug discovery applications and competitions. This raises the question of whether classical 2D fingerprin…

    Submitted 3 November, 2019; originally announced November 2019.

  37. arXiv:1911.00484  [pdf, other]

    cs.CL

    Select, Answer and Explain: Interpretable Multi-hop Reading Comprehension over Multiple Documents

    Authors: Ming Tu, Kevin Huang, Guangtao Wang, Jing Huang, Xiaodong He, Bowen Zhou

    Abstract: Interpretable multi-hop reading comprehension (RC) over multiple documents is a challenging problem because it demands reasoning over multiple information sources and explaining the answer prediction by providing supporting evidences. In this paper, we propose an effective and interpretable Select, Answer and Explain (SAE) system to solve the multi-document RC problem. Our system first filters out…

    Submitted 10 February, 2020; v1 submitted 1 November, 2019; originally announced November 2019.

    Comments: Accepted to AAAI 2020

  38. arXiv:1906.04881  [pdf, other]

    cs.LG stat.ML

    Multiple instance learning with graph neural networks

    Authors: Ming Tu, Jing Huang, Xiaodong He, Bowen Zhou

    Abstract: Multiple instance learning (MIL) aims to learn the mapping between a bag of instances and the bag-level label. In this paper, we propose a new end-to-end graph neural network (GNN) based algorithm for MIL: we treat each bag as a graph and use GNN to learn the bag embedding, in order to explore the useful structural information among instances in bags. The final graph representation is fed into a c…

    Submitted 11 June, 2019; originally announced June 2019.

    Comments: Accepted to ICML 2019 Workshop on Learning and Reasoning with Graph-Structured Representations

  39. arXiv:1905.07374  [pdf, other]

    cs.CL

    Multi-hop Reading Comprehension across Multiple Documents by Reasoning over Heterogeneous Graphs

    Authors: Ming Tu, Guangtao Wang, Jing Huang, Yun Tang, Xiaodong He, Bowen Zhou

    Abstract: Multi-hop reading comprehension (RC) across documents poses new challenge over single-document RC because it requires reasoning over multiple documents to reach the final answer. In this paper, we propose a new model to tackle the multi-hop RC problem. We introduce a heterogeneous graph with different types of nodes and edges, which is named as Heterogeneous Document-Entity (HDE) graph. The advant…

    Submitted 4 June, 2019; v1 submitted 17 May, 2019; originally announced May 2019.

    Comments: To appear in ACL 2019

  40. arXiv:1904.07386  [pdf, other]

    eess.AS cs.CL cs.SD

    I4U Submission to NIST SRE 2018: Leveraging from a Decade of Shared Experiences

    Authors: Kong Aik Lee, Ville Hautamaki, Tomi Kinnunen, Hitoshi Yamamoto, Koji Okabe, Ville Vestman, Jing Huang, Guohong Ding, Hanwu Sun, Anthony Larcher, Rohan Kumar Das, Haizhou Li, Mickael Rouvier, Pierre-Michel Bousquet, Wei Rao, Qing Wang, Chunlei Zhang, Fahimeh Bahmaninezhad, Hector Delgado, Jose Patino, Qiongqiong Wang, Ling Guo, Takafumi Koshinaka, Jiacen Zhang, Koichi Shinoda , et al. (21 additional authors not shown)

    Abstract: The I4U consortium was established to facilitate a joint entry to NIST speaker recognition evaluations (SRE). The latest such joint submission was to SRE 2018, in which the I4U submission was among the best-performing systems. SRE'18 also marks the 10-year anniversary of the I4U consortium's participation in the NIST SRE series of evaluations. The primary objective of the current paper is to summarize the res…

    Submitted 15 April, 2019; originally announced April 2019.

    Comments: 5 pages

  41. arXiv:1903.09606  [pdf, other]

    eess.AS cs.SD

    Towards adversarial learning of speaker-invariant representation for speech emotion recognition

    Authors: Ming Tu, Yun Tang, Jing Huang, Xiaodong He, Bowen Zhou

    Abstract: Speech emotion recognition (SER) has attracted great attention in recent years due to the high demand for emotionally intelligent speech interfaces. Deriving speaker-invariant representations for speech emotion recognition is crucial. In this paper, we propose to apply adversarial training to SER to learn speaker-invariant representations. Our model consists of three parts: a representation learni…

    Submitted 22 March, 2019; originally announced March 2019.

  42. arXiv:1812.03594   

    cs.IT

    New Perfect Nonlinear Functions over Finite Fields

    Authors: Jinquan Luo, Junru Ma, Min Tu

    Abstract: In this paper we present a new class of perfect nonlinear Dembowski-Ostrom polynomials over $\mathbb{F}_{p^{2k}}$ for any odd prime $p$. In addition, we show that the new perfect nonlinear functions are CCZ-inequivalent to all the previously known perfect nonlinear functions in general.

    Submitted 3 May, 2019; v1 submitted 9 December, 2018; originally announced December 2018.

    Comments: This result is not new. It has been found by other researchers many years ago

  43. arXiv:1812.01834  [pdf, other]

    cond-mat.mes-hall cond-mat.mtrl-sci

    Gate tuning from exciton superfluid to quantum anomalous Hall in van der Waals heterobilayer

    Authors: Qizhong Zhu, Matisse Wei-Yuan Tu, Qingjun Tong, Wang Yao

    Abstract: Van der Waals heterostructures of 2D materials provide a powerful approach towards engineering various quantum phases of matter. Examples include topological matter such as the quantum spin Hall (QSH) insulator, and correlated matter such as the exciton superfluid. It is of great interest to realize these vastly different forms of quantum matter on a common platform; however, their distinct origins tend to…

    Submitted 5 December, 2018; originally announced December 2018.

    Comments: To appear in Science Advances

    Journal ref: Sci. Adv. 5, eaau6120 (2019)

  44. arXiv:1807.05285  [pdf]

    cond-mat.mes-hall cond-mat.mtrl-sci

    Voltage Control of a van der Waals Spin-Filter Magnetic Tunnel Junction

    Authors: Tiancheng Song, Matisse Wei-Yuan Tu, Caitlin Carnahan, Xinghan Cai, Takashi Taniguchi, Kenji Watanabe, Michael A. McGuire, David H. Cobden, Di Xiao, Wang Yao, Xiaodong Xu

    Abstract: Atomically thin chromium triiodide (CrI3) has recently been identified as a layered antiferromagnetic insulator, in which adjacent ferromagnetic monolayers are antiferromagnetically coupled. This unusual magnetic structure naturally comprises a series of anti-aligned spin filters which can be utilized to make spin-filter magnetic tunnel junctions with very large tunneling magnetoresistance (TMR).…

    Submitted 13 July, 2018; originally announced July 2018.

  45. arXiv:1807.01738  [pdf, other]

    eess.AS cs.SD

    Investigating the role of L1 in automatic pronunciation evaluation of L2 speech

    Authors: Ming Tu, Anna Grabek, Julie Liss, Visar Berisha

    Abstract: Automatic pronunciation evaluation plays an important role in pronunciation training and second language education. This field draws heavily on concepts from automatic speech recognition (ASR) to quantify how close the pronunciation of non-native speech is to native-like pronunciation. However, it is known that the formation of accent is related to pronunciation patterns of both the target languag…

    Submitted 4 July, 2018; originally announced July 2018.

    Comments: To appear in Interspeech 2018

  46. arXiv:1804.10325  [pdf, other]

    eess.AS

    Simulating dysarthric speech for training data augmentation in clinical speech applications

    Authors: Yishan Jiao, Ming Tu, Visar Berisha, Julie Liss

    Abstract: Training machine learning algorithms for speech applications requires large, labeled training data sets. This is problematic for clinical applications where obtaining such data is prohibitively expensive because of privacy concerns or lack of access. As a result, clinical speech applications are typically developed using small data sets with only tens of speakers. In this paper, we propose a metho…

    Submitted 26 April, 2018; originally announced April 2018.

    Comments: Will appear in Proc. of ICASSP 2018

  47. arXiv:1804.08663  [pdf, other]

    eess.AS cs.SD

    A Discriminative Acoustic-Prosodic Approach for Measuring Local Entrainment

    Authors: Megan M. Willi, Stephanie A. Borrie, Tyson S. Barrett, Ming Tu, Visar Berisha

    Abstract: Acoustic-prosodic entrainment describes the tendency of humans to align or adapt their speech acoustics to each other in conversation. This alignment of spoken behavior has important implications for conversational success. However, modeling the subtle nature of entrainment in spoken dialogue continues to pose a challenge. In this paper, we propose a straightforward definition for local entrainmen…

    Submitted 12 July, 2018; v1 submitted 23 April, 2018; originally announced April 2018.

  48. arXiv:1801.08679  [pdf]

    cond-mat.mes-hall

    Giant Tunneling Magnetoresistance in Spin-Filter van der Waals Heterostructures

    Authors: Tiancheng Song, Xinghan Cai, Matisse Wei-Yuan Tu, Xiaoou Zhang, Bevin Huang, Nathan P. Wilson, Kyle L. Seyler, Lin Zhu, Takashi Taniguchi, Kenji Watanabe, Michael A. McGuire, David H. Cobden, Di Xiao, Wang Yao, Xiaodong Xu

    Abstract: Magnetic multilayer devices that exploit magnetoresistance are the backbone of magnetic sensing and data storage technologies. Here we report novel multiple-spin-filter magnetic tunnel junctions (sf-MTJs) based on van der Waals (vdW) heterostructures in which atomically thin chromium triiodide (CrI3) acts as a spin-filter tunnel barrier sandwiched between graphene contacts. We demonstrate tunnelin…

    Submitted 26 January, 2018; originally announced January 2018.

    Comments: Submitted

  49. arXiv:1712.05370  [pdf, other]

    cond-mat.mtrl-sci

    Stabilizing a high-pressure phase in InSb at ambient conditions with a laser-driven pressure pulse

    Authors: A. Jarnac, Xiaocui Wang, A. U. J Bengtsson, M. Burza, J. C. Ekstrom, H. Enquist, A. Jurgilaitis, N. Kretzschmar, A. I. H. Persson, C. M. Tu, M. Wulff, F. Dorchies, J. Larsson

    Abstract: In this letter, we describe the stabilization of indium antimonide (InSb) in the high-pressure orthorhombic phase (InSb-III) at ambient conditions. Until now, InSb-III has only been observed above 9 GPa, or at around 3 GPa as a metastable structure during the phase transition from cubic zinc blende (InSb-I) to orthorhombic InSb-IV. The crystalline phase transition from InSb-I to InSb-III was drive…

    Submitted 14 December, 2017; originally announced December 2017.

  50. arXiv:1706.04813  [pdf, other]

    cond-mat.mes-hall quant-ph

    Switchable valley functionalities of an $n-n^{-}-n$ junction in 2D semiconductors

    Authors: Matisse Wei-Yuan Tu, Wang Yao

    Abstract: We show that an $n-n^{-}-n$ junction in 2D semiconductors can flexibly realize two basic valleytronic functions, i.e. valley filter and valley source, with gate controlled switchability between the two. Upon carrier flux passing through the junction, the valley filter and valley source functions are enabled respectively by intra- and inter-valley scatterings, and the two functions dominate respect…

    Submitted 15 June, 2017; originally announced June 2017.

    Journal ref: 2D Mater. 4 (2017) 025109