Showing 1–50 of 61 results for author: Shao, C

  1. arXiv:2407.03835  [pdf, other]

    cs.CV

    7th ABAW Competition: Multi-Task Learning and Compound Expression Recognition

    Authors: Dimitrios Kollias, Stefanos Zafeiriou, Irene Kotsia, Abhinav Dhall, Shreya Ghosh, Chunchang Shao, Guanyu Hu

    Abstract: This paper describes the 7th Affective Behavior Analysis in-the-wild (ABAW) Competition, which is part of the respective Workshop held in conjunction with ECCV 2024. The 7th ABAW Competition addresses novel challenges in understanding human expressions and behaviors, crucial for the development of human-centered technologies. The Competition comprises two sub-challenges: i) Multi-Task Learning…

    Submitted 8 July, 2024; v1 submitted 4 July, 2024; originally announced July 2024.

  2. arXiv:2407.03833  [pdf, ps, other]

    quant-ph cs.DS

    Quantum spectral method for gradient and Hessian estimation

    Authors: Yuxin Zhang, Changpeng Shao

    Abstract: Gradient descent is one of the most basic algorithms for solving continuous optimization problems. In [Jordan, PRL, 95(5):050501, 2005], Jordan proposed the first quantum algorithm for estimating gradients of functions close to linear, with exponential speedup in the black-box model. This quantum algorithm was greatly enhanced and developed by [Gilyén, Arunachalam, and Wiebe, SODA, pp. 1425-1444,…

    Submitted 4 July, 2024; originally announced July 2024.

    Comments: 46 pages, 1 figure

  3. arXiv:2405.18922  [pdf, other]

    cs.CL

    Understanding and Addressing the Under-Translation Problem from the Perspective of Decoding Objective

    Authors: Chenze Shao, Fandong Meng, Jiali Zeng, Jie Zhou

    Abstract: Neural Machine Translation (NMT) has made remarkable progress over the past years. However, under-translation and over-translation remain two challenging problems in state-of-the-art NMT systems. In this work, we conduct an in-depth analysis on the underlying cause of under-translation in NMT, providing an explanation from the perspective of decoding objective. To optimize the beam search objectiv…

    Submitted 29 May, 2024; originally announced May 2024.

    Comments: ACL 2024 main conference

  4. arXiv:2405.18906  [pdf, other]

    cs.CL cs.LG

    Language Generation with Strictly Proper Scoring Rules

    Authors: Chenze Shao, Fandong Meng, Yijin Liu, Jie Zhou

    Abstract: Language generation based on maximum likelihood estimation (MLE) has become the fundamental approach for text generation. Maximum likelihood estimation is typically performed by minimizing the log-likelihood loss, also known as the logarithmic score in statistical decision theory. The logarithmic score is strictly proper in the sense that it encourages honest forecasts, where the expected score is…

    Submitted 29 May, 2024; originally announced May 2024.

    Comments: ICML 2024

  5. arXiv:2405.09557  [pdf, other]

    eess.SP cs.LG

    Machine Learning in Short-Reach Optical Systems: A Comprehensive Survey

    Authors: Chen Shao, Elias Giacoumidis, Syed Moktacim Billah, Shi Li, Jialei Li, Prashasti Sahu, Andre Richter, Tobias Kaefer, Michael Faerber

    Abstract: In recent years, extensive research has been conducted to explore the utilization of machine learning algorithms in various direct-detected and self-coherent short-reach communication applications. These applications encompass a wide range of tasks, including bandwidth request prediction, signal quality monitoring, fault detection, traffic prediction, and digital signal processing (DSP)-based equa…

    Submitted 29 May, 2024; v1 submitted 2 May, 2024; originally announced May 2024.

    Comments: 23 pages, 2 figures, 3 tables, Accepted as MDPI Photonics Journal Special Issue Machine Learning Applied to Optical Communication Systems

  6. arXiv:2405.02609  [pdf, other]

    cs.LG

    Advanced Equalization in 112 Gb/s Upstream PON Using a Novel Fourier Convolution-based Network

    Authors: Chen Shao, Elias Giacoumidis, Patrick Matalla, Jialei Li, Shi Li, Sebastian Randel, Andre Richter, Michael Faerber, Tobias Kaefer

    Abstract: We experimentally demonstrate a novel, low-complexity Fourier Convolution-based Network (FConvNet) based equalizer for 112 Gb/s upstream PAM4-PON. At a BER of 0.005, FConvNet enhances the receiver sensitivity by 2 and 1 dB compared to a 51-tap Sato equalizer and benchmark machine learning algorithms respectively.

    Submitted 4 May, 2024; originally announced May 2024.

    Comments: 4 pages, 5 figures

  7. arXiv:2405.00720  [pdf, other]

    eess.SP cs.LG

    A Novel Machine Learning-based Equalizer for a Downstream 100G PAM-4 PON

    Authors: Chen Shao, Elias Giacoumidis, Shi Li, Jialei Li, Michael Faerber, Tobias Kaefer, Andre Richter

    Abstract: A frequency-calibrated SCINet (FC-SCINet) equalizer is proposed for down-stream 100G PON with 28.7 dB path loss. At 5 km, FC-SCINet improves the BER by 88.87% compared to FFE and a 3-layer DNN with 10.57% lower complexity.

    Submitted 25 April, 2024; originally announced May 2024.

    Comments: 3 pages, 6 figures, accepted by Optical Fiber Communications Conference and Exhibition 2024

  8. arXiv:2404.13278  [pdf, other]

    cs.LG cs.AI cs.DC eess.SP

    Federated Transfer Learning with Task Personalization for Condition Monitoring in Ultrasonic Metal Welding

    Authors: Ahmadreza Eslaminia, Yuquan Meng, Klara Nahrstedt, Chenhui Shao

    Abstract: Ultrasonic metal welding (UMW) is a key joining technology with widespread industrial applications. Condition monitoring (CM) capabilities are critically needed in UMW applications because process anomalies significantly deteriorate the joining quality. Recently, machine learning models emerged as a promising tool for CM in many manufacturing applications due to their ability to learn complex patt…

    Submitted 20 April, 2024; originally announced April 2024.

    Comments: 37 pages, 8 figures

  9. arXiv:2402.19344  [pdf, other]

    cs.CV

    The 6th Affective Behavior Analysis in-the-wild (ABAW) Competition

    Authors: Dimitrios Kollias, Panagiotis Tzirakis, Alan Cowen, Stefanos Zafeiriou, Irene Kotsia, Alice Baird, Chris Gagne, Chunchang Shao, Guanyu Hu

    Abstract: This paper describes the 6th Affective Behavior Analysis in-the-wild (ABAW) Competition, which is part of the respective Workshop held in conjunction with IEEE CVPR 2024. The 6th ABAW Competition addresses contemporary challenges in understanding human emotions and behaviors, crucial for the development of human-centered technologies. In more detail, the Competition focuses on affect related bench…

    Submitted 12 March, 2024; v1 submitted 29 February, 2024; originally announced February 2024.

  10. arXiv:2402.15686  [pdf, other]

    quant-ph cs.CC

    Lower bounds for quantum-inspired classical algorithms via communication complexity

    Authors: Nikhil S. Mande, Changpeng Shao

    Abstract: Quantum-inspired classical algorithms provide us with a new way to understand the computational power of quantum computers for practically-relevant problems, especially in machine learning. In the past several years, numerous efficient algorithms for various tasks have been found, while an analysis of lower bounds is still missing. Using communication complexity, in this work we propose the first…

    Submitted 9 May, 2024; v1 submitted 23 February, 2024; originally announced February 2024.

    Comments: 23 pages, the paper is modified to make the results more clear

  11. arXiv:2402.11922  [pdf, other]

    cs.LG

    Spatio-Temporal Few-Shot Learning via Diffusive Neural Network Generation

    Authors: Yuan Yuan, Chenyang Shao, Jingtao Ding, Depeng Jin, Yong Li

    Abstract: Spatio-temporal modeling is foundational for smart city applications, yet it is often hindered by data scarcity in many cities and regions. To bridge this gap, we propose a novel generative pre-training framework, GPD, for spatio-temporal few-shot learning with urban knowledge transfer. Unlike conventional approaches that heavily rely on common feature extraction or intricate few-shot learning des…

    Submitted 25 March, 2024; v1 submitted 19 February, 2024; originally announced February 2024.

  12. arXiv:2402.09836  [pdf, other]

    cs.AI

    Chain-of-Planned-Behaviour Workflow Elicits Few-Shot Mobility Generation in LLMs

    Authors: Chenyang Shao, Fengli Xu, Bingbing Fan, Jingtao Ding, Yuan Yuan, Meng Wang, Yong Li

    Abstract: The powerful reasoning capabilities of large language models (LLMs) have brought revolutionary changes to many fields, but their performance in human behaviour generation has not yet been extensively explored. This gap likely emerges because the internal processes governing behavioral intentions cannot be solely explained by abstract reasoning. Instead, they are also influenced by a multitude of f…

    Submitted 5 June, 2024; v1 submitted 15 February, 2024; originally announced February 2024.

  13. arXiv:2401.10316  [pdf, other]

    cs.IR cs.AI cs.LG

    Improving One-class Recommendation with Multi-tasking on Various Preference Intensities

    Authors: Chu-Jen Shao, Hao-Ming Fu, Pu-Jen Cheng

    Abstract: In the one-class recommendation problem, it's required to make recommendations based on users' implicit feedback, which is inferred from their action and inaction. Existing works obtain representations of users and items by encoding positive and negative interactions observed from training data. However, these efforts assume that all positive signals from implicit feedback reflect a fixed prefere…

    Submitted 18 January, 2024; originally announced January 2024.

    Comments: RecSys 2020 (ACM Conference on Recommender Systems 2020)

    Journal ref: RecSys 2020: Proceedings of the 14th ACM Conference on Recommender Systems, Pages 498 to 502

  14. arXiv:2311.07941  [pdf, other]

    cs.CL cs.AI

    Non-autoregressive Machine Translation with Probabilistic Context-free Grammar

    Authors: Shangtong Gui, Chenze Shao, Zhengrui Ma, Xishan Zhang, Yunji Chen, Yang Feng

    Abstract: Non-autoregressive Transformer (NAT) significantly accelerates the inference of neural machine translation. However, conventional NAT models suffer from limited expression power and performance degradation compared to autoregressive (AT) models due to the assumption of conditional independence among target tokens. To address these limitations, we propose a novel approach called PCFG-NAT, which leve…

    Submitted 14 November, 2023; originally announced November 2023.

    Comments: NeurIPS 2023

  15. arXiv:2311.06999  [pdf, other]

    quant-ph cs.CC

    Quantum and classical query complexities of functions of matrices

    Authors: Ashley Montanaro, Changpeng Shao

    Abstract: Let $A$ be an $s$-sparse Hermitian matrix, $f(x)$ be a univariate function, and $i, j$ be two indices. In this work, we investigate the query complexity of approximating $\bra{i} f(A) \ket{j}$. We show that for any continuous function $f(x):[-1,1]\rightarrow [-1,1]$, the quantum query complexity of computing $\bra{i} f(A) \ket{j}\pm \varepsilon/4$ is lower bounded by…

    Submitted 20 February, 2024; v1 submitted 12 November, 2023; originally announced November 2023.

    Comments: 35 pages, key results are enhanced

  16. arXiv:2310.17217  [pdf, other]

    cs.CL cs.AI cs.LG

    Beyond MLE: Convex Learning for Text Generation

    Authors: Chenze Shao, Zhengrui Ma, Min Zhang, Yang Feng

    Abstract: Maximum likelihood estimation (MLE) is a statistical method used to estimate the parameters of a probability distribution that best explain the observed data. In the context of text generation, MLE is often used to train generative language models, which can then be used to generate new text. However, we argue that MLE is not always necessary and optimal, especially for closed-ended text generatio…

    Submitted 26 October, 2023; originally announced October 2023.

    Comments: NeurIPS 2023

  17. arXiv:2310.14883  [pdf, other]

    cs.CL cs.AI

    Non-autoregressive Streaming Transformer for Simultaneous Translation

    Authors: Zhengrui Ma, Shaolei Zhang, Shoutao Guo, Chenze Shao, Min Zhang, Yang Feng

    Abstract: Simultaneous machine translation (SiMT) models are trained to strike a balance between latency and translation quality. However, training these models to achieve high quality while maintaining low latency often leads to a tendency for aggressive anticipation. We argue that this issue stems from the autoregressive architecture upon which most existing SiMT models are built. To address those issues,…

    Submitted 23 October, 2023; originally announced October 2023.

    Comments: EMNLP 2023 main conference; Source code is available at https://github.com/ictnlp/NAST

    ACM Class: I.2.7

  18. arXiv:2310.05377  [pdf, other]

    cs.NE

    Distributed Evolution Strategies with Multi-Level Learning for Large-Scale Black-Box Optimization

    Authors: Qiqi Duan, Chang Shao, Guochen Zhou, Minghan Zhang, Qi Zhao, Yuhui Shi

    Abstract: In the post-Moore era, the main performance gains of black-box optimizers increasingly depend on parallelism, especially for large-scale optimization (LSO). Here we propose to parallelize the well-established covariance matrix adaptation evolution strategy (CMA-ES) and in particular one of its latest LSO variants, called limited-memory CMA-ES (LM-CMA). To achieve efficiency while approximating its p…

    Submitted 2 November, 2023; v1 submitted 8 October, 2023; originally announced October 2023.

  19. arXiv:2308.05756  [pdf, other]

    eess.SP cs.LG

    WeldMon: A Cost-effective Ultrasonic Welding Machine Condition Monitoring System

    Authors: Beitong Tian, Kuan-Chieh Lu, Ahmadreza Eslaminia, Yaohui Wang, Chenhui Shao, Klara Nahrstedt

    Abstract: Ultrasonic welding machines play a critical role in the lithium battery industry, facilitating the bonding of batteries with conductors. Ensuring high-quality welding is vital, making tool condition monitoring systems essential for early-stage quality control. However, existing monitoring methods face challenges in cost, downtime, and adaptability. In this paper, we present WeldMon, an affordable…

    Submitted 4 August, 2023; originally announced August 2023.

    Comments: 9 pages, 5 figures

  20. arXiv:2308.01861  [pdf, other]

    cs.CL cs.AI

    ClassEval: A Manually-Crafted Benchmark for Evaluating LLMs on Class-level Code Generation

    Authors: Xueying Du, Mingwei Liu, Kaixin Wang, Hanlin Wang, Junwei Liu, Yixuan Chen, Jiayi Feng, Chaofeng Sha, Xin Peng, Yiling Lou

    Abstract: In this work, we make the first attempt to evaluate LLMs in a more challenging code generation scenario, i.e. class-level code generation. We first manually construct the first class-level code generation benchmark ClassEval of 100 class-level Python code generation tasks with approximately 500 person-hours. Based on it, we then perform the first study of 11 state-of-the-art LLMs on class-level co…

    Submitted 14 August, 2023; v1 submitted 3 August, 2023; originally announced August 2023.

  21. Spatio-temporal Diffusion Point Processes

    Authors: Yuan Yuan, Jingtao Ding, Chenyang Shao, Depeng Jin, Yong Li

    Abstract: Spatio-temporal point process (STPP) is a stochastic collection of events accompanied with time and space. Due to computational complexities, existing solutions for STPPs compromise with conditional independence between time and space, which consider the temporal and spatial distributions separately. The failure to model the joint distribution leads to limited capacities in characterizing the spat…

    Submitted 24 June, 2023; v1 submitted 21 May, 2023; originally announced May 2023.

    Comments: Accepted by KDD23

  22. arXiv:2305.04266  [pdf, other]

    cs.IT eess.SP

    Joint Analog Encoder Design for Multi-Task Oriented Wireless Communication

    Authors: Chenmin Sha, Shidong Zhou

    Abstract: In this paper, we study a multi-task oriented communication system by examining an analog encoding method for multiple estimation tasks. The basic idea is to utilize the correlation among the information of interest required by different tasks and the features of the broadcast channel. For linear estimation tasks, we provide a low-complexity design for a multi-user multi-task system based on orthogonal decompositio…

    Submitted 17 May, 2023; v1 submitted 7 May, 2023; originally announced May 2023.

  23. arXiv:2304.05020  [pdf, other]

    cs.NE

    Cooperative Coevolution for Non-Separable Large-Scale Black-Box Optimization: Convergence Analyses and Distributed Accelerations

    Authors: Qiqi Duan, Chang Shao, Guochen Zhou, Haobin Yang, Qi Zhao, Yuhui Shi

    Abstract: Given the ubiquity of non-separable optimization problems in the real world, in this paper we analyze and extend the large-scale version of the well-known cooperative coevolution (CC), a divide-and-conquer black-box optimization framework, on non-separable functions. First, we reveal empirical reasons for when decomposition-based methods are preferred or not in practice on some non-separable large-sca…

    Submitted 14 May, 2024; v1 submitted 11 April, 2023; originally announced April 2023.

  24. arXiv:2303.06662  [pdf, other]

    cs.CL

    Fuzzy Alignments in Directed Acyclic Graph for Non-Autoregressive Machine Translation

    Authors: Zhengrui Ma, Chenze Shao, Shangtong Gui, Min Zhang, Yang Feng

    Abstract: Non-autoregressive translation (NAT) reduces the decoding latency but suffers from performance degradation due to the multi-modality problem. Recently, the structure of directed acyclic graph has achieved great success in NAT, which tackles the multi-modality problem by introducing dependency between vertices. However, training it with negative log-likelihood loss implicitly requires a strict alig…

    Submitted 17 July, 2023; v1 submitted 12 March, 2023; originally announced March 2023.

    Comments: ICLR 2023

    ACM Class: I.2.7

  25. arXiv:2302.11430  [pdf]

    physics.comp-ph cs.AI physics.bio-ph

    Differentiable Rotamer Sampling with Molecular Force Fields

    Authors: Congzhou M. Sha, Jian Wang, Nikolay V. Dokholyan

    Abstract: Molecular dynamics is the primary computational method by which modern structural biology explores macromolecule structure and function. Boltzmann generators have been proposed as an alternative to molecular dynamics, by replacing the integration of molecular systems over time with the training of generative neural networks. This neural network approach to MD samples rare events at a higher rate t…

    Submitted 22 February, 2023; originally announced February 2023.

    Comments: 41 pages, 1 graphical abstract, 5 figures

  26. arXiv:2301.06107  [pdf, other]

    quant-ph cs.CC

    Quantum speedup of leverage score sampling and its application

    Authors: Changpeng Shao

    Abstract: Leverage score sampling is crucial to the design of randomized algorithms for large-scale matrix problems, while the computation of leverage scores is a bottleneck of many applications. In this paper, we propose a quantum algorithm to accelerate this useful method. The speedup is at least quadratic and could be exponential for well-conditioned matrices. We also prove some quantum lower bounds, whi…

    Submitted 16 September, 2023; v1 submitted 15 January, 2023; originally announced January 2023.

    Comments: 23 pages, the paper is shortened and the main results are stated more clearly

  27. arXiv:2301.01448  [pdf, other]

    eess.IV cs.CV

    A deep local attention network for pre-operative lymph node metastasis prediction in pancreatic cancer via multiphase CT imaging

    Authors: Zhilin Zheng, Xu Fang, Jiawen Yao, Mengmeng Zhu, Le Lu, Lingyun Huang, Jing Xiao, Yu Shi, Hong Lu, Jianping Lu, Ling Zhang, Chengwei Shao, Yun Bian

    Abstract: Lymph node (LN) metastasis status is one of the most critical prognostic and cancer staging factors for patients with resectable pancreatic ductal adenocarcinoma (PDAC), or in general, for any type of solid malignant tumor. Preoperative prediction of LN metastasis from non-invasive CT imaging is highly desired, as it might be straightforwardly used to guide the following neoadjuvant treatment de…

    Submitted 4 January, 2023; originally announced January 2023.

    Comments: 14 pages, 5 figures

  28. arXiv:2212.05652  [pdf, other]

    cs.NE

    PyPop7: A Pure-Python Library for Population-Based Black-Box Optimization

    Authors: Qiqi Duan, Guochen Zhou, Chang Shao, Zhuowei Wang, Mingyang Feng, Yuwei Huang, Yajing Tan, Yijun Yang, Qi Zhao, Yuhui Shi

    Abstract: In this paper, we present an open-source pure-Python library called PyPop7 for black-box optimization (BBO). As population-based methods (e.g., evolutionary algorithms, swarm intelligence, and pattern search) become increasingly popular for BBO, the design goal of PyPop7 is to provide a unified API and elegant implementations for them, particularly in challenging high-dimensional scenarios. Since…

    Submitted 5 July, 2024; v1 submitted 11 December, 2022; originally announced December 2022.

    Comments: 28 pages

  29. arXiv:2211.16863  [pdf, other]

    cs.CL

    Rephrasing the Reference for Non-Autoregressive Machine Translation

    Authors: Chenze Shao, Jinchao Zhang, Jie Zhou, Yang Feng

    Abstract: Non-autoregressive neural machine translation (NAT) models suffer from the multi-modality problem that there may exist multiple possible translations of a source sentence, so the reference sentence may be inappropriate for the training when the NAT output is closer to other translations. In response to this problem, we introduce a rephraser to provide a better training target for NAT by rephrasing…

    Submitted 30 November, 2022; originally announced November 2022.

    Comments: AAAI 2023

  30. arXiv:2211.13979  [pdf, other]

    cs.LG q-bio.BM

    BatmanNet: Bi-branch Masked Graph Transformer Autoencoder for Molecular Representation

    Authors: Zhen Wang, Zheng Feng, Yanjun Li, Bowen Li, Yongrui Wang, Chulin Sha, Min He, Xiaolin Li

    Abstract: Although substantial efforts have been made using graph neural networks (GNNs) for AI-driven drug discovery (AIDD), effective molecular representation learning remains an open challenge, especially in the case of insufficient labeled molecules. Recent studies suggest that big GNN models pre-trained by self-supervised learning on unlabeled datasets enable better transfer performance in downstream m…

    Submitted 5 November, 2023; v1 submitted 25 November, 2022; originally announced November 2022.

    Comments: 19 pages, 6 figures, Accepted by Briefings in Bioinformatics in 17-Oct-2023

  31. arXiv:2210.09517  [pdf, other]

    cs.CE q-bio.QM

    Graph neural networks to learn joint representations of disjoint molecular graphs

    Authors: Chen Shao, Zhou Chen, Pascal Friederich

    Abstract: Graph neural networks are widely used to learn global representations of graphs, which are then used for regression or classification tasks. Typically, the graphs in such data sets are connected, i.e. each training sample consists of a single internally connected graph associated with a global label. However, there is a wide variety of yet unconsidered but application-relevant tasks, where labels…

    Submitted 30 October, 2022; v1 submitted 14 October, 2022; originally announced October 2022.

    Comments: 5 pages, 4 figures

  32. arXiv:2210.05193  [pdf, other]

    cs.CL

    Viterbi Decoding of Directed Acyclic Transformer for Non-Autoregressive Machine Translation

    Authors: Chenze Shao, Zhengrui Ma, Yang Feng

    Abstract: Non-autoregressive models achieve significant decoding speedup in neural machine translation but lack the ability to capture sequential dependency. Directed Acyclic Transformer (DA-Transformer) was recently proposed to model sequential dependency with a directed acyclic graph. Consequently, it has to apply a sequential decision process at inference time, which harms the global translation accuracy…

    Submitted 2 March, 2023; v1 submitted 11 October, 2022; originally announced October 2022.

    Comments: Findings of EMNLP 2022

  33. arXiv:2210.03953  [pdf, other]

    cs.CL

    Non-Monotonic Latent Alignments for CTC-Based Non-Autoregressive Machine Translation

    Authors: Chenze Shao, Yang Feng

    Abstract: Non-autoregressive translation (NAT) models are typically trained with the cross-entropy loss, which forces the model outputs to be aligned verbatim with the target sentence and will highly penalize small shifts in word positions. Latent alignment models relax the explicit alignment by marginalizing out all monotonic latent alignments with the CTC loss. However, they cannot handle non-monotonic al…

    Submitted 8 October, 2022; originally announced October 2022.

    Comments: NeurIPS 2022

  34. arXiv:2210.01601  [pdf, other]

    quant-ph cs.CC

    Quantum communication complexity of linear regression

    Authors: Ashley Montanaro, Changpeng Shao

    Abstract: Quantum computers may achieve speedups over their classical counterparts for solving linear algebra problems. However, in some cases -- such as for low-rank matrices -- dequantized algorithms demonstrate that there cannot be an exponential quantum speedup. In this work, we show that quantum computers have provable polynomial and exponential speedups in terms of communication complexity for some fu…

    Submitted 14 May, 2023; v1 submitted 4 October, 2022; originally announced October 2022.

    Comments: 34 pages, updated some minor typos, and added one new section on the connection between dequantized algorithms and communication complexity

  35. arXiv:2208.09481  [pdf, other]

    physics.chem-ph cond-mat.mtrl-sci cs.LG

    Graph neural networks for materials science and chemistry

    Authors: Patrick Reiser, Marlen Neubert, André Eberhard, Luca Torresi, Chen Zhou, Chen Shao, Houssam Metni, Clint van Hoesel, Henrik Schopmans, Timo Sommer, Pascal Friederich

    Abstract: Machine learning plays an increasingly important role in many areas of chemistry and materials science, e.g. to predict materials properties, to accelerate simulations, to design new materials, and to predict synthesis routes of new materials. Graph neural networks (GNNs) are one of the fastest growing classes of machine learning models. They are of particular relevance for chemistry and materials…

    Submitted 5 August, 2022; originally announced August 2022.

    Comments: 37 pages, 2 figures

  36. arXiv:2205.14333  [pdf, other]

    cs.CL

    One Reference Is Not Enough: Diverse Distillation with Reference Selection for Non-Autoregressive Translation

    Authors: Chenze Shao, Xuanfu Wu, Yang Feng

    Abstract: Non-autoregressive neural machine translation (NAT) suffers from the multi-modality problem: the source sentence may have multiple correct translations, but the loss function is calculated only according to the reference sentence. Sequence-level knowledge distillation makes the target more deterministic by replacing the target with the output from an autoregressive model. However, the multi-modali…

    Submitted 28 May, 2022; originally announced May 2022.

    Comments: NAACL 2022 main conference

    ACM Class: I.2.7

  37. arXiv:2203.03910  [pdf, other]

    cs.CL cs.AI

    Overcoming Catastrophic Forgetting beyond Continual Learning: Balanced Training for Neural Machine Translation

    Authors: Chenze Shao, Yang Feng

    Abstract: Neural networks tend to gradually forget the previously learned knowledge when learning multiple tasks sequentially from dynamic data distributions. This problem is called catastrophic forgetting, which is a fundamental challenge in the continual learning of neural networks. In this work, we observe that catastrophic forgetting not only occurs in continual learning but also affects the tr…

    Submitted 18 March, 2022; v1 submitted 8 March, 2022; originally announced March 2022.

    Comments: ACL 2022 main conference

    ACM Class: I.2.7

  38. Reinforcement Learning Based Sparse Black-box Adversarial Attack on Video Recognition Models

    Authors: Zeyuan Wang, Chaofeng Sha, Su Yang

    Abstract: We explore the black-box adversarial attack on video recognition models. Attacks are only performed on selected key regions and key frames to reduce the high computation cost of searching adversarial perturbations on a video due to its high dimensionality. To select key frames, one way is to use heuristic algorithms to evaluate the importance of each frame and choose the essential ones. However, i…

    Submitted 29 August, 2021; originally announced August 2021.

    Comments: Accepted as a conference paper of IJCAI-21 (the 30th International Joint Conference on Artificial Intelligence)

    Journal ref: Proceedings of the Thirtieth International Joint Conference on Artificial Intelligence (IJCAI-21), pages 3162-3168, 2021

  39. arXiv:2106.08122  [pdf, other]

    cs.CL cs.LG

    Sequence-Level Training for Non-Autoregressive Neural Machine Translation

    Authors: Chenze Shao, Yang Feng, Jinchao Zhang, Fandong Meng, Jie Zhou

    Abstract: In recent years, Neural Machine Translation (NMT) has achieved notable results in various translation tasks. However, the word-by-word generation manner determined by the autoregressive mechanism leads to high translation latency of the NMT and restricts its low-latency applications. Non-Autoregressive Neural Machine Translation (NAT) removes the autoregressive mechanism and achieves significant d…

    Submitted 1 September, 2021; v1 submitted 15 June, 2021; originally announced June 2021.

    Comments: Computational Linguistics Journal

    ACM Class: I.2.7

  40. arXiv:2106.06751  [pdf, other]

    cs.CL

    Guiding Teacher Forcing with Seer Forcing for Neural Machine Translation

    Authors: Yang Feng, Shuhao Gu, Dengji Guo, Zhengxin Yang, Chenze Shao

    Abstract: Although teacher forcing has become the main training paradigm for neural machine translation, it usually makes predictions only conditioned on past information, and hence lacks global planning for the future. To address this problem, we introduce another decoder, called seer decoder, into the encoder-decoder framework during training, which involves future information in target predictions. Meanw…

    Submitted 12 June, 2021; originally announced June 2021.

    Comments: Accepted by ACL-IJCNLP 2021 main conference

  41. arXiv:2104.11897  [pdf, other]

    cs.CL

    Modeling Coverage for Non-Autoregressive Neural Machine Translation

    Authors: Yong Shan, Yang Feng, Chenze Shao

    Abstract: Non-Autoregressive Neural Machine Translation (NAT) has achieved significant inference speedup by generating all tokens simultaneously. Despite its high efficiency, NAT usually suffers from two kinds of translation errors: over-translation (e.g. repeated tokens) and under-translation (e.g. missing translations), which eventually limits the translation quality. In this paper, we argue that these is…

    Submitted 24 April, 2021; originally announced April 2021.

    Comments: Accepted by the 2021 International Joint Conference on Neural Networks (IJCNN 2021)

  42. arXiv:2104.00746  [pdf, other]

    cs.ET cs.LG quant-ph

    Drug Discovery Approaches using Quantum Machine Learning

    Authors: Junde Li, Mahabubul Alam, Congzhou M Sha, Jian Wang, Nikolay V. Dokholyan, Swaroop Ghosh

    Abstract: The traditional drug discovery pipeline takes several years and costs billions of dollars. Deep generative and predictive models are widely adopted to assist in drug development. Classical machines cannot efficiently produce atypical patterns of quantum computers which might improve the training quality of learning tasks. We propose a suite of quantum machine learning techniques e.g., generative advers…

    Submitted 1 April, 2021; originally announced April 2021.

    Comments: Li and Alam contributed equally to this work. arXiv admin note: text overlap with arXiv:2101.03438

  43. arXiv:2012.15161  [pdf, other]

    physics.soc-ph cs.SI physics.med-ph

    Universal Urban Spreading Pattern of COVID-19 and Its Underlying Mechanism

    Authors: Yongtao Zhang, Hongshen Zhang, Mincheng Wu, Shibo He, Yi Fang, Yanggang Cheng, Zhiguo Shi, Cunqi Shao, Chao Li, Songmin Ying, Zhenyu Gong, Yu Liu, Xinjiang Ye, Jinlai Chen, Youxian Sun, Jiming Chen, H. Eugene Stanley

    Abstract: Currently, the global situation of COVID-19 is aggravating, pressingly calling for efficient control and prevention measures. Understanding the spreading pattern of COVID-19 has been widely recognized as a vital step for implementing non-pharmaceutical measures. Previous studies investigated such an issue in large-scale (e.g., inter-country or inter-state) scenarios while urban spreading pattern still…

    Submitted 30 December, 2020; originally announced December 2020.

  44. arXiv:2011.08611  [pdf, other]

    quant-ph cs.LG

    Quantum algorithms for learning a hidden graph and beyond

    Authors: Ashley Montanaro, Changpeng Shao

    Abstract: We study the problem of learning an unknown graph provided via an oracle using a quantum algorithm. We consider three query models. In the first model ("OR queries"), the oracle returns whether a given subset of the vertices contains any edges. In the second ("parity queries"), the oracle returns the parity of the number of edges in a subset. In the third model, we are given copies of the graph st…

    Submitted 23 January, 2021; v1 submitted 17 November, 2020; originally announced November 2020.

    Comments: 24 pages, some typos are fixed, the title is changed a little bit

  45. arXiv:2011.06475  [pdf, other]

    quant-ph cs.AI cs.DS cs.LG

    Quantum algorithms for spectral sums

    Authors: Alessandro Luongo, Changpeng Shao

    Abstract: We propose new quantum algorithms for estimating spectral sums of positive semi-definite (PSD) matrices. The spectral sum of a PSD matrix $A$, for a function $f$, is defined as $\text{Tr}[f(A)] = \sum_j f(λ_j)$, where $λ_j$ are the eigenvalues of $A$. Typical examples of spectral sums are the von Neumann entropy, the trace of $A^{-1}$, the log-determinant, and the Schatten $p$-norm, where the la…

    Submitted 10 June, 2024; v1 submitted 12 November, 2020; originally announced November 2020.

  46. arXiv:2010.08178  [pdf, other]

    cs.CL cs.AI

    Generating Diverse Translation from Model Distribution with Dropout

    Authors: Xuanfu Wu, Yang Feng, Chenze Shao

    Abstract: Despite the improvement of translation quality, neural machine translation (NMT) often suffers from the lack of diversity in its generation. In this paper, we propose to generate diverse translations by deriving a large number of possible models with Bayesian modelling and sampling models from them for inference. The possible models are obtained by applying concrete dropout to the NMT model and ea…

    Submitted 16 October, 2020; originally announced October 2020.

  47. arXiv:2006.10633  [pdf, other]

    cs.SI

    A Multi-View Approach Based on Naming Behavioral Modeling for Aligning Chinese User Accounts across Multiple Networks

    Authors: Junxing Zhu, Xiang Wang, Qiang Liu, Xiaoyong Li, Chengcheng Shao, Bin Zhou

    Abstract: Hundreds of millions of Chinese people have become social network users in recent years, and aligning the accounts of common Chinese users across multiple social networks is valuable to many inter-network applications, e.g., cross-network recommendation, cross-network link prediction. Many methods have explored the proper ways of utilizing account name information into aligning the common English…

    Submitted 18 June, 2020; originally announced June 2020.

  48. arXiv:1912.00178  [pdf, other]

    cs.CL cs.LG

    Modeling Fluency and Faithfulness for Diverse Neural Machine Translation

    Authors: Yang Feng, Wanying Xie, Shuhao Gu, Chenze Shao, Wen Zhang, Zhengxin Yang, Dong Yu

    Abstract: Neural machine translation models usually adopt the teacher forcing strategy for training, which requires the predicted sequence to match the ground truth word by word and forces the probability of each prediction to approach a 0-1 distribution. However, the strategy casts all the portion of the distribution to the ground truth word and ignores other words in the target vocabulary even when the ground t…

    Submitted 30 November, 2019; originally announced December 2019.

  49. arXiv:1911.09320  [pdf, other]

    cs.CL cs.LG

    Minimizing the Bag-of-Ngrams Difference for Non-Autoregressive Neural Machine Translation

    Authors: Chenze Shao, Jinchao Zhang, Yang Feng, Fandong Meng, Jie Zhou

    Abstract: Non-Autoregressive Neural Machine Translation (NAT) achieves significant decoding speedup through generating target words independently and simultaneously. However, in the context of non-autoregressive translation, the word-level cross-entropy loss cannot model the target-side sequential dependency properly, leading to its weak correlation with the translation quality. As a result, NAT tends to ge…

    Submitted 21 November, 2019; originally announced November 2019.

    Comments: AAAI 2020

    ACM Class: I.2.7

  50. arXiv:1910.14025  [pdf, ps, other]

    cs.IR cs.CL cs.LG stat.ML

    Graph Neural News Recommendation with Long-term and Short-term Interest Modeling

    Authors: Linmei Hu, Chen Li, Chuan Shi, Cheng Yang, Chao Shao

    Abstract: With the information explosion of news articles, personalized news recommendation has become important for users to quickly find news that they are interested in. Existing methods on news recommendation mainly include collaborative filtering methods which rely on direct user-item interactions and content based methods which characterize the content of user reading history. Although these methods h…

    Submitted 7 November, 2019; v1 submitted 30 October, 2019; originally announced October 2019.