subscribe to arXiv mailings

C$^3$P-VoxelMap: Compact, Cumulative and Coalescible Probabilistic Voxel Mapping

Authors: Xu Yang, Wenhao Li, Qijie Ge, Lulu Suo, Weijie Tang, Zhengyu Wei, Longxiang Huang, Bo Wang

Abstract: This work presents a compact, cumulative and coalescible probabilistic voxel mapping method to enhance performance, accuracy and memory efficiency in LiDAR odometry. Probabilistic voxel mapping requires storing past point clouds and re-iterating on them to update the uncertainty every iteration, which consumes large memory space and CPU cycles. To solve this problem, we propose a two-folded strate… ▽ More This work presents a compact, cumulative and coalescible probabilistic voxel mapping method to enhance performance, accuracy and memory efficiency in LiDAR odometry. Probabilistic voxel mapping requires storing past point clouds and re-iterating on them to update the uncertainty every iteration, which consumes large memory space and CPU cycles. To solve this problem, we propose a two-folded strategy. First, we introduce a compact point-free representation for probabilistic voxels and derive a cumulative update of the planar uncertainty without caching original point clouds. Our voxel structure only keeps track of a predetermined set of statistics for points that lie inside it. This method reduces the runtime complexity from $O(MN)$ to $O(N)$ and the space complexity from $O(N)$ to $O(1)$ where $M$ is the number of iterations and $N$ is the number of points. Second, to further minimize memory usage and enhance mapping accuracy, we provide a strategy to dynamically merge voxels associated with the same physical planes by taking advantage of the geometric features in the real world. Rather than scanning for these coalescible voxels constantly at every iteration, our merging strategy accumulates voxels in a locality-sensitive hash and triggers merging lazily. On-demand merging not only reduces memory footprint with minimal computational overhead but also improves localization accuracy thanks to cross-voxel denoising. Experiments exhibit 20% higher accuracy, 20% faster performance and 70% lower memory consumption than the state-of-the-art. △ Less

Submitted 3 June, 2024; originally announced June 2024.

arXiv:2401.17633 [pdf, other]

Navigating the OverKill in Large Language Models

Authors: Chenyu Shi, Xiao Wang, Qiming Ge, Songyang Gao, Xianjun Yang, Tao Gui, Qi Zhang, Xuanjing Huang, Xun Zhao, Dahua Lin

Abstract: Large language models are meticulously aligned to be both helpful and harmless. However, recent research points to a potential overkill which means models may refuse to answer benign queries. In this paper, we investigate the factors for overkill by exploring how models handle and determine the safety of queries. Our findings reveal the presence of shortcuts within models, leading to an over-atten… ▽ More Large language models are meticulously aligned to be both helpful and harmless. However, recent research points to a potential overkill which means models may refuse to answer benign queries. In this paper, we investigate the factors for overkill by exploring how models handle and determine the safety of queries. Our findings reveal the presence of shortcuts within models, leading to an over-attention of harmful words like 'kill' and prompts emphasizing safety will exacerbate overkill. Based on these insights, we introduce Self-Contrastive Decoding (Self-CD), a training-free and model-agnostic strategy, to alleviate this phenomenon. We first extract such over-attention by amplifying the difference in the model's output distributions when responding to system prompts that either include or omit an emphasis on safety. Then we determine the final next-token predictions by downplaying the over-attention from the model via contrastive decoding. Empirical results indicate that our method has achieved an average reduction of the refusal rate by 20\% while having almost no impact on safety. △ Less

Submitted 31 January, 2024; originally announced January 2024.

arXiv:2401.11458 [pdf, other]

Linear Alignment: A Closed-form Solution for Aligning Human Preferences without Tuning and Feedback

Authors: Songyang Gao, Qiming Ge, Wei Shen, Shihan Dou, Junjie Ye, Xiao Wang, Rui Zheng, Yicheng Zou, Zhi Chen, Hang Yan, Qi Zhang, Dahua Lin

Abstract: The success of AI assistants based on Language Models (LLMs) hinges on Reinforcement Learning from Human Feedback (RLHF) to comprehend and align with user intentions. However, traditional alignment algorithms, such as PPO, are hampered by complex annotation and training requirements. This reliance limits the applicability of RLHF and hinders the development of professional assistants tailored to d… ▽ More The success of AI assistants based on Language Models (LLMs) hinges on Reinforcement Learning from Human Feedback (RLHF) to comprehend and align with user intentions. However, traditional alignment algorithms, such as PPO, are hampered by complex annotation and training requirements. This reliance limits the applicability of RLHF and hinders the development of professional assistants tailored to diverse human preferences. In this work, we introduce \textit{Linear Alignment}, a novel algorithm that aligns language models with human preferences in one single inference step, eliminating the reliance on data annotation and model training. Linear alignment incorporates a new parameterization for policy optimization under divergence constraints, which enables the extraction of optimal policy in a closed-form manner and facilitates the direct estimation of the aligned response. Extensive experiments on both general and personalized preference datasets demonstrate that linear alignment significantly enhances the performance and efficiency of LLM alignment across diverse scenarios. Our code and dataset is published on \url{https://github.com/Wizardcoast/Linear_Alignment.git}. △ Less

Submitted 1 July, 2024; v1 submitted 21 January, 2024; originally announced January 2024.

Comments: Accepted by ICML2024, I'm still preparing a better vision

arXiv:2311.03275 [pdf, other]

HetCAN: A Heterogeneous Graph Cascade Attention Network with Dual-Level Awareness

Authors: Zeyuan Zhao, Qingqing Ge, Anfeng Cheng, Yiding Liu, Xiang Li, Shuaiqiang Wang

Abstract: Heterogeneous graph neural networks(HGNNs) have recently shown impressive capability in modeling heterogeneous graphs that are ubiquitous in real-world applications. Most existing methods for heterogeneous graphs mainly learn node embeddings by stacking multiple convolutional or attentional layers, which can be considered as capturing the high-order information from node-level aspect. However, dif… ▽ More Heterogeneous graph neural networks(HGNNs) have recently shown impressive capability in modeling heterogeneous graphs that are ubiquitous in real-world applications. Most existing methods for heterogeneous graphs mainly learn node embeddings by stacking multiple convolutional or attentional layers, which can be considered as capturing the high-order information from node-level aspect. However, different types of nodes in heterogeneous graphs have diverse features, it is also necessary to capture interactions among node features, namely the high-order information from feature-level aspect. In addition, most methods first align node features by mapping them into one same low-dimensional space, while they may lose some type information of nodes in this way. To address these problems, in this paper, we propose a novel Heterogeneous graph Cascade Attention Network (HetCAN) composed of multiple cascade blocks. Each cascade block includes two components, the type-aware encoder and the dimension-aware encoder. Specifically, the type-aware encoder compensates for the loss of node type information and aims to make full use of graph heterogeneity. The dimension-aware encoder is able to learn the feature-level high-order information by capturing the interactions among node features. With the assistance of these components, HetCAN can comprehensively encode information of node features, graph heterogeneity and graph structure in node embeddings. Extensive experiments demonstrate the superiority of HetCAN over advanced competitors and also exhibit its efficiency and robustness. △ Less

Submitted 29 May, 2024; v1 submitted 6 November, 2023; originally announced November 2023.

Comments: Accepted by ECML-PKDD 2024

arXiv:2311.02116 [pdf, other]

Resist Label Noise with PGM for Graph Neural Networks

Authors: Qingqing Ge, Jianxiang Yu, Zeyuan Zhao, Xiang Li

Abstract: While robust graph neural networks (GNNs) have been widely studied for graph perturbation and attack, those for label noise have received significantly less attention. Most existing methods heavily rely on the label smoothness assumption to correct noisy labels, which adversely affects their performance on heterophilous graphs. Further, they generally perform poorly in high noise-rate scenarios. T… ▽ More While robust graph neural networks (GNNs) have been widely studied for graph perturbation and attack, those for label noise have received significantly less attention. Most existing methods heavily rely on the label smoothness assumption to correct noisy labels, which adversely affects their performance on heterophilous graphs. Further, they generally perform poorly in high noise-rate scenarios. To address these problems, in this paper, we propose a novel probabilistic graphical model (PGM) based framework LNP. Given a noisy label set and a clean label set, our goal is to maximize the likelihood of labels in the clean set. We first present LNP-v1, which generates clean labels based on graphs only in the Bayesian network. To further leverage the information of clean labels in the noisy label set, we put forward LNP-v2, which incorporates the noisy label set into the Bayesian network to generate clean labels. The generative process can then be used to predict labels for unlabeled nodes. We conduct extensive experiments to show the robustness of LNP on varying noise types and rates, and also on graphs with different heterophilies. In particular, we show that LNP can lead to inspiring performance in high noise-rate situations. △ Less

Submitted 2 November, 2023; originally announced November 2023.

arXiv:2310.17394 [pdf, other]

PSP: Pre-Training and Structure Prompt Tuning for Graph Neural Networks

Authors: Qingqing Ge, Zeyuan Zhao, Yiding Liu, Anfeng Cheng, Xiang Li, Shuaiqiang Wang, Dawei Yin

Abstract: Graph Neural Networks (GNNs) are powerful in learning semantics of graph data. Recently, a new paradigm "pre-train and prompt" has shown promising results in adapting GNNs to various tasks with less supervised data. The success of such paradigm can be attributed to the more consistent objectives of pre-training and task-oriented prompt tuning, where the pre-trained knowledge can be effectively tra… ▽ More Graph Neural Networks (GNNs) are powerful in learning semantics of graph data. Recently, a new paradigm "pre-train and prompt" has shown promising results in adapting GNNs to various tasks with less supervised data. The success of such paradigm can be attributed to the more consistent objectives of pre-training and task-oriented prompt tuning, where the pre-trained knowledge can be effectively transferred to downstream tasks. Most existing methods are based on the class prototype vector framework. However, in the few-shot scenarios, given few labeled data, class prototype vectors are difficult to be accurately constructed or learned. Meanwhile, the structure information of graph is usually exploited during pre-training for learning node representations, while neglected in the prompt tuning stage for learning more accurate prototype vectors. In addition, they generally ignore the impact of heterophilous neighborhoods on node representation and are not suitable for heterophilous graphs. To bridge these gaps, we propose a novel pre-training and structure prompt tuning framework for GNNs, namely PSP, which consistently exploits structure information in both pre-training and prompt tuning stages. In particular, PSP 1) employs a dual-view contrastive learning to align the latent semantic spaces of node attributes and graph structure, and 2) incorporates structure information in prompted graph to construct more accurate prototype vectors and elicit more pre-trained knowledge in prompt tuning. We conduct extensive experiments on node classification and graph classification tasks to evaluate the effectiveness of PSP. We show that PSP can lead to superior performance in few-shot scenarios on both homophilous and heterophilous graphs. The implemented code is available at https://github.com/gqq1210/PSP. △ Less

Submitted 1 June, 2024; v1 submitted 26 October, 2023; originally announced October 2023.

arXiv:2310.14152 [pdf, other]

Orthogonal Subspace Learning for Language Model Continual Learning

Authors: Xiao Wang, Tianze Chen, Qiming Ge, Han Xia, Rong Bao, Rui Zheng, Qi Zhang, Tao Gui, Xuanjing Huang

Abstract: Benefiting from massive corpora and advanced hardware, large language models (LLMs) exhibit remarkable capabilities in language understanding and generation. However, their performance degrades in scenarios where multiple tasks are encountered sequentially, also known as catastrophic forgetting. In this paper, we propose orthogonal low-rank adaptation (O-LoRA), a simple and efficient approach for… ▽ More Benefiting from massive corpora and advanced hardware, large language models (LLMs) exhibit remarkable capabilities in language understanding and generation. However, their performance degrades in scenarios where multiple tasks are encountered sequentially, also known as catastrophic forgetting. In this paper, we propose orthogonal low-rank adaptation (O-LoRA), a simple and efficient approach for continual learning in language models, effectively mitigating catastrophic forgetting while learning new tasks. Specifically, O-LoRA learns tasks in different (low-rank) vector subspaces that are kept orthogonal to each other in order to minimize interference. Our method induces only marginal additional parameter costs and requires no user data storage for replay. Experimental results on continual learning benchmarks show that our method outperforms state-of-the-art methods. Furthermore, compared to previous approaches, our method excels in preserving the generalization ability of LLMs on unseen tasks. △ Less

Submitted 21 October, 2023; originally announced October 2023.

Comments: EMNLP 2023 findings

arXiv:2302.03498 [pdf, other]

MAC: A unified framework boosting low resource automatic speech recognition

Authors: Zeping Min, Qian Ge, Zhong Li, Weinan E

Abstract: We propose a unified framework for low resource automatic speech recognition tasks named meta audio concatenation (MAC). It is easy to implement and can be carried out in extremely low resource environments. Mathematically, we give a clear description of MAC framework from the perspective of bayesian sampling. In this framework, we leverage a novel concatenative synthesis text-to-speech system to… ▽ More We propose a unified framework for low resource automatic speech recognition tasks named meta audio concatenation (MAC). It is easy to implement and can be carried out in extremely low resource environments. Mathematically, we give a clear description of MAC framework from the perspective of bayesian sampling. In this framework, we leverage a novel concatenative synthesis text-to-speech system to boost the low resource ASR task. By the concatenative synthesis text-to-speech system, we can integrate language pronunciation rules and adjust the TTS process. Furthermore, we propose a broad notion of meta audio set to meet the modeling needs of different languages and different scenes when using the system. Extensive experiments have demonstrated the great effectiveness of MAC on low resource ASR tasks. For CTC greedy search, CTC prefix, attention, and attention rescoring decode mode in Cantonese ASR task, Taiwanese ASR task, and Japanese ASR task the MAC method can reduce the CER by more than 15\%. Furthermore, in the ASR task, MAC beats wav2vec2 (with fine-tuning) on common voice datasets of Cantonese and gets really competitive results on common voice datasets of Taiwanese and Japanese. Among them, it is worth mentioning that we achieve a \textbf{10.9\%} character error rate (CER) on the common voice Cantonese ASR task, bringing about \textbf{30\%} relative improvement compared to the wav2vec2 (with fine-tuning). △ Less

Submitted 15 February, 2023; v1 submitted 5 February, 2023; originally announced February 2023.

arXiv:2212.13847 [pdf, other]

doi 10.1109/TKDE.2024.3377431

Heterogeneous Graph Contrastive Learning with Meta-path Contexts and Adaptively Weighted Negative Samples

Authors: Jianxiang Yu, Qingqing Ge, Xiang Li, Aoying Zhou

Abstract: Heterogeneous graph contrastive learning has received wide attention recently. Some existing methods use meta-paths, which are sequences of object types that capture semantic relationships between objects, to construct contrastive views. However, most of them ignore the rich meta-path context information that describes how two objects are connected by meta-paths. Further, they fail to distinguish… ▽ More Heterogeneous graph contrastive learning has received wide attention recently. Some existing methods use meta-paths, which are sequences of object types that capture semantic relationships between objects, to construct contrastive views. However, most of them ignore the rich meta-path context information that describes how two objects are connected by meta-paths. Further, they fail to distinguish negative samples, which could adversely affect the model performance. To address the problems, we propose MEOW, which considers both meta-path contexts and weighted negative samples. Specifically, MEOW constructs a coarse view and a fine-grained view for contrast. The former reflects which objects are connected by meta-paths, while the latter uses meta-path contexts and characterizes details on how the objects are connected. Then, we theoretically analyze the InfoNCE loss and recognize its limitations for computing gradients of negative samples. To better distinguish negative samples, we learn hard-valued weights for them based on node clustering and use prototypical contrastive learning to pull close embeddings of nodes in the same cluster. In addition, we propose a variant model AdaMEOW that adaptively learns soft-valued weights of negative samples to further improve node representation. Finally, we conduct extensive experiments to show the superiority of MEOW and AdaMEOW against other state-of-the-art methods. △ Less

Submitted 5 April, 2024; v1 submitted 28 December, 2022; originally announced December 2022.

Comments: This paper has been accepted by TKDE as a regular paper

arXiv:2211.10039 [pdf, other]

Why the pseudo label based semi-supervised learning algorithm is effective?

Authors: Zeping Min, Qian Ge, Cheng Tai

Abstract: Recently, pseudo label based semi-supervised learning has achieved great success in many fields. The core idea of the pseudo label based semi-supervised learning algorithm is to use the model trained on the labeled data to generate pseudo labels on the unlabeled data, and then train a model to fit the previously generated pseudo labels. We give a theory analysis for why pseudo label based semi-sup… ▽ More Recently, pseudo label based semi-supervised learning has achieved great success in many fields. The core idea of the pseudo label based semi-supervised learning algorithm is to use the model trained on the labeled data to generate pseudo labels on the unlabeled data, and then train a model to fit the previously generated pseudo labels. We give a theory analysis for why pseudo label based semi-supervised learning is effective in this paper. We mainly compare the generalization error of the model trained under two settings: (1) There are N labeled data. (2) There are N unlabeled data and a suitable initial model. Our analysis shows that, firstly, when the amount of unlabeled data tends to infinity, the pseudo label based semi-supervised learning algorithm can obtain model which have the same generalization error upper bound as model obtained by normally training in the condition of the amount of labeled data tends to infinity. More importantly, we prove that when the amount of unlabeled data is large enough, the generalization error upper bound of the model obtained by pseudo label based semi-supervised learning algorithm can converge to the optimal upper bound with linear convergence rate. We also give the lower bound on sampling complexity to achieve linear convergence rate. Our analysis contributes to understanding the empirical successes of pseudo label-based semi-supervised learning. △ Less

Submitted 24 January, 2023; v1 submitted 18 November, 2022; originally announced November 2022.

arXiv:2210.15285 [pdf, other]

SAN: a robust end-to-end ASR model architecture

Authors: Zeping Min, Qian Ge, Guanhua Huang

Abstract: In this paper, we propose a novel Siamese Adversarial Network (SAN) architecture for automatic speech recognition, which aims at solving the difficulty of fuzzy audio recognition. Specifically, SAN constructs two sub-networks to differentiate the audio feature input and then introduces a loss to unify the output distribution of these sub-networks. Adversarial learning enables the network to captur… ▽ More In this paper, we propose a novel Siamese Adversarial Network (SAN) architecture for automatic speech recognition, which aims at solving the difficulty of fuzzy audio recognition. Specifically, SAN constructs two sub-networks to differentiate the audio feature input and then introduces a loss to unify the output distribution of these sub-networks. Adversarial learning enables the network to capture more essential acoustic features and helps the models achieve better performance when encountering fuzzy audio input. We conduct numerical experiments with the SAN model on several datasets for the automatic speech recognition task. All experimental results show that the siamese adversarial nets significantly reduce the character error rate (CER). Specifically, we achieve a new state of art 4.37 CER without language model on the AISHELL-1 dataset, which leads to around 5% relative CER reduction. To reveal the generality of the siamese adversarial net, we also conduct experiments on the phoneme recognition task, which also shows the superiority of the siamese adversarial network. △ Less

Submitted 27 October, 2022; originally announced October 2022.

arXiv:2210.13067 [pdf, other]

10 hours data is all you need

Authors: Zeping Min, Qian Ge, Zhong Li

Abstract: We propose a novel procedure to generate pseudo mandarin speech data named as CAMP (character audio mix up), which aims at generating audio from a character scale. We also raise a method for building a mandarin character scale audio database adaptive to CAMP named as META-AUDIO, which makes full use of audio data and can greatly increase the data diversity of the database. Experiments show that ou… ▽ More We propose a novel procedure to generate pseudo mandarin speech data named as CAMP (character audio mix up), which aims at generating audio from a character scale. We also raise a method for building a mandarin character scale audio database adaptive to CAMP named as META-AUDIO, which makes full use of audio data and can greatly increase the data diversity of the database. Experiments show that our CAMP method is simple and quite effective. For example, we train models with 10 hours of audio data in AISHELL-1 and pseudo audio data generated by CAMP, and achieve a competitive 11.07 character error rate (CER). Besides, we also perform training with only 10 hours of audio data in AIDATATANG dataset and pseudo audio data generated by CAMP, which again achieves a competitive 8.26 CER. △ Less

Submitted 24 October, 2022; originally announced October 2022.

arXiv:2002.11847 [pdf, other]

Echo State Neural Machine Translation

Authors: Ankush Garg, Yuan Cao, Qi Ge

Abstract: We present neural machine translation (NMT) models inspired by echo state network (ESN), named Echo State NMT (ESNMT), in which the encoder and decoder layer weights are randomly generated then fixed throughout training. We show that even with this extremely simple model construction and training procedure, ESNMT can already reach 70-80% quality of fully trainable baselines. We examine how spectra… ▽ More We present neural machine translation (NMT) models inspired by echo state network (ESN), named Echo State NMT (ESNMT), in which the encoder and decoder layer weights are randomly generated then fixed throughout training. We show that even with this extremely simple model construction and training procedure, ESNMT can already reach 70-80% quality of fully trainable baselines. We examine how spectral radius of the reservoir, a key quantity that characterizes the model, determines the model behavior. Our findings indicate that randomized networks can work well even for complicated sequence-to-sequence prediction NLP tasks. △ Less

Submitted 26 February, 2020; originally announced February 2020.

arXiv:1909.05402 [pdf, other]

doi 10.1109/TIV.2023.3255264

Relaxed Actor-Critic with Convergence Guarantees for Continuous-Time Optimal Control of Nonlinear Systems

Authors: Jingliang Duan, Jie Li, Qiang Ge, Shengbo Eben Li, Monimoy Bujarbaruah, Fei Ma, Dezhao Zhang

Abstract: This paper presents the Relaxed Continuous-Time Actor-critic (RCTAC) algorithm, a method for finding the nearly optimal policy for nonlinear continuous-time (CT) systems with known dynamics and infinite horizon, such as the path-tracking control of vehicles. RCTAC has several advantages over existing adaptive dynamic programming algorithms for CT systems. It does not require the ``admissibility" o… ▽ More This paper presents the Relaxed Continuous-Time Actor-critic (RCTAC) algorithm, a method for finding the nearly optimal policy for nonlinear continuous-time (CT) systems with known dynamics and infinite horizon, such as the path-tracking control of vehicles. RCTAC has several advantages over existing adaptive dynamic programming algorithms for CT systems. It does not require the ``admissibility" of the initialized policy or the input-affine nature of controlled systems for convergence. Instead, given any initial policy, RCTAC can converge to an admissible, and subsequently nearly optimal policy for a general nonlinear system with a saturated controller. RCTAC consists of two phases: a warm-up phase and a generalized policy iteration phase. The warm-up phase minimizes the square of the Hamiltonian to achieve admissibility, while the generalized policy iteration phase relaxes the update termination conditions for faster convergence. The convergence and optimality of the algorithm are proven through Lyapunov analysis, and its effectiveness is demonstrated through simulations and real-world path-tracking tasks. △ Less

Submitted 30 March, 2023; v1 submitted 11 September, 2019; originally announced September 2019.

Journal ref: IEEE Transactions on Intelligent Vehicles, 2023 (Early Access)

arXiv:1908.11200 [pdf, other]

A Concert-planning Tool for Independent Musicians by Machine Learning Models

Authors: Xiaohan Yang, Qingyin Ge

Abstract: Our project aims at helping independent musicians to plan their concerts based on the economies of agglomeration in the music industry. Initially, we planned to design an advisory tool for both concert pricing and location selection. Nonetheless, after implementing SGD linear regression and support vector regression models, we realized that concert price does not vary significantly according to di… ▽ More Our project aims at helping independent musicians to plan their concerts based on the economies of agglomeration in the music industry. Initially, we planned to design an advisory tool for both concert pricing and location selection. Nonetheless, after implementing SGD linear regression and support vector regression models, we realized that concert price does not vary significantly according to different music types, concert time, concert location and ticket venues. Therefore, to offer more useful suggestions, we focus on the location choice problem by turning it to a classification task. The overall performance of our classification model is pretty good. After tuning hyperparameters, we discovered the Random Forest gives the best performance, improving the classification result by 316%. This result reveals that we could help independent musicians better locate their concerts to where similar musicians would go, namely a place with higher network effects. △ Less

Submitted 29 August, 2019; originally announced August 2019.

arXiv:1902.08295 [pdf, other]

Lingvo: a Modular and Scalable Framework for Sequence-to-Sequence Modeling

Authors: Jonathan Shen, Patrick Nguyen, Yonghui Wu, Zhifeng Chen, Mia X. Chen, Ye Jia, Anjuli Kannan, Tara Sainath, Yuan Cao, Chung-Cheng Chiu, Yanzhang He, Jan Chorowski, Smit Hinsu, Stella Laurenzo, James Qin, Orhan Firat, Wolfgang Macherey, Suyog Gupta, Ankur Bapna, Shuyuan Zhang, Ruoming Pang, Ron J. Weiss, Rohit Prabhavalkar, Qiao Liang, Benoit Jacob , et al. (66 additional authors not shown)

Abstract: Lingvo is a Tensorflow framework offering a complete solution for collaborative deep learning research, with a particular focus towards sequence-to-sequence models. Lingvo models are composed of modular building blocks that are flexible and easily extensible, and experiment configurations are centralized and highly customizable. Distributed training and quantized inference are supported directly w… ▽ More Lingvo is a Tensorflow framework offering a complete solution for collaborative deep learning research, with a particular focus towards sequence-to-sequence models. Lingvo models are composed of modular building blocks that are flexible and easily extensible, and experiment configurations are centralized and highly customizable. Distributed training and quantized inference are supported directly within the framework, and it contains existing implementations of a large number of utilities, helper functions, and the newest research ideas. Lingvo has been used in collaboration by dozens of researchers in more than 20 papers over the last two years. This document outlines the underlying design of Lingvo and serves as an introduction to the various pieces of the framework, while also offering examples of advanced features that showcase the capabilities of the framework. △ Less

Submitted 21 February, 2019; originally announced February 2019.

arXiv:1810.05345 [pdf, other]

Time Protection: the Missing OS Abstraction

Authors: Qian Ge, Yuval Yarom, Tom Chothia, Gernot Heiser

Abstract: Timing channels enable data leakage that threatens the security of computer systems, from cloud platforms to smartphones and browsers executing untrusted third-party code. Preventing unauthorised information flow is a core duty of the operating system, however, present OSes are unable to prevent timing channels. We argue that OSes must provide time protection in addition to the established memory… ▽ More Timing channels enable data leakage that threatens the security of computer systems, from cloud platforms to smartphones and browsers executing untrusted third-party code. Preventing unauthorised information flow is a core duty of the operating system, however, present OSes are unable to prevent timing channels. We argue that OSes must provide time protection in addition to the established memory protection. We examine the requirements of time protection, present a design and its implementation in the seL4 microkernel, and evaluate its efficacy as well as performance overhead on Arm and x86 processors. △ Less

Submitted 15 October, 2018; v1 submitted 11 October, 2018; originally announced October 2018.

arXiv:1612.04474 [pdf, other]

Your Processor Leaks Information - and There's Nothing You Can Do About It

Authors: Qian Ge, Yuval Yarom, Frank Li, Gernot Heiser

Abstract: Timing channels are information flows, encoded in the relative timing of events, that bypass the system's protection mechanisms. Any microarchitectural state that depends on execution history and affects the rate of progress of later executions potentially establishes a timing channel, unless explicit steps are taken to close it. Such state includes CPU caches, TLBs, branch predictors and prefetch… ▽ More Timing channels are information flows, encoded in the relative timing of events, that bypass the system's protection mechanisms. Any microarchitectural state that depends on execution history and affects the rate of progress of later executions potentially establishes a timing channel, unless explicit steps are taken to close it. Such state includes CPU caches, TLBs, branch predictors and prefetchers; removing the channels requires that the OS can partition such state or flush it on a switch of security domains. We measure the capacities of channels based on these microarchitectural features on several generations of processors across the two mainstream ISAs, x86 and ARM, and investigate the effectiveness of the flushing mechanisms provided by the respective ISA.We find that in all processors we studied, at least one significant channel remains. This implies that closing all timing channels seems impossible on contemporary mainstream processors. △ Less

Submitted 14 September, 2017; v1 submitted 13 December, 2016; originally announced December 2016.

arXiv:1312.3005 [pdf, ps, other]

One Billion Word Benchmark for Measuring Progress in Statistical Language Modeling

Authors: Ciprian Chelba, Tomas Mikolov, Mike Schuster, Qi Ge, Thorsten Brants, Phillipp Koehn, Tony Robinson

Abstract: We propose a new benchmark corpus to be used for measuring progress in statistical language modeling. With almost one billion words of training data, we hope this benchmark will be useful to quickly evaluate novel language modeling techniques, and to compare their contribution when combined with other advanced techniques. We show performance of several well-known types of language models, with the… ▽ More We propose a new benchmark corpus to be used for measuring progress in statistical language modeling. With almost one billion words of training data, we hope this benchmark will be useful to quickly evaluate novel language modeling techniques, and to compare their contribution when combined with other advanced techniques. We show performance of several well-known types of language models, with the best results achieved with a recurrent neural network based language model. The baseline unpruned Kneser-Ney 5-gram model achieves perplexity 67.6; a combination of techniques leads to 35% reduction in perplexity, or 10% reduction in cross-entropy (bits), over that baseline. The benchmark is available as a code.google.com project; besides the scripts needed to rebuild the training/held-out data, it also makes available log-probability values for each word in each of ten held-out data sets, for each of the baseline n-gram models. △ Less

Submitted 4 March, 2014; v1 submitted 10 December, 2013; originally announced December 2013.

Comments: Accompanied by a code.google.com project allowing anyone to generate the benchmark data, and use it to compare their language model against the ones described in the paper

arXiv:1105.5131 [pdf, ps, other]

Improved Inapproximability Results for Counting Independent Sets in the Hard-Core Model

Authors: Andreas Galanis, Qi Ge, Daniel Stefankovic, Eric Vigoda, Linji Yang

Abstract: We study the computational complexity of approximately counting the number of independent sets of a graph with maximum degree Delta. More generally, for an input graph G=(V,E) and an activity lambda>0, we are interested in the quantity Z_G(lambda) defined as the sum over independent sets I weighted as w(I) = lambda^|I|. In statistical physics, Z_G(lambda) is the partition function for the hard-c… ▽ More We study the computational complexity of approximately counting the number of independent sets of a graph with maximum degree Delta. More generally, for an input graph G=(V,E) and an activity lambda>0, we are interested in the quantity Z_G(lambda) defined as the sum over independent sets I weighted as w(I) = lambda^|I|. In statistical physics, Z_G(lambda) is the partition function for the hard-core model, which is an idealized model of a gas where the particles have non-negibile size. Recently, an interesting phase transition was shown to occur for the complexity of approximating the partition function. Weitz showed an FPAS for the partition function for any graph of maximum degree Delta when Delta is constant and lambda< lambda_c(Tree_Delta):=(Delta-1)^(Delta-1)/(Delta-2)^Delta. The quantity lambda_c(Tree_Delta) is the critical point for the so-called uniqueness threshold on the infinite, regular tree of degree Delta. On the other side, Sly proved that there does not exist efficient (randomized) approximation algorithms for lambda_c(Tree_Delta) < lambda < lambda_c(Tree_Delta)+epsilon(Delta), unless NP=RP, for some function epsilon(Delta)>0. We remove the upper bound in the assumptions of Sly's result for Delta not equal to 4 and 5, that is, we show that there does not exist efficient randomized approximation algorithms for all lambda>lambda_c(Tree_Delta) for Delta=3 and Delta>= 6. Sly's inapproximability result uses a clever reduction, combined with a second-moment analysis of Mossel, Weitz and Wormald which prove torpid mixing of the Glauber dynamics for sampling from the associated Gibbs distribution on almost every regular graph of degree Delta for the same range of lambda as in Sly's result. We extend Sly's result by improving upon the technical work of Mossel et al., via a more detailed analysis of independent sets in random regular graphs. △ Less

Submitted 11 December, 2012; v1 submitted 25 May, 2011; originally announced May 2011.

Comments: to appear in Random Structures and Algorithms

ACM Class: F.2.2; G.3

arXiv:1009.5019 [pdf, ps, other]

The Complexity of Counting Eulerian Tours in 4-Regular Graphs

Authors: Qi Ge, Daniel Stefankovic

Abstract: We investigate the complexity of counting Eulerian tours ({\sc #ET}) and its variations from two perspectives---the complexity of exact counting and the complexity w.r.t. approximation-preserving reductions (AP-reductions \cite{MR2044886}). We prove that {\sc #ET} is #P-complete even for planar 4-regular graphs. A closely related problem is that of counting A-trails ({\sc #A-trails}) in graphs w… ▽ More We investigate the complexity of counting Eulerian tours ({\sc #ET}) and its variations from two perspectives---the complexity of exact counting and the complexity w.r.t. approximation-preserving reductions (AP-reductions \cite{MR2044886}). We prove that {\sc #ET} is #P-complete even for planar 4-regular graphs. A closely related problem is that of counting A-trails ({\sc #A-trails}) in graphs with rotational embedding schemes (so called maps). Kotzig \cite{MR0248043} showed that {\sc #A-trails} can be computed in polynomial time for 4-regular plane graphs (embedding in the plane is equivalent to giving a rotational embedding scheme). We show that for 4-regular maps the problem is #P-hard. Moreover, we show that from the approximation viewpoint {\sc #A-trails} in 4-regular maps captures the essence of {\sc #ET}, that is, we give an AP-reduction from {\sc #ET} in general graphs to {\sc #A-trails} in 4-regular maps. The reduction uses a fast mixing result for a card shuffling problem \cite{MR2023023}. In order to understand whether #{\sc A-trails} in 4-regular maps can AP-reduce to #{\sc ET} in 4-regular graphs, we investigate a problem in which transitions in vertices are weighted (this generalizes both #{\sc A-trails} and #{\sc ET}). In the 4-regular case we show that {\sc A-trails} can be used to simulate any vertex weights and provide evidence that {\sc ET} can simulate only a limited set of vertex weights. △ Less

Submitted 25 September, 2010; originally announced September 2010.

arXiv:0911.4732 [pdf, ps, other]

A graph polynomial for independent sets of bipartite graphs

Authors: Qi Ge, Daniel Stefankovic

Abstract: We introduce a new graph polynomial that encodes interesting properties of graphs, for example, the number of matchings and the number of perfect matchings. Most importantly, for bipartite graphs the polynomial encodes the number of independent sets (#BIS). We analyze the complexity of exact evaluation of the polynomial at rational points and show that for most points exact evaluation is #P-ha… ▽ More We introduce a new graph polynomial that encodes interesting properties of graphs, for example, the number of matchings and the number of perfect matchings. Most importantly, for bipartite graphs the polynomial encodes the number of independent sets (#BIS). We analyze the complexity of exact evaluation of the polynomial at rational points and show that for most points exact evaluation is #P-hard (assuming the generalized Riemann hypothesis) and for the rest of the points exact evaluation is trivial. We conjecture that a natural Markov chain can be used to approximately evaluate the polynomial for a range of parameters. The conjecture, if true, would imply an approximate counting algorithm for #BIS, a problem shown, by [Dyer et al. 2004], to be complete (with respect to, so called, AP-reductions) for a rich logically defined sub-class of #P. We give a mild support for our conjecture by proving that the Markov chain is rapidly mixing on trees. As a by-product we show that the "single bond flip" Markov chain for the random cluster model is rapidly mixing on constant tree-width graphs. △ Less

Submitted 10 February, 2010; v1 submitted 24 November, 2009; originally announced November 2009.

Showing 1–22 of 22 results for author: Ge, Q