Sparser is Faster and Less is More: Efficient Sparse Attention for Long-Range Transformers

Authors: Chao Lou, Zixia Jia, Zilong Zheng, Kewei Tu

Abstract: Accommodating long sequences efficiently in autoregressive Transformers, especially within an extended context window, poses significant challenges due to the quadratic computational complexity and substantial KV memory requirements inherent in self-attention mechanisms. In this work, we introduce SPARSEK Attention, a novel sparse attention mechanism designed to overcome these computational and memory obstacles while maintaining performance. Our approach integrates a scoring network and a differentiable top-k mask operator, SPARSEK, to select a constant number of KV pairs for each query, thereby enabling gradient-based optimization. As a result, SPARSEK Attention offers linear time complexity and constant memory footprint during generation. Experimental results reveal that SPARSEK Attention outperforms previous sparse attention methods and provides significant speed improvements during both training and inference, particularly in language modeling and downstream tasks. Furthermore, our method can be seamlessly integrated into pre-trained Large Language Models (LLMs) with minimal fine-tuning, offering a practical solution for effectively managing long-range dependencies in diverse applications.

Submitted 24 June, 2024; originally announced June 2024.

Comments: preprint

arXiv:2310.17670 [pdf, ps, other]

Unknown Health States Recognition With Collective Decision Based Deep Learning Networks In Predictive Maintenance Applications

Authors: Chuyue Lou, M. Amine Atoui

Abstract: At present, decision making solutions developed based on deep learning (DL) models have received extensive attention in predictive maintenance (PM) applications along with the rapid improvement of computing power. Relying on the superior properties of shared weights and spatial pooling, Convolutional Neural Network (CNN) can learn effective representations of health states from industrial data. Many developed CNN-based schemes, such as advanced CNNs that introduce residual learning and multi-scale learning, have shown good performance in health state recognition tasks under the assumption that all the classes are known. However, these schemes have no ability to deal with new abnormal samples that belong to state classes not part of the training set. In this paper, a collective decision framework for different CNNs is proposed. It is based on a One-vs-Rest network (OVRN) to simultaneously achieve classification of known and unknown health states. OVRN learn state-specific discriminative features and enhance the ability to reject new abnormal samples incorporated to different CNNs. According to the validation results on the public dataset of Tennessee Eastman Process (TEP), the proposed CNN-based decision schemes incorporating OVRN have outstanding recognition ability for samples of unknown health states, while maintaining satisfactory accuracy on known states. The results show that the new DL framework outperforms conventional CNNs, and the one based on residual and multi-scale learning has the best overall performance.

Submitted 25 October, 2023; originally announced October 2023.

arXiv:2310.11964 [pdf, other]

AMR Parsing with Causal Hierarchical Attention and Pointers

Authors: Chao Lou, Kewei Tu

Abstract: Translation-based AMR parsers have recently gained popularity due to their simplicity and effectiveness. They predict linearized graphs as free texts, avoiding explicit structure modeling. However, this simplicity neglects structural locality in AMR graphs and introduces unnecessary tokens to represent coreferences. In this paper, we introduce new target forms of AMR parsing and a novel model, CHAP, which is equipped with causal hierarchical attention and the pointer mechanism, enabling the integration of structures into the Transformer decoder. We empirically explore various alternative modeling options. Experiments show that our model outperforms baseline models on four out of five benchmarks in the setting of no additional data.

Submitted 18 October, 2023; originally announced October 2023.

Comments: EMNLP 2023

arXiv:2308.10529 [pdf, other]

SeqGPT: An Out-of-the-box Large Language Model for Open Domain Sequence Understanding

Authors: Tianyu Yu, Chengyue Jiang, Chao Lou, Shen Huang, Xiaobin Wang, Wei Liu, Jiong Cai, Yangning Li, Yinghui Li, Kewei Tu, Hai-Tao Zheng, Ningyu Zhang, Pengjun Xie, Fei Huang, Yong Jiang

Abstract: Large language models (LLMs) have shown impressive ability for open-domain NLP tasks. However, LLMs are sometimes too footloose for natural language understanding (NLU) tasks which always have restricted output and input format. Their performances on NLU tasks are highly related to prompts or demonstrations and are shown to be poor at performing several representative NLU tasks, such as event extraction and entity typing. To this end, we present SeqGPT, a bilingual (i.e., English and Chinese) open-source autoregressive model specially enhanced for open-domain natural language understanding. We express all NLU tasks with two atomic tasks, which define fixed instructions to restrict the input and output format but still "open" for arbitrarily varied label sets. The model is first instruction-tuned with extremely fine-grained labeled data synthesized by ChatGPT and then further fine-tuned by 233 different atomic tasks from 152 datasets across various domains. The experimental results show that SeqGPT has decent classification and extraction ability, and is capable of performing language understanding tasks on unseen domains. We also conduct empirical studies on the scaling of data and model size as well as on the transfer across tasks. Our model is accessible at https://github.com/Alibaba-NLP/SeqGPT.

Submitted 21 August, 2023; originally announced August 2023.

Comments: Initial version of SeqGPT

arXiv:2306.02671 [pdf, other]

Improving Grammar-based Sequence-to-Sequence Modeling with Decomposition and Constraints

Authors: Chao Lou, Kewei Tu

Abstract: Neural QCFG is a grammar-based sequence-to-sequence (seq2seq) model with strong inductive biases on hierarchical structures. It excels in interpretability and generalization but suffers from expensive inference. In this paper, we study two low-rank variants of Neural QCFG for faster inference with different trade-offs between efficiency and expressiveness. Furthermore, utilizing the symbolic interface provided by the grammar, we introduce two soft constraints over tree hierarchy and source coverage. We experiment with various datasets and find that our models outperform vanilla Neural QCFG in most settings.

Submitted 5 June, 2023; originally announced June 2023.

Comments: ACL 2023

arXiv:2304.03285 [pdf, other]

$\text{DC}^2$: Dual-Camera Defocus Control by Learning to Refocus

Authors: Hadi Alzayer, Abdullah Abuolaim, Leung Chun Chan, Yang Yang, Ying Chen Lou, Jia-Bin Huang, Abhishek Kar

Abstract: Smartphone cameras today are increasingly approaching the versatility and quality of professional cameras through a combination of hardware and software advancements. However, fixed aperture remains a key limitation, preventing users from controlling the depth of field (DoF) of captured images. At the same time, many smartphones now have multiple cameras with different fixed apertures -- specifically, an ultra-wide camera with wider field of view and deeper DoF and a higher resolution primary camera with shallower DoF. In this work, we propose $\text{DC}^2$, a system for defocus control for synthetically varying camera aperture, focus distance and arbitrary defocus effects by fusing information from such a dual-camera system. Our key insight is to leverage real-world smartphone camera dataset by using image refocus as a proxy task for learning to control defocus. Quantitative and qualitative evaluations on real-world data demonstrate our system's efficacy where we outperform state-of-the-art on defocus deblurring, bokeh rendering, and image refocus. Finally, we demonstrate creative post-capture defocus control enabled by our method, including tilt-shift and content-based defocus effects.

Submitted 6 April, 2023; originally announced April 2023.

Comments: CVPR 2023. See the project page at https://defocus-control.github.io

arXiv:2206.04685 [pdf, other]

Predictive Exit: Prediction of Fine-Grained Early Exits for Computation- and Energy-Efficient Inference

Authors: Xiangjie Li, Chenfei Lou, Zhengping Zhu, Yuchi Chen, Yingtao Shen, Yehan Ma, An Zou

Abstract: By adding exiting layers to the deep learning networks, early exit can terminate the inference earlier with accurate results. The passive decision-making of whether to exit or continue the next layer has to go through every pre-placed exiting layer until it exits. In addition, it is also hard to adjust the configurations of the computing platforms alongside the inference proceeds. By incorporating a low-cost prediction engine, we propose a Predictive Exit framework for computation- and energy-efficient deep learning applications. Predictive Exit can forecast where the network will exit (i.e., establish the number of remaining layers to finish the inference), which effectively reduces the network computation cost by exiting on time without running every pre-placed exiting layer. Moreover, according to the number of remaining layers, proper computing configurations (i.e., frequency and voltage) are selected to execute the network to further save energy. Extensive experimental results demonstrate that Predictive Exit achieves up to 96.2% computation reduction and 72.9% energy-saving compared with classic deep learning networks; and 12.8% computation reduction and 37.6% energy-saving compared with the early exit under state-of-the-art exiting strategies, given the same inference accuracy and latency.

Submitted 28 December, 2022; v1 submitted 9 June, 2022; originally announced June 2022.

arXiv:2203.14260 [pdf, other]

Unsupervised Vision-Language Parsing: Seamlessly Bridging Visual Scene Graphs with Language Structures via Dependency Relationships

Authors: Chao Lou, Wenjuan Han, Yuhuan Lin, Zilong Zheng

Abstract: Understanding realistic visual scene images together with language descriptions is a fundamental task towards generic visual understanding. Previous works have shown compelling comprehensive results by building hierarchical structures for visual scenes (e.g., scene graphs) and natural languages (e.g., dependency trees), individually. However, how to construct a joint vision-language (VL) structure has barely been investigated. More challenging but worthwhile, we introduce a new task that targets on inducing such a joint VL structure in an unsupervised manner. Our goal is to bridge the visual scene graphs and linguistic dependency trees seamlessly. Due to the lack of VL structural data, we start by building a new dataset VLParse. Rather than using labor-intensive labeling from scratch, we propose an automatic alignment procedure to produce coarse structures followed by human refinement to produce high-quality ones. Moreover, we benchmark our dataset by proposing a contrastive learning (CL)-based framework VLGAE, short for Vision-Language Graph Autoencoder. Our model obtains superior performance on two derived tasks, i.e., language grammar induction and VL phrase grounding. Ablations show the effectiveness of both visual cues and dependency relationships on fine-grained VL structure construction.

Submitted 1 June, 2022; v1 submitted 27 March, 2022; originally announced March 2022.

Comments: Updated

arXiv:2203.04665 [pdf, other]

Nested Named Entity Recognition as Latent Lexicalized Constituency Parsing

Authors: Chao Lou, Songlin Yang, Kewei Tu

Abstract: Nested named entity recognition (NER) has been receiving increasing attention. Recently, (Fu et al, 2021) adapt a span-based constituency parser to tackle nested NER. They treat nested entities as partially-observed constituency trees and propose the masked inside algorithm for partial marginalization. However, their method cannot leverage entity heads, which have been shown useful in entity mention detection and entity typing. In this work, we resort to more expressive structures, lexicalized constituency trees in which constituents are annotated by headwords, to model nested entities. We leverage the Eisner-Satta algorithm to perform partial marginalization and inference efficiently. In addition, we propose to use (1) a two-stage strategy (2) a head regularization loss and (3) a head-aware labeling loss in order to enhance the performance. We make a thorough ablation study to investigate the functionality of each component. Experimentally, our method achieves the state-of-the-art performance on ACE2004, ACE2005 and NNE, and competitive performance on GENIA, and meanwhile has a fast inference speed.

Submitted 9 March, 2022; originally announced March 2022.

Comments: ACL 2022 camera ready

arXiv:2106.03562 [pdf]

Robotic Electrospinning Actuated by Non-Circular Joint Continuum Manipulator for Endoluminal Therapy

Authors: Zicong Wu, Chuqian Lou, Zhu Jin, Shaoping Huang, Ning Liu, Yun Zou, Mirko Kovac, Anzhu Gao, Guang-Zhong Yang

Abstract: Electrospinning has exhibited excellent benefits to treat the trauma for tissue engineering due to its produced micro/nano fibrous structure. It can effectively adhere to the tissue surface for long-term continuous therapy. This paper develops a robotic electrospinning platform for endoluminal therapy. The platform consists of a continuum manipulator, the electrospinning device, and the actuation unit. The continuum manipulator has two bending sections to facilitate the steering of the tip needle for a controllable spinning direction. Non-circular joint profile is carefully designed to enable a constant length of the centreline of a continuum manipulator for stable fluid transmission inside it. Experiments are performed on a bronchus phantom, and the steering ability and bending limitation in each direction are also investigated. The endoluminal electrospinning is also fulfilled by a trajectory following and points targeting experiments. The effective adhesive area of the produced fibre is also illustrated. The proposed robotic electrospinning shows its feasibility to precisely spread more therapeutic drug to construct fibrous structure for potential endoluminal treatment.

Submitted 7 June, 2021; originally announced June 2021.

arXiv:1905.05652 [pdf]

"Tom" pet robot applied to urban autism

Authors: Xingqian Li, Chenwei Lou, Jian Zhao, HuaPeng Wei, Hongwei Zhao

Abstract: With the fast development of network information technology, more and more people are immersed in the virtual community environment brought by the network, ignoring the social interaction in real life. The consequent urban autism problem has become more and more serious. This paper adopts the ideas of "promoting offline communication between people" and "eliminating loneliness through emotional communication between pet robots and breeders" to address this problem, and develops a design called "Tom". "Tom" is a smart pet robot with a pet robot-based social mechanism called "Tom-Talker". The main contribution of this paper is to propose a social mechanism called "Tom-Talker" that encourages users to socialize offline. "Tom-Talker" also has a corresponding reward mechanism and a friend recommendation algorithm. The paper also proposes a pet robot named "Tom" with an emotional interaction algorithm to recognize users' emotions, simulate animal emotions and communicate emotionally with users. This paper designs experiments and analyzes the results. The results show that our pet robots have a good effect on solving urban autism problems.

Submitted 14 May, 2019; originally announced May 2019.

arXiv:1506.02990 [pdf, other]

Convolutional-Code-Specific CRC Code Design

Authors: Chung-Yu Lou, Babak Daneshrad, Richard D. Wesel

Abstract: Cyclic redundancy check (CRC) codes check if a codeword is correctly received. This paper presents an algorithm to design CRC codes that are optimized for the code-specific error behavior of a specified feedforward convolutional code. The algorithm utilizes two distinct approaches to computing undetected error probability of a CRC code used with a specific convolutional code. The first approach enumerates the error patterns of the convolutional code and tests if each of them is detectable. The second approach reduces complexity significantly by exploiting the equivalence of the undetected error probability to the frame error rate of an equivalent catastrophic convolutional code. The error events of the equivalent convolutional code are exactly the undetectable errors for the original concatenation of CRC and convolutional codes. This simplifies the computation because error patterns do not need to be individually checked for detectability. As an example, we optimize CRC codes for a commonly used 64-state convolutional code for information length k=1024 demonstrating significant reduction in undetected error probability compared to the existing CRC codes with the same degrees. For a fixed target undetected error probability, the optimized CRC codes typically require 2 fewer bits.

Submitted 9 June, 2015; originally announced June 2015.

Comments: 12 pages, 8 figures, journal paper

arXiv:1410.2904 [pdf, other]

Optimizing Pilot Length for a Go/No-Go Decision in Two-State Block Fading Channels with Feedback

Authors: Chung-Yu Lou, Babak Daneshrad, Richard D. Wesel

Abstract: We propose an approach where each user independently seeks to minimize the amount of time that they occupy the channel. Essentially, we seek to minimize the number of transmitted symbols required to communicate a packet assuming variable-length coding with feedback. Users send a pilot sequence to estimate the channel quality and decide whether to proceed with a transmission or wait for the next opportunity. Thus a user may choose to leave the channel even though it has already gained access, in order to increase the network throughput and also save its own energy resources. This paper optimizes the number of pilots and the channel identification threshold to minimize the total number of transmitted symbols (including pilots) required to communicate the packet. We prove a sufficient condition for the optimal pilot length and the channel identification threshold. This optimal parameter pair is solved numerically and the reduction in channel occupancy is shown for various channel settings.

Submitted 10 October, 2014; originally announced October 2014.

Comments: 6 pages, 3 figures, conference

arXiv:1210.8191 [pdf, other]

doi 10.1109/WCL.2013.012513.120824

Performance Indicator for MIMO MMSE Receivers in the Presence of Channel Estimation Error

Authors: Eren Eraslan, Babak Daneshrad, Chung-Yu Lou

Abstract: We present the derivation of post-processing SNR for Minimum-Mean-Squared-Error (MMSE) receivers with imperfect channel estimates, and show that it is an accurate indicator of the error rate performance of MIMO systems in the presence of channel estimation error. Simulation results show the tightness of the analysis.

Submitted 13 November, 2012; v1 submitted 30 October, 2012; originally announced October 2012.

Comments: 4 pages, 3 figures. Submitted to IEEE Wireless Communications Letters

Showing 1–14 of 14 results for author: Lou, C