subscribe to arXiv mailings

A Cognitive Evaluation Benchmark of Image Reasoning and Description for Large Vision-Language Models

Authors: Xiujie Song, Mengyue Wu, Kenny Q. Zhu, Chunhao Zhang, Yanyi Chen

Abstract: Large Vision-Language Models (LVLMs), despite their recent success, are hardly comprehensively tested for their cognitive abilities. Inspired by the prevalent use of the "Cookie Theft" task in human cognition test, we propose a novel evaluation benchmark to evaluate high-level cognitive ability of LVLMs using images with rich semantics. It defines eight reasoning capabilities and consists of an im… ▽ More Large Vision-Language Models (LVLMs), despite their recent success, are hardly comprehensively tested for their cognitive abilities. Inspired by the prevalent use of the "Cookie Theft" task in human cognition test, we propose a novel evaluation benchmark to evaluate high-level cognitive ability of LVLMs using images with rich semantics. It defines eight reasoning capabilities and consists of an image description task and a visual question answering task. Our evaluation on well-known LVLMs shows that there is still a large gap in cognitive ability between LVLMs and humans. △ Less

Submitted 14 June, 2024; v1 submitted 28 February, 2024; originally announced February 2024.

arXiv:2402.15985 [pdf, other]

Phonetic and Lexical Discovery of a Canine Language using HuBERT

Authors: Xingyuan Li, Sinong Wang, Zeyu Xie, Mengyue Wu, Kenny Q. Zhu

Abstract: This paper delves into the pioneering exploration of potential communication patterns within dog vocalizations and transcends traditional linguistic analysis barriers, which heavily relies on human priori knowledge on limited datasets to find sound units in dog vocalization. We present a self-supervised approach with HuBERT, enabling the accurate classification of phoneme labels and the identifica… ▽ More This paper delves into the pioneering exploration of potential communication patterns within dog vocalizations and transcends traditional linguistic analysis barriers, which heavily relies on human priori knowledge on limited datasets to find sound units in dog vocalization. We present a self-supervised approach with HuBERT, enabling the accurate classification of phoneme labels and the identification of vocal patterns that suggest a rudimentary vocabulary within dog vocalizations. Our findings indicate a significant acoustic consistency in these identified canine vocabulary, covering the entirety of observed dog vocalization sequences. We further develop a web-based dog vocalization labeling system. This system can highlight phoneme n-grams, present in the vocabulary, in the dog audio uploaded by users. △ Less

Submitted 24 February, 2024; originally announced February 2024.

arXiv:2402.06262 [pdf, other]

On the Efficacy of Eviction Policy for Key-Value Constrained Generative Language Model Inference

Authors: Siyu Ren, Kenny Q. Zhu

Abstract: Despite the recent success associated with Large Language Models (LLMs), they are notably cost-prohibitive to deploy in resource-constrained environments due to their excessive memory and computational demands. In addition to model parameters, the key-value cache is also stored in GPU memory, growing linearly with batch size and sequence length. As a remedy, recent works have proposed various evic… ▽ More Despite the recent success associated with Large Language Models (LLMs), they are notably cost-prohibitive to deploy in resource-constrained environments due to their excessive memory and computational demands. In addition to model parameters, the key-value cache is also stored in GPU memory, growing linearly with batch size and sequence length. As a remedy, recent works have proposed various eviction policies for maintaining the overhead of key-value cache under a given budget. This paper embarks on the efficacy of existing eviction policies in terms of importance score calculation and eviction scope construction. We identify the deficiency of prior policies in these two aspects and introduce RoCo, a robust cache omission policy based on temporal attention scores and robustness measures. Extensive experimentation spanning prefilling and auto-regressive decoding stages validates the superiority of RoCo. Finally, we release EasyKV, a versatile software package dedicated to user-friendly key-value constrained generative inference. Code available at https://github.com/DRSY/EasyKV. △ Less

Submitted 17 February, 2024; v1 submitted 9 February, 2024; originally announced February 2024.

arXiv:2311.09189 [pdf, other]

PsyEval: A Suite of Mental Health Related Tasks for Evaluating Large Language Models

Authors: Haoan Jin, Siyuan Chen, Dilawaier Dilixiati, Yewei Jiang, Mengyue Wu, Kenny Q. Zhu

Abstract: Evaluating Large Language Models (LLMs) in the mental health domain poses distinct challenged from other domains, given the subtle and highly subjective nature of symptoms that exhibit significant variability among individuals. This paper presents PsyEval, the first comprehensive suite of mental health-related tasks for evaluating LLMs. PsyEval encompasses five sub-tasks that evaluate three critic… ▽ More Evaluating Large Language Models (LLMs) in the mental health domain poses distinct challenged from other domains, given the subtle and highly subjective nature of symptoms that exhibit significant variability among individuals. This paper presents PsyEval, the first comprehensive suite of mental health-related tasks for evaluating LLMs. PsyEval encompasses five sub-tasks that evaluate three critical dimensions of mental health. This comprehensive framework is designed to thoroughly assess the unique challenges and intricacies of mental health-related tasks, making PsyEval a highly specialized and valuable tool for evaluating LLM performance in this domain. We evaluate twelve advanced LLMs using PsyEval. Experiment results not only demonstrate significant room for improvement in current LLMs concerning mental health but also unveil potential directions for future model optimization. △ Less

Submitted 3 June, 2024; v1 submitted 15 November, 2023; originally announced November 2023.

arXiv:2310.11648 [pdf, other]

Zero-shot Faithfulness Evaluation for Text Summarization with Foundation Language Model

Authors: Qi Jia, Siyu Ren, Yizhu Liu, Kenny Q. Zhu

Abstract: Despite tremendous improvements in natural language generation, summarization models still suffer from the unfaithfulness issue. Previous work evaluates faithfulness either using models trained on the other tasks or in-domain synthetic data, or prompting a large model such as ChatGPT. This paper proposes to do zero-shot faithfulness evaluation simply with a moderately-sized foundation language mod… ▽ More Despite tremendous improvements in natural language generation, summarization models still suffer from the unfaithfulness issue. Previous work evaluates faithfulness either using models trained on the other tasks or in-domain synthetic data, or prompting a large model such as ChatGPT. This paper proposes to do zero-shot faithfulness evaluation simply with a moderately-sized foundation language model. We introduce a new metric FFLM, which is a combination of probability changes based on the intuition that prefixing a piece of text that is consistent with the output will increase the probability of predicting the output. Experiments show that FFLM performs competitively with or even outperforms ChatGPT on both inconsistency detection and faithfulness rating with 24x fewer parameters. FFLM also achieves improvements over other strong baselines. △ Less

Submitted 14 December, 2023; v1 submitted 17 October, 2023; originally announced October 2023.

Comments: Accepted by EMNLP2023

arXiv:2310.08152 [pdf, other]

Context Compression for Auto-regressive Transformers with Sentinel Tokens

Authors: Siyu Ren, Qi Jia, Kenny Q. Zhu

Abstract: The quadratic complexity of the attention module makes it gradually become the bulk of compute in Transformer-based LLMs during generation. Moreover, the excessive key-value cache that arises when dealing with long inputs also brings severe issues on memory footprint and inference latency. In this work, we propose a plug-and-play approach that is able to incrementally compress the intermediate act… ▽ More The quadratic complexity of the attention module makes it gradually become the bulk of compute in Transformer-based LLMs during generation. Moreover, the excessive key-value cache that arises when dealing with long inputs also brings severe issues on memory footprint and inference latency. In this work, we propose a plug-and-play approach that is able to incrementally compress the intermediate activation of a specified span of tokens into compact ones, thereby reducing both memory and computational cost when processing subsequent context. Experiments on both in-domain language modeling and zero-shot open-ended document generation demonstrate the advantage of our approach over sparse attention baselines in terms of fluency, n-gram matching, and semantic similarity. At last, we comprehensively profile the benefit of context compression on improving the system throughout. Code is available at https://github.com/DRSY/KV_Compression. △ Less

Submitted 15 October, 2023; v1 submitted 12 October, 2023; originally announced October 2023.

Comments: To appear at EMNLP 2023 main conference

arXiv:2310.04691 [pdf, other]

EMO: Earth Mover Distance Optimization for Auto-Regressive Language Modeling

Authors: Siyu Ren, Zhiyong Wu, Kenny Q. Zhu

Abstract: Neural language models are probabilistic models of human text. They are predominantly trained using maximum likelihood estimation (MLE), which is equivalent to minimizing the forward cross-entropy between the empirical data distribution and the model distribution. However, various degeneration phenomena are still widely observed when decoding from the distributions learned by such models. We estab… ▽ More Neural language models are probabilistic models of human text. They are predominantly trained using maximum likelihood estimation (MLE), which is equivalent to minimizing the forward cross-entropy between the empirical data distribution and the model distribution. However, various degeneration phenomena are still widely observed when decoding from the distributions learned by such models. We establish that the forward cross-entropy is suboptimal as a distance metric for aligning human and model distribution due to its (1) recall-prioritization (2) negative diversity ignorance and (3) train-test mismatch. In this paper, we propose Earth Mover Distance Optimization (EMO) for auto-regressive language modeling. EMO capitalizes on the inherent properties of earth mover distance to address the aforementioned challenges. Due to the high complexity of direct computation, we further introduce a feasible upper bound for EMO to ease end-to-end training. Upon extensive evaluation of language models trained using EMO and MLE. We find that EMO demonstrates a consistently better language modeling performance than MLE across domains. Moreover, EMO demonstrates noteworthy enhancements in downstream performance with minimal fine-tuning on merely 25,000 sentences. This highlights the tremendous potential of EMO as a lightweight calibration method for enhancing large-scale pre-trained language models. △ Less

Submitted 1 February, 2024; v1 submitted 7 October, 2023; originally announced October 2023.

Comments: To appear at ICLR 2024

arXiv:2306.14152 [pdf, other]

Low-Rank Prune-And-Factorize for Language Model Compression

Authors: Siyu Ren, Kenny Q. Zhu

Abstract: The components underpinning PLMs -- large weight matrices -- were shown to bear considerable redundancy. Matrix factorization, a well-established technique from matrix theory, has been utilized to reduce the number of parameters in PLM. However, it fails to retain satisfactory performance under moderate to high compression rate. In this paper, we identify the \textit{full-rankness} of fine-tuned P… ▽ More The components underpinning PLMs -- large weight matrices -- were shown to bear considerable redundancy. Matrix factorization, a well-established technique from matrix theory, has been utilized to reduce the number of parameters in PLM. However, it fails to retain satisfactory performance under moderate to high compression rate. In this paper, we identify the \textit{full-rankness} of fine-tuned PLM as the fundamental bottleneck for the failure of matrix factorization and explore the use of network pruning to extract low-rank sparsity pattern desirable to matrix factorization. We find such low-rank sparsity pattern exclusively exists in models generated by first-order pruning, which motivates us to unite the two approaches and achieve more effective model compression. We further propose two techniques: sparsity-aware SVD and mixed-rank fine-tuning, which improve the initialization and training of the compression procedure, respectively. Experiments on GLUE and question-answering tasks show that the proposed method has superior compression-performance trade-off compared to existing approaches. △ Less

Submitted 25 June, 2023; originally announced June 2023.

Comments: Model Compression

arXiv:2305.13833 [pdf, other]

Reducing Sensitivity on Speaker Names for Text Generation from Dialogues

Authors: Qi Jia, Haifeng Tang, Kenny Q. Zhu

Abstract: Changing speaker names consistently throughout a dialogue should not affect its meaning and corresponding outputs for text generation from dialogues. However, pre-trained language models, serving as the backbone for dialogue-processing tasks, have shown to be sensitive to nuances. This may result in unfairness in real-world applications. No comprehensive analysis of this problem has been done in t… ▽ More Changing speaker names consistently throughout a dialogue should not affect its meaning and corresponding outputs for text generation from dialogues. However, pre-trained language models, serving as the backbone for dialogue-processing tasks, have shown to be sensitive to nuances. This may result in unfairness in real-world applications. No comprehensive analysis of this problem has been done in the past. In this work, we propose to quantitatively measure a model's sensitivity on speaker names, and comprehensively evaluate a number of known methods for reducing speaker name sensitivity, including a novel approach of our own. Extensive experiments on multiple datasets provide a benchmark for this problem and show the favorable performance of our approach in sensitivity reduction and quality of generation. △ Less

Submitted 20 August, 2023; v1 submitted 23 May, 2023; originally announced May 2023.

Comments: findings of ACL'23

arXiv:2305.13614 [pdf, other]

LLM-empowered Chatbots for Psychiatrist and Patient Simulation: Application and Evaluation

Authors: Siyuan Chen, Mengyue Wu, Kenny Q. Zhu, Kunyao Lan, Zhiling Zhang, Lyuchun Cui

Abstract: Empowering chatbots in the field of mental health is receiving increasing amount of attention, while there still lacks exploration in developing and evaluating chatbots in psychiatric outpatient scenarios. In this work, we focus on exploring the potential of ChatGPT in powering chatbots for psychiatrist and patient simulation. We collaborate with psychiatrists to identify objectives and iterativel… ▽ More Empowering chatbots in the field of mental health is receiving increasing amount of attention, while there still lacks exploration in developing and evaluating chatbots in psychiatric outpatient scenarios. In this work, we focus on exploring the potential of ChatGPT in powering chatbots for psychiatrist and patient simulation. We collaborate with psychiatrists to identify objectives and iteratively develop the dialogue system to closely align with real-world scenarios. In the evaluation experiments, we recruit real psychiatrists and patients to engage in diagnostic conversations with the chatbots, collecting their ratings for assessment. Our findings demonstrate the feasibility of using ChatGPT-powered chatbots in psychiatric scenarios and explore the impact of prompt designs on chatbot behavior and user experience. △ Less

Submitted 22 May, 2023; originally announced May 2023.

arXiv:2305.12394 [pdf, other]

Pruning Pre-trained Language Models with Principled Importance and Self-regularization

Authors: Siyu Ren, Kenny Q. Zhu

Abstract: Iterative pruning is one of the most effective compression methods for pre-trained language models. We discovered that finding the optimal pruning decision is an equality-constrained 0-1 Integer Linear Programming problem. The solution to this optimization problem leads to a principled importance criterion which we use to rank parameters during iterative model pruning. To mitigate the poor general… ▽ More Iterative pruning is one of the most effective compression methods for pre-trained language models. We discovered that finding the optimal pruning decision is an equality-constrained 0-1 Integer Linear Programming problem. The solution to this optimization problem leads to a principled importance criterion which we use to rank parameters during iterative model pruning. To mitigate the poor generalization at high sparsity levels, we propose a self-regularization scheme where model prediction is regularized by the latest checkpoint with increasing sparsity throughout pruning. Our experiments on natural language understanding, question-answering, named entity recognition, and data-to-text generation with various Transformer-based PLMs show the effectiveness of the approach at various sparsity levels. △ Less

Submitted 21 May, 2023; originally announced May 2023.

Comments: Accepted at Findings of ACL 2023

arXiv:2305.02820 [pdf, other]

Semantic Space Grounded Weighted Decoding for Multi-Attribute Controllable Dialogue Generation

Authors: Zhiling Zhang, Mengyue Wu, Kenny Q. Zhu

Abstract: Controlling chatbot utterance generation with multiple attributes such as personalities, emotions and dialogue acts is a practically useful but under-studied problem. We propose a novel framework called DASC that possesses strong controllability with a weighted decoding paradigm, while improving generation quality with the grounding in an attribute semantics space. Generation with multiple attribu… ▽ More Controlling chatbot utterance generation with multiple attributes such as personalities, emotions and dialogue acts is a practically useful but under-studied problem. We propose a novel framework called DASC that possesses strong controllability with a weighted decoding paradigm, while improving generation quality with the grounding in an attribute semantics space. Generation with multiple attributes is then intuitively implemented with an interpolation of multiple attribute embeddings, which results in substantial reduction in the model sizes. Experiments show that DASC can achieve high control accuracy in generation task with the simultaneous control of 3 aspects while also producing interesting and reasonably sensible responses, even in an out-of-distribution robustness test. △ Less

Submitted 5 November, 2023; v1 submitted 4 May, 2023; originally announced May 2023.

Comments: EMNLP 2023

arXiv:2303.07902 [pdf, other]

BLAT: Bootstrapping Language-Audio Pre-training based on AudioSet Tag-guided Synthetic Data

Authors: Xuenan Xu, Zhiling Zhang, Zelin Zhou, Pingyue Zhang, Zeyu Xie, Mengyue Wu, Kenny Q. Zhu

Abstract: Compared with ample visual-text pre-training research, few works explore audio-text pre-training, mostly due to the lack of sufficient parallel audio-text data. Most existing methods incorporate the visual modality as a pivot for audio-text pre-training, which inevitably induces data noise. In this paper, we propose to utilize audio captioning to generate text directly from audio, without the aid… ▽ More Compared with ample visual-text pre-training research, few works explore audio-text pre-training, mostly due to the lack of sufficient parallel audio-text data. Most existing methods incorporate the visual modality as a pivot for audio-text pre-training, which inevitably induces data noise. In this paper, we propose to utilize audio captioning to generate text directly from audio, without the aid of the visual modality so that potential noise from modality mismatch is eliminated. Furthermore, we propose caption generation under the guidance of AudioSet tags, leading to more accurate captions. With the above two improvements, we curate high-quality, large-scale parallel audio-text data, based on which we perform audio-text pre-training. We comprehensively demonstrate the performance of the pre-trained model on a series of downstream audio-related tasks, including single-modality tasks like audio classification and tagging, as well as cross-modal tasks consisting of audio-text retrieval and audio-based text generation. Experimental results indicate that our approach achieves state-of-the-art zero-shot classification performance on most datasets, suggesting the effectiveness of our synthetic data. The audio encoder also serves as an efficient pattern recognition model by fine-tuning it on audio-related tasks. Synthetic data and pre-trained models are available online. △ Less

Submitted 5 March, 2024; v1 submitted 14 March, 2023; originally announced March 2023.

arXiv:2211.11297 [pdf, other]

In-sample Curriculum Learning by Sequence Completion for Natural Language Generation

Authors: Qi Jia, Yizhu Liu, Haifeng Tang, Kenny Q. Zhu

Abstract: Curriculum learning has shown promising improvements in multiple domains by training machine learning models from easy samples to hard ones. Previous works which either design rules or train models for scoring the difficulty highly rely on task-specific expertise, and cannot generalize. Inspired by the "easy-to-hard" intuition, we propose to do in-sample curriculum learning for natural language ge… ▽ More Curriculum learning has shown promising improvements in multiple domains by training machine learning models from easy samples to hard ones. Previous works which either design rules or train models for scoring the difficulty highly rely on task-specific expertise, and cannot generalize. Inspired by the "easy-to-hard" intuition, we propose to do in-sample curriculum learning for natural language generation tasks. Our learning strategy starts training the model to generate the last few words, i.e., do sequence completion, and gradually extends to generate the whole output sequence. Comprehensive experiments show that it generalizes well to different tasks and achieves significant improvements over strong baselines. △ Less

Submitted 23 May, 2023; v1 submitted 21 November, 2022; originally announced November 2022.

Comments: ACL'23

arXiv:2210.09894 [pdf, other]

Taxonomy of Abstractive Dialogue Summarization: Scenarios, Approaches and Future Directions

Authors: Qi Jia, Yizhu Liu, Siyu Ren, Kenny Q. Zhu

Abstract: Abstractive dialogue summarization is to generate a concise and fluent summary covering the salient information in a dialogue among two or more interlocutors. It has attracted great attention in recent years based on the massive emergence of social communication platforms and an urgent requirement for efficient dialogue information understanding and digestion. Different from news or articles in tr… ▽ More Abstractive dialogue summarization is to generate a concise and fluent summary covering the salient information in a dialogue among two or more interlocutors. It has attracted great attention in recent years based on the massive emergence of social communication platforms and an urgent requirement for efficient dialogue information understanding and digestion. Different from news or articles in traditional document summarization, dialogues bring unique characteristics and additional challenges, including different language styles and formats, scattered information, flexible discourse structures and unclear topic boundaries. This survey provides a comprehensive investigation on existing work for abstractive dialogue summarization from scenarios, approaches to evaluations. It categorizes the task into two broad categories according to the type of input dialogues, i.e., open-domain and task-oriented, and presents a taxonomy of existing techniques in three directions, namely, injecting dialogue features, designing auxiliary training tasks and using additional data.A list of datasets under different scenarios and widely-accepted evaluation metrics are summarized for completeness. After that, the trends of scenarios and techniques are summarized, together with deep insights on correlations between extensively exploited features and different scenarios. Based on these analyses, we recommend future directions including more controlled and complicated scenarios, technical innovations and comparisons, publicly available datasets in special domains, etc. △ Less

Submitted 6 August, 2023; v1 submitted 18 October, 2022; originally announced October 2022.

Comments: Under review at ACM Computing Surveys (CSUR), submitted in January 2022

arXiv:2205.11308 [pdf, other]

Symptom Identification for Interpretable Detection of Multiple Mental Disorders

Authors: Zhiling Zhang, Siyuan Chen, Mengyue Wu, Kenny Q. Zhu

Abstract: Mental disease detection (MDD) from social media has suffered from poor generalizability and interpretability, due to lack of symptom modeling. This paper introduces PsySym, the first annotated symptom identification corpus of multiple psychiatric disorders, to facilitate further research progress. PsySym is annotated according to a knowledge graph of the 38 symptom classes related to 7 mental dis… ▽ More Mental disease detection (MDD) from social media has suffered from poor generalizability and interpretability, due to lack of symptom modeling. This paper introduces PsySym, the first annotated symptom identification corpus of multiple psychiatric disorders, to facilitate further research progress. PsySym is annotated according to a knowledge graph of the 38 symptom classes related to 7 mental diseases complied from established clinical manuals and scales, and a novel annotation framework for diversity and quality. Experiments show that symptom-assisted MDD enabled by PsySym can outperform strong pure-text baselines. We also exhibit the convincing MDD explanations provided by symptom predictions with case studies, and point to their further potential applications. △ Less

Submitted 23 May, 2022; originally announced May 2022.

arXiv:2205.10593 [pdf, other]

Few-Shot Natural Language Inference Generation with PDD: Prompt and Dynamic Demonstration

Authors: Kaijian Li, Shansan Gong, Kenny Q. Zhu

Abstract: Natural Language Inference Generation task is to generate a text hypothesis given a text premise and a logical relation between the two. This task can be used in data augmentation and controllable text generation in practice. In this paper, we propose language models with prompt and dynamic demonstration (LM-PDD) to tackle this problem in few-shot settings. Our framework outperforms standard fine-… ▽ More Natural Language Inference Generation task is to generate a text hypothesis given a text premise and a logical relation between the two. This task can be used in data augmentation and controllable text generation in practice. In this paper, we propose language models with prompt and dynamic demonstration (LM-PDD) to tackle this problem in few-shot settings. Our framework outperforms standard fine-tuned models with low resource, achieving an average 8% absolute improvement on SNLI and MNLI datasets, and the results on 13 natural language classification tasks also show that our dynamic demonstration method has good generalizability. △ Less

Submitted 21 May, 2022; originally announced May 2022.

Comments: 13 pages

arXiv:2205.09497 [pdf, other]

Psychiatric Scale Guided Risky Post Screening for Early Detection of Depression

Authors: Zhiling Zhang, Siyuan Chen, Mengyue Wu, Kenny Q. Zhu

Abstract: Depression is a prominent health challenge to the world, and early risk detection (ERD) of depression from online posts can be a promising technique for combating the threat. Early depression detection faces the challenge of efficiently tackling streaming data, balancing the tradeoff between timeliness, accuracy and explainability. To tackle these challenges, we propose a psychiatric scale guided… ▽ More Depression is a prominent health challenge to the world, and early risk detection (ERD) of depression from online posts can be a promising technique for combating the threat. Early depression detection faces the challenge of efficiently tackling streaming data, balancing the tradeoff between timeliness, accuracy and explainability. To tackle these challenges, we propose a psychiatric scale guided risky post screening method that can capture risky posts related to the dimensions defined in clinical depression scales, and providing interpretable diagnostic basis. A Hierarchical Attentional Network equipped with BERT (HAN-BERT) is proposed to further advance explainable predictions. For ERD, we propose an online algorithm based on an evolving queue of risky posts that can significantly reduce the number of model inferences to boost efficiency. Experiments show that our method outperforms the competitive feature-based and neural models under conventional depression detection settings, and achieves simultaneous improvement in both efficacy and efficiency for ERD. △ Less

Submitted 19 May, 2022; originally announced May 2022.

Comments: IJCAI 2022 AI for Good

arXiv:2205.06058 [pdf, other]

doi 10.1145/3477495.3532040

Positive, Negative and Neutral: Modeling Implicit Feedback in Session-based News Recommendation

Authors: Shansan Gong, Kenny Q. Zhu

Abstract: News recommendation for anonymous readers is a useful but challenging task for many news portals, where interactions between readers and articles are limited within a temporary login session. Previous works tend to formulate session-based recommendation as a next item prediction task, while they neglect the implicit feedback from user behaviors, which indicates what users really like or dislike. H… ▽ More News recommendation for anonymous readers is a useful but challenging task for many news portals, where interactions between readers and articles are limited within a temporary login session. Previous works tend to formulate session-based recommendation as a next item prediction task, while they neglect the implicit feedback from user behaviors, which indicates what users really like or dislike. Hence, we propose a comprehensive framework to model user behaviors through positive feedback (i.e., the articles they spend more time on) and negative feedback (i.e., the articles they choose to skip without clicking in). Moreover, the framework implicitly models the user using their session start time, and the article using its initial publishing time, in what we call "neutral feedback". Empirical evaluation on three real-world news datasets shows the framework's promising performance of more accurate, diverse and even unexpectedness recommendations than other state-of-the-art session-based recommendation approaches. △ Less

Submitted 12 May, 2022; originally announced May 2022.

Comments: Accepted by SIGIR 2022 main conference

ACM Class: H.3.3

arXiv:2204.13913 [pdf, ps, other]

Leaner and Faster: Two-Stage Model Compression for Lightweight Text-Image Retrieval

Authors: Siyu Ren, Kenny Q. Zhu

Abstract: Current text-image approaches (e.g., CLIP) typically adopt dual-encoder architecture using pre-trained vision-language representation. However, these models still pose non-trivial memory requirements and substantial incremental indexing time, which makes them less practical on mobile devices. In this paper, we present an effective two-stage framework to compress large pre-trained dual-encoder for… ▽ More Current text-image approaches (e.g., CLIP) typically adopt dual-encoder architecture using pre-trained vision-language representation. However, these models still pose non-trivial memory requirements and substantial incremental indexing time, which makes them less practical on mobile devices. In this paper, we present an effective two-stage framework to compress large pre-trained dual-encoder for lightweight text-image retrieval. The resulting model is smaller (39% of the original), faster (1.6x/2.9x for processing image/text respectively), yet performs on par with or better than the original full model on Flickr30K and MSCOCO benchmarks. We also open-source an accompanying realistic mobile image search application. △ Less

Submitted 29 April, 2022; originally announced April 2022.

Comments: Accepted by NAACL 2022 main conference

arXiv:2204.13498 [pdf, other]

Post-Training Dialogue Summarization using Pseudo-Paraphrasing

Authors: Qi Jia, Yizhu Liu, Haifeng Tang, Kenny Q. Zhu

Abstract: Previous dialogue summarization techniques adapt large language models pretrained on the narrative text by injecting dialogue-specific features into the models. These features either require additional knowledge to recognize or make the resulting models harder to tune. To bridge the format gap between dialogues and narrative summaries in dialogue summarization tasks, we propose to post-train pretr… ▽ More Previous dialogue summarization techniques adapt large language models pretrained on the narrative text by injecting dialogue-specific features into the models. These features either require additional knowledge to recognize or make the resulting models harder to tune. To bridge the format gap between dialogues and narrative summaries in dialogue summarization tasks, we propose to post-train pretrained language models (PLMs) to rephrase from dialogue to narratives. After that, the model is fine-tuned for dialogue summarization as usual. Comprehensive experiments show that our approach significantly improves vanilla PLMs on dialogue summarization and outperforms other SOTA models by the summary quality and implementation costs. △ Less

Submitted 28 April, 2022; originally announced April 2022.

Comments: Findings of NAACL 2022

arXiv:2201.10376 [pdf, other]

Modeling Multi-level Context for Informational Bias Detection by Contrastive Learning and Sentential Graph Network

Authors: Shijia Guo, Kenny Q. Zhu

Abstract: Informational bias is widely present in news articles. It refers to providing one-sided, selective or suggestive information of specific aspects of certain entity to guide a specific interpretation, thereby biasing the reader's opinion. Sentence-level informational bias detection is a very challenging task in a way that such bias can only be revealed together with the context, examples include col… ▽ More Informational bias is widely present in news articles. It refers to providing one-sided, selective or suggestive information of specific aspects of certain entity to guide a specific interpretation, thereby biasing the reader's opinion. Sentence-level informational bias detection is a very challenging task in a way that such bias can only be revealed together with the context, examples include collecting information from various sources or analyzing the entire article in combination with the background. In this paper, we integrate three levels of context to detect the sentence-level informational bias in English news articles: adjacent sentences, whole article, and articles from other news outlets describing the same event. Our model, MultiCTX (Multi-level ConTeXt), uses contrastive learning and sentence graphs together with Graph Attention Network (GAT) to encode these three degrees of context at different stages by tactically composing contrastive triplets and constructing sentence graphs within events. Our experiments proved that contrastive learning together with sentence graphs effectively incorporates context in varying degrees and significantly outperforms the current SOTA model sentence-wise in informational bias detection. △ Less

Submitted 25 January, 2022; originally announced January 2022.

Comments: 10 pages including bibliography

arXiv:2110.04684 [pdf, other]

Can Audio Captions Be Evaluated with Image Caption Metrics?

Authors: Zelin Zhou, Zhiling Zhang, Xuenan Xu, Zeyu Xie, Mengyue Wu, Kenny Q. Zhu

Abstract: Automated audio captioning aims at generating textual descriptions for an audio clip. To evaluate the quality of generated audio captions, previous works directly adopt image captioning metrics like SPICE and CIDEr, without justifying their suitability in this new domain, which may mislead the development of advanced models. This problem is still unstudied due to the lack of human judgment dataset… ▽ More Automated audio captioning aims at generating textual descriptions for an audio clip. To evaluate the quality of generated audio captions, previous works directly adopt image captioning metrics like SPICE and CIDEr, without justifying their suitability in this new domain, which may mislead the development of advanced models. This problem is still unstudied due to the lack of human judgment datasets on caption quality. Therefore, we firstly construct two evaluation benchmarks, AudioCaps-Eval and Clotho-Eval. They are established with pairwise comparison instead of absolute rating to achieve better inter-annotator agreement. Current metrics are found in poor correlation with human annotations on these datasets. To overcome their limitations, we propose a metric named FENSE, where we combine the strength of Sentence-BERT in capturing similarity, and a novel Error Detector to penalize erroneous sentences for robustness. On the newly established benchmarks, FENSE outperforms current metrics by 14-25% accuracy. Code, data and web demo available at: https://github.com/blmoistawinde/fense △ Less

Submitted 27 January, 2022; v1 submitted 9 October, 2021; originally announced October 2021.

Comments: ICASSP 2022

arXiv:2110.01009 [pdf, other]

doi 10.1145/3459637.3482097

Enriching Ontology with Temporal Commonsense for Low-Resource Audio Tagging

Authors: Zhiling Zhang, Zelin Zhou, Haifeng Tang, Guangwei Li, Mengyue Wu, Kenny Q. Zhu

Abstract: Audio tagging aims at predicting sound events occurred in a recording. Traditional models require enormous laborious annotations, otherwise performance degeneration will be the norm. Therefore, we investigate robust audio tagging models in low-resource scenarios with the enhancement of knowledge graphs. Besides existing ontological knowledge, we further propose a semi-automatic approach that can c… ▽ More Audio tagging aims at predicting sound events occurred in a recording. Traditional models require enormous laborious annotations, otherwise performance degeneration will be the norm. Therefore, we investigate robust audio tagging models in low-resource scenarios with the enhancement of knowledge graphs. Besides existing ontological knowledge, we further propose a semi-automatic approach that can construct temporal knowledge graphs on diverse domain-specific label sets. Moreover, we leverage a variant of relation-aware graph neural network, D-GCN, to combine the strength of the two knowledge types. Experiments on AudioSet and SONYC urban sound tagging datasets suggest the effectiveness of the introduced temporal knowledge, and the advantage of the combined KGs with D-GCN over single knowledge source. △ Less

Submitted 3 October, 2021; originally announced October 2021.

Comments: CIKM 2021

arXiv:2104.10317 [pdf, other]

doi 10.1145/3442381.3449876

Diverse and Specific Clarification Question Generation with Keywords

Authors: Zhiling Zhang, Kenny Q. Zhu

Abstract: Product descriptions on e-commerce websites often suffer from missing important aspects. Clarification question generation (CQGen) can be a promising approach to help alleviate the problem. Unlike traditional QGen assuming the existence of answers in the context and generating questions accordingly, CQGen mimics user behaviors of asking for unstated information. The generated CQs can serve as a sa… ▽ More Product descriptions on e-commerce websites often suffer from missing important aspects. Clarification question generation (CQGen) can be a promising approach to help alleviate the problem. Unlike traditional QGen assuming the existence of answers in the context and generating questions accordingly, CQGen mimics user behaviors of asking for unstated information. The generated CQs can serve as a sanity check or proofreading to help e-commerce merchant to identify potential missing information before advertising their product, and improve consumer experience consequently. Due to the variety of possible user backgrounds and use cases, the information need can be quite diverse but also specific to a detailed topic, while previous works assume generating one CQ per context and the results tend to be generic. We thus propose the task of Diverse CQGen and also tackle the challenge of specificity. We propose a new model named KPCNet, which generates CQs with Keyword Prediction and Conditioning, to deal with the tasks. Automatic and human evaluation on 2 datasets (Home & Kitchen, Office) showed that KPCNet can generate more specific questions and promote better group-level diversity than several competing baselines. △ Less

Submitted 20 April, 2021; originally announced April 2021.

Comments: 11 pages, 3 figures, WWW 2021

arXiv:2102.04632 [pdf, ps, other]

Statistically Profiling Biases in Natural Language Reasoning Datasets and Models

Authors: Shanshan Huang, Kenny Q. Zhu

Abstract: Recent work has indicated that many natural language understanding and reasoning datasets contain statistical cues that may be taken advantaged of by NLP models whose capability may thus be grossly overestimated. To discover the potential weakness in the models, some human-designed stress tests have been proposed but they are expensive to create and do not generalize to arbitrary models. We propos… ▽ More Recent work has indicated that many natural language understanding and reasoning datasets contain statistical cues that may be taken advantaged of by NLP models whose capability may thus be grossly overestimated. To discover the potential weakness in the models, some human-designed stress tests have been proposed but they are expensive to create and do not generalize to arbitrary models. We propose a light-weight and general statistical profiling framework, ICQ (I-See-Cue), which automatically identifies possible biases in any multiple-choice NLU datasets without the need to create any additional test cases, and further evaluates through blackbox testing the extent to which models may exploit these biases. △ Less

Submitted 8 February, 2021; originally announced February 2021.

arXiv:2012.02553 [pdf, other]

DDRel: A New Dataset for Interpersonal Relation Classification in Dyadic Dialogues

Authors: Qi Jia, Hongru Huang, Kenny Q. Zhu

Abstract: Interpersonal language style shifting in dialogues is an interesting and almost instinctive ability of human. Understanding interpersonal relationship from language content is also a crucial step toward further understanding dialogues. Previous work mainly focuses on relation extraction between named entities in texts. In this paper, we propose the task of relation classification of interlocutors… ▽ More Interpersonal language style shifting in dialogues is an interesting and almost instinctive ability of human. Understanding interpersonal relationship from language content is also a crucial step toward further understanding dialogues. Previous work mainly focuses on relation extraction between named entities in texts. In this paper, we propose the task of relation classification of interlocutors based on their dialogues. We crawled movie scripts from IMSDb, and annotated the relation labels for each session according to 13 pre-defined relationships. The annotated dataset DDRel consists of 6300 dyadic dialogue sessions between 694 pair of speakers with 53,126 utterances in total. We also construct session-level and pair-level relation classification tasks with widely-accepted baselines. The experimental results show that this task is challenging for existing models and the dataset will be useful for future research. △ Less

Submitted 4 December, 2020; originally announced December 2020.

Comments: This paper has been accepted by AAAI2021

arXiv:2010.01502 [pdf, other]

Multi-turn Response Selection using Dialogue Dependency Relations

Authors: Qi Jia, Yizhu Liu, Siyu Ren, Kenny Q. Zhu, Haifeng Tang

Abstract: Multi-turn response selection is a task designed for developing dialogue agents. The performance on this task has a remarkable improvement with pre-trained language models. However, these models simply concatenate the turns in dialogue history as the input and largely ignore the dependencies between the turns. In this paper, we propose a dialogue extraction algorithm to transform a dialogue histor… ▽ More Multi-turn response selection is a task designed for developing dialogue agents. The performance on this task has a remarkable improvement with pre-trained language models. However, these models simply concatenate the turns in dialogue history as the input and largely ignore the dependencies between the turns. In this paper, we propose a dialogue extraction algorithm to transform a dialogue history into threads based on their dependency relations. Each thread can be regarded as a self-contained sub-dialogue. We also propose Thread-Encoder model to encode threads and candidates into compact representations by pre-trained Transformers and finally get the matching score through an attention layer. The experiments show that dependency relations are helpful for dialogue context understanding, and our model outperforms the state-of-the-art baselines on both DSTC7 and DSTC8*, with competitive results on UbuntuV2. △ Less

Submitted 30 November, 2023; v1 submitted 4 October, 2020; originally announced October 2020.

Comments: Accepted for publication as a long paper in EMNLP2020

Journal ref: Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)

arXiv:2005.09276 [pdf, other]

doi 10.3233/FAIA200326

Matching Questions and Answers in Dialogues from Online Forums

Authors: Qi Jia, Mengxue Zhang, Shengyao Zhang, Kenny Q. Zhu

Abstract: Matching question-answer relations between two turns in conversations is not only the first step in analyzing dialogue structures, but also valuable for training dialogue systems. This paper presents a QA matching model considering both distance information and dialogue history by two simultaneous attention mechanisms called mutual attention. Given scores computed by the trained model between each… ▽ More Matching question-answer relations between two turns in conversations is not only the first step in analyzing dialogue structures, but also valuable for training dialogue systems. This paper presents a QA matching model considering both distance information and dialogue history by two simultaneous attention mechanisms called mutual attention. Given scores computed by the trained model between each non-question turn with its candidate questions, a greedy matching strategy is used for final predictions. Because existing dialogue datasets such as the Ubuntu dataset are not suitable for the QA matching task, we further create a dataset with 1,000 labeled dialogues and demonstrate that our proposed model outperforms the state-of-the-art and other strong baselines, particularly for matching long-distance QA pairs. △ Less

Submitted 2 August, 2020; v1 submitted 19 May, 2020; originally announced May 2020.

Comments: Accepted at ECAI2020

Journal ref: ECAI 2020: 24th European Conference on Artificial Intelligence

arXiv:2004.14164 [pdf, other]

doi 10.1145/3340531.3411858

MICK: A Meta-Learning Framework for Few-shot Relation Classification with Small Training Data

Authors: Xiaoqing Geng, Xiwen Chen, Kenny Q. Zhu, Libin Shen, Yinggong Zhao

Abstract: Few-shot relation classification seeks to classify incoming query instances after meeting only few support instances. This ability is gained by training with large amount of in-domain annotated data. In this paper, we tackle an even harder problem by further limiting the amount of data available at training time. We propose a few-shot learning framework for relation classification, which is partic… ▽ More Few-shot relation classification seeks to classify incoming query instances after meeting only few support instances. This ability is gained by training with large amount of in-domain annotated data. In this paper, we tackle an even harder problem by further limiting the amount of data available at training time. We propose a few-shot learning framework for relation classification, which is particularly powerful when the training data is very small. In this framework, models not only strive to classify query instances, but also seek underlying knowledge about the support instances to obtain better instance representations. The framework also includes a method for aggregating cross-domain knowledge into models by open-source task enrichment. Additionally, we construct a brand new dataset: the TinyRel-CM dataset, a few-shot relation classification dataset in health domain with purposely small training data and challenging relation classes. Experimental results demonstrate that our framework brings performance gains for most underlying classification models, outperforms the state-of-the-art results given small training data, and achieves competitive results with sufficiently large training data. △ Less

Submitted 14 December, 2020; v1 submitted 26 April, 2020; originally announced April 2020.

Journal ref: CIKM 2020: The 29th ACM International Conference on Information and Knowledge Management

arXiv:2004.11742 [pdf, other]

ST$^2$: Small-data Text Style Transfer via Multi-task Meta-Learning

Authors: Xiwen Chen, Kenny Q. Zhu

Abstract: Text style transfer aims to paraphrase a sentence in one style into another style while preserving content. Due to lack of parallel training data, state-of-art methods are unsupervised and rely on large datasets that share content. Furthermore, existing methods have been applied on very limited categories of styles such as positive/negative and formal/informal. In this work, we develop a meta-lear… ▽ More Text style transfer aims to paraphrase a sentence in one style into another style while preserving content. Due to lack of parallel training data, state-of-art methods are unsupervised and rely on large datasets that share content. Furthermore, existing methods have been applied on very limited categories of styles such as positive/negative and formal/informal. In this work, we develop a meta-learning framework to transfer between any kind of text styles, including personal writing styles that are more fine-grained, share less content and have much smaller training data. While state-of-art models fail in the few-shot style transfer task, our framework effectively utilizes information from other styles to improve both language fluency and style transfer accuracy. △ Less

Submitted 24 April, 2020; originally announced April 2020.

Comments: 9 pages, 11 figures

arXiv:2004.09853 [pdf, ps, other]

Knowledge-Driven Distractor Generation for Cloze-style Multiple Choice Questions

Authors: Siyu Ren, Kenny Q. Zhu

Abstract: In this paper, we propose a novel configurable framework to automatically generate distractive choices for open-domain cloze-style multiple-choice questions, which incorporates a general-purpose knowledge base to effectively create a small distractor candidate set, and a feature-rich learning-to-rank model to select distractors that are both plausible and reliable. Experimental results on datasets… ▽ More In this paper, we propose a novel configurable framework to automatically generate distractive choices for open-domain cloze-style multiple-choice questions, which incorporates a general-purpose knowledge base to effectively create a small distractor candidate set, and a feature-rich learning-to-rank model to select distractors that are both plausible and reliable. Experimental results on datasets across four domains show that our framework yields distractors that are more plausible and reliable than previous methods. This dataset can also be used as a benchmark for distractor generation in the future. △ Less

Submitted 7 December, 2020; v1 submitted 21 April, 2020; originally announced April 2020.

Comments: To appear at AAAI 2021

arXiv:2004.01358 [pdf, other]

Unpack Local Model Interpretation for GBDT

Authors: Wenjing Fang, Jun Zhou, Xiaolong Li, Kenny Q. Zhu

Abstract: A gradient boosting decision tree (GBDT), which aggregates a collection of single weak learners (i.e. decision trees), is widely used for data mining tasks. Because GBDT inherits the good performance from its ensemble essence, much attention has been drawn to the optimization of this model. With its popularization, an increasing need for model interpretation arises. Besides the commonly used featu… ▽ More A gradient boosting decision tree (GBDT), which aggregates a collection of single weak learners (i.e. decision trees), is widely used for data mining tasks. Because GBDT inherits the good performance from its ensemble essence, much attention has been drawn to the optimization of this model. With its popularization, an increasing need for model interpretation arises. Besides the commonly used feature importance as a global interpretation, feature contribution is a local measure that reveals the relationship between a specific instance and the related output. This work focuses on the local interpretation and proposes an unified computation mechanism to get the instance-level feature contributions for GBDT in any version. Practicality of this mechanism is validated by the listed experiments as well as applications in real industry scenarios. △ Less

Submitted 2 April, 2020; originally announced April 2020.

Comments: 12 pages, 5 figures

arXiv:2003.13230 [pdf, ps, other]

AliCoCo: Alibaba E-commerce Cognitive Concept Net

Authors: Xusheng Luo, Luxin Liu, Yonghua Yang, Le Bo, Yuanpeng Cao, Jinhang Wu, Qiang Li, Keping Yang, Kenny Q. Zhu

Abstract: One of the ultimate goals of e-commerce platforms is to satisfy various shopping needs for their customers. Much efforts are devoted to creating taxonomies or ontologies in e-commerce towards this goal. However, user needs in e-commerce are still not well defined, and none of the existing ontologies has the enough depth and breadth for universal user needs understanding. The semantic gap in-betwee… ▽ More One of the ultimate goals of e-commerce platforms is to satisfy various shopping needs for their customers. Much efforts are devoted to creating taxonomies or ontologies in e-commerce towards this goal. However, user needs in e-commerce are still not well defined, and none of the existing ontologies has the enough depth and breadth for universal user needs understanding. The semantic gap in-between prevents shopping experience from being more intelligent. In this paper, we propose to construct a large-scale e-commerce cognitive concept net named "AliCoCo", which is practiced in Alibaba, the largest Chinese e-commerce platform in the world. We formally define user needs in e-commerce, then conceptualize them as nodes in the net. We present details on how AliCoCo is constructed semi-automatically and its successful, ongoing and potential applications in e-commerce. △ Less

Submitted 30 March, 2020; originally announced March 2020.

Comments: 15 pages. Accepted by SIGMOD 2020 Industry Track

arXiv:2002.11982 [pdf, other]

Adapted tree boosting for Transfer Learning

Authors: Wenjing Fang, Chaochao Chen, Bowen Song, Li Wang, Jun Zhou, Kenny Q. Zhu

Abstract: Secure online transaction is an essential task for e-commerce platforms. Alipay, one of the world's leading cashless payment platform, provides the payment service to both merchants and individual customers. The fraud detection models are built to protect the customers, but stronger demands are raised by the new scenes, which are lacking in training data and labels. The proposed model makes a diff… ▽ More Secure online transaction is an essential task for e-commerce platforms. Alipay, one of the world's leading cashless payment platform, provides the payment service to both merchants and individual customers. The fraud detection models are built to protect the customers, but stronger demands are raised by the new scenes, which are lacking in training data and labels. The proposed model makes a difference by utilizing the data under similar old scenes and the data under a new scene is treated as the target domain to be promoted. Inspired by this real case in Alipay, we view the problem as a transfer learning problem and design a set of revise strategies to transfer the source domain models to the target domain under the framework of gradient boosting tree models. This work provides an option for the cold-starting and data-sharing problems. △ Less

Submitted 2 April, 2020; v1 submitted 27 February, 2020; originally announced February 2020.

ACM Class: I.2.6

arXiv:1910.10893 [pdf, other]

Low-Resource Sequence Labeling via Unsupervised Multilingual Contextualized Representations

Authors: Zuyi Bao, Rui Huang, Chen Li, Kenny Q. Zhu

Abstract: Previous work on cross-lingual sequence labeling tasks either requires parallel data or bridges the two languages through word-byword matching. Such requirements and assumptions are infeasible for most languages, especially for languages with large linguistic distances, e.g., English and Chinese. In this work, we propose a Multilingual Language Model with deep semantic Alignment (MLMA) to generate… ▽ More Previous work on cross-lingual sequence labeling tasks either requires parallel data or bridges the two languages through word-byword matching. Such requirements and assumptions are infeasible for most languages, especially for languages with large linguistic distances, e.g., English and Chinese. In this work, we propose a Multilingual Language Model with deep semantic Alignment (MLMA) to generate language-independent representations for cross-lingual sequence labeling. Our methods require only monolingual corpora with no bilingual resources at all and take advantage of deep contextualized representations. Experimental results show that our approach achieves new state-of-the-art NER and POS performance across European languages, and is also effective on distant language pairs such as English and Chinese. △ Less

Submitted 23 October, 2019; originally announced October 2019.

Comments: Accepted at EMNLP 2019

arXiv:1910.03295 [pdf, ps, other]

Conceptualize and Infer User Needs in E-commerce

Authors: Xusheng Luo, Yonghua Yang, Kenny Q. Zhu, Yu Gong, Keping Yang

Abstract: Understanding latent user needs beneath shopping behaviors is critical to e-commercial applications. Without a proper definition of user needs in e-commerce, most industry solutions are not driven directly by user needs at current stage, which prevents them from further improving user satisfaction. Representing implicit user needs explicitly as nodes like "outdoor barbecue" or "keep warm for kids"… ▽ More Understanding latent user needs beneath shopping behaviors is critical to e-commercial applications. Without a proper definition of user needs in e-commerce, most industry solutions are not driven directly by user needs at current stage, which prevents them from further improving user satisfaction. Representing implicit user needs explicitly as nodes like "outdoor barbecue" or "keep warm for kids" in a knowledge graph, provides new imagination for various e- commerce applications. Backed by such an e-commerce knowledge graph, we propose a supervised learning algorithm to conceptualize user needs from their transaction history as "concept" nodes in the graph and infer those concepts for each user through a deep attentive model. Offline experiments demonstrate the effectiveness and stability of our model, and online industry strength tests show substantial advantages of such user needs understanding. △ Less

Submitted 8 October, 2019; originally announced October 2019.

Comments: 9 pages, 6 figures. Accepted by CIKM 2019 Applied Research Track

arXiv:1905.07089 [pdf, other]

Exact-K Recommendation via Maximal Clique Optimization

Authors: Yu Gong, Yu Zhu, Lu Duan, Qingwen Liu, Ziyu Guan, Fei Sun, Wenwu Ou, Kenny Q. Zhu

Abstract: This paper targets to a novel but practical recommendation problem named exact-K recommendation. It is different from traditional top-K recommendation, as it focuses more on (constrained) combinatorial optimization which will optimize to recommend a whole set of K items called card, rather than ranking optimization which assumes that "better" items should be put into top positions. Thus we take th… ▽ More This paper targets to a novel but practical recommendation problem named exact-K recommendation. It is different from traditional top-K recommendation, as it focuses more on (constrained) combinatorial optimization which will optimize to recommend a whole set of K items called card, rather than ranking optimization which assumes that "better" items should be put into top positions. Thus we take the first step to give a formal problem definition, and innovatively reduce it to Maximum Clique Optimization based on graph. To tackle this specific combinatorial optimization problem which is NP-hard, we propose Graph Attention Networks (GAttN) with a Multi-head Self-attention encoder and a decoder with attention mechanism. It can end-to-end learn the joint distribution of the K items and generate an optimal card rather than rank individual items by prediction scores. Then we propose Reinforcement Learning from Demonstrations (RLfD) which combines the advantages in behavior cloning and reinforcement learning, making it sufficient- and-efficient to train the model. Extensive experiments on three datasets demonstrate the effectiveness of our proposed GAttN with RLfD method, it outperforms several strong baselines with a relative improvement of 7.7% and 4.7% on average in Precision and Hit Ratio respectively, and achieves state-of-the-art (SOTA) performance for the exact-K recommendation problem. △ Less

Submitted 16 May, 2019; originally announced May 2019.

Comments: SIGKDD 2019

arXiv:1808.06880 [pdf, other]

Automatic Generation of Text Descriptive Comments for Code Blocks

Authors: Yuding Liang, Kenny Q. Zhu

Abstract: We propose a framework to automatically generate descriptive comments for source code blocks. While this problem has been studied by many researchers previously, their methods are mostly based on fixed template and achieves poor results. Our framework does not rely on any template, but makes use of a new recursive neural network called Code-RNN to extract features from the source code and embed th… ▽ More We propose a framework to automatically generate descriptive comments for source code blocks. While this problem has been studied by many researchers previously, their methods are mostly based on fixed template and achieves poor results. Our framework does not rely on any template, but makes use of a new recursive neural network called Code-RNN to extract features from the source code and embed them into one vector. When this vector representation is input to a new recurrent neural network (Code-GRU), the overall framework generates text descriptions of the code with accuracy (Rouge-2 value) significantly higher than other learning-based approaches such as sequence-to-sequence model. The Code-RNN model can also be used in other scenario where the representation of code is required. △ Less

Submitted 21 August, 2018; originally announced August 2018.

Comments: aaai 2018

arXiv:1803.11359 [pdf, other]

Automatic Generation of Chinese Short Product Titles for Mobile Display

Authors: Yu Gong, Xusheng Luo, Kenny Q. Zhu, Wenwu Ou, Zhao Li, Lu Duan

Abstract: This paper studies the problem of automatically extracting a short title from a manually written longer description of E-commerce products for display on mobile devices. It is a new extractive summarization problem on short text inputs, for which we propose a feature-enriched network model, combining three different categories of features in parallel. Experimental results show that our framework s… ▽ More This paper studies the problem of automatically extracting a short title from a manually written longer description of E-commerce products for display on mobile devices. It is a new extractive summarization problem on short text inputs, for which we propose a feature-enriched network model, combining three different categories of features in parallel. Experimental results show that our framework significantly outperforms several baselines by a substantial gain of 4.5%. Moreover, we produce an extractive summarization dataset for E-commerce short texts and will release it to the research community. △ Less

Submitted 6 May, 2019; v1 submitted 30 March, 2018; originally announced March 2018.

Comments: IAAI 2019

arXiv:1803.11326 [pdf, other]

Deep Cascade Multi-task Learning for Slot Filling in Online Shopping Assistant

Authors: Yu Gong, Xusheng Luo, Yu Zhu, Wenwu Ou, Zhao Li, Muhua Zhu, Kenny Q. Zhu, Lu Duan, Xi Chen

Abstract: Slot filling is a critical task in natural language understanding (NLU) for dialog systems. State-of-the-art approaches treat it as a sequence labeling problem and adopt such models as BiLSTM-CRF. While these models work relatively well on standard benchmark datasets, they face challenges in the context of E-commerce where the slot labels are more informative and carry richer expressions. In this… ▽ More Slot filling is a critical task in natural language understanding (NLU) for dialog systems. State-of-the-art approaches treat it as a sequence labeling problem and adopt such models as BiLSTM-CRF. While these models work relatively well on standard benchmark datasets, they face challenges in the context of E-commerce where the slot labels are more informative and carry richer expressions. In this work, inspired by the unique structure of E-commerce knowledge base, we propose a novel multi-task model with cascade and residual connections, which jointly learns segment tagging, named entity tagging and slot filling. Experiments show the effectiveness of the proposed cascade and residual structures. Our model has a 14.6% advantage in F1 score over the strong baseline methods on a new Chinese E-commerce shopping assistant dataset, while achieving competitive accuracies on a standard dataset. Furthermore, online test deployed on such dominant E-commerce platform shows 130% improvement on accuracy of understanding user utterances. Our model has already gone into production in the E-commerce platform. △ Less

Submitted 6 May, 2019; v1 submitted 29 March, 2018; originally announced March 2018.

Comments: AAAI 2019

arXiv:1803.00729 [pdf, other]

Representing Verbs as Argument Concepts

Authors: Yu Gong, Kaiqi Zhao, Kenny Q. Zhu

Abstract: Verbs play an important role in the understanding of natural language text. This paper studies the problem of abstracting the subject and object arguments of a verb into a set of noun concepts, known as the "argument concepts". This set of concepts, whose size is parameterized, represents the fine-grained semantics of a verb. For example, the object of "enjoy" can be abstracted into time, hobby an… ▽ More Verbs play an important role in the understanding of natural language text. This paper studies the problem of abstracting the subject and object arguments of a verb into a set of noun concepts, known as the "argument concepts". This set of concepts, whose size is parameterized, represents the fine-grained semantics of a verb. For example, the object of "enjoy" can be abstracted into time, hobby and event, etc. We present a novel framework to automatically infer human readable and machine computable action concepts with high accuracy. △ Less

Submitted 2 March, 2018; originally announced March 2018.

Comments: 7 pages, 2 figures, AAAI 2016

arXiv:1711.04204 [pdf, other]

Automatic Extraction of Commonsense LocatedNear Knowledge

Authors: Frank F. Xu, Bill Yuchen Lin, Kenny Q. Zhu

Abstract: LocatedNear relation is a kind of commonsense knowledge describing two physical objects that are typically found near each other in real life. In this paper, we study how to automatically extract such relationship through a sentence-level relation classifier and aggregating the scores of entity pairs from a large corpus. Also, we release two benchmark datasets for evaluation and future research. LocatedNear relation is a kind of commonsense knowledge describing two physical objects that are typically found near each other in real life. In this paper, we study how to automatically extract such relationship through a sentence-level relation classifier and aggregating the scores of entity pairs from a large corpus. Also, we release two benchmark datasets for evaluation and future research. △ Less

Submitted 12 May, 2018; v1 submitted 11 November, 2017; originally announced November 2017.

Comments: Accepted by ACL 2018. A preliminary version is presented on AKBC@NIPS'17

arXiv:1707.02047 [pdf, ps, other]

InferSpark: Statistical Inference at Scale

Authors: Zhuoyue Zhao, Jialing Pei, Eric Lo, Kenny Q. Zhu, Chris Liu

Abstract: The Apache Spark stack has enabled fast large-scale data processing. Despite a rich library of statistical models and inference algorithms, it does not give domain users the ability to develop their own models. The emergence of probabilistic programming languages has showed the promise of developing sophisticated probabilistic models in a succinct and programmatic way. These frameworks have the po… ▽ More The Apache Spark stack has enabled fast large-scale data processing. Despite a rich library of statistical models and inference algorithms, it does not give domain users the ability to develop their own models. The emergence of probabilistic programming languages has showed the promise of developing sophisticated probabilistic models in a succinct and programmatic way. These frameworks have the potential of automatically generating inference algorithms for the user defined models and answering various statistical queries about the model. It is a perfect time to unite these two great directions to produce a programmable big data analysis framework. We thus propose, InferSpark, a probabilistic programming framework on top of Apache Spark. Efficient statistical inference can be easily implemented on this framework and inference process can leverage the distributed main memory processing power of Spark. This framework makes statistical inference on big data possible and speed up the penetration of probabilistic programming into the data engineering domain. △ Less

Submitted 9 October, 2017; v1 submitted 7 July, 2017; originally announced July 2017.

Comments: 13 pages, 22 figures

arXiv:1211.2073 [pdf, ps, other]

LAGE: A Java Framework to reconstruct Gene Regulatory Networks from Large-Scale Continues Expression Data

Authors: Yang Lu, Mengying Wang, Kenny Q. Zhu, Bo Yuan

Abstract: LAGE is a systematic framework developed in Java. The motivation of LAGE is to provide a scalable and parallel solution to reconstruct Gene Regulatory Networks (GRNs) from continuous gene expression data for very large amount of genes. The basic idea of our framework is motivated by the philosophy of divideand-conquer. Specifically, LAGE recursively partitions genes into multiple overlapping commu… ▽ More LAGE is a systematic framework developed in Java. The motivation of LAGE is to provide a scalable and parallel solution to reconstruct Gene Regulatory Networks (GRNs) from continuous gene expression data for very large amount of genes. The basic idea of our framework is motivated by the philosophy of divideand-conquer. Specifically, LAGE recursively partitions genes into multiple overlapping communities with much smaller sizes, learns intra-community GRNs respectively before merge them altogether. Besides, the complete information of overlapping communities serves as the byproduct, which could be used to mine meaningful functional modules in biological networks. △ Less

Submitted 9 November, 2012; originally announced November 2012.

Comments: 2 pages

Showing 1–45 of 45 results for author: Zhu, K Q