Talking to Machines: do you read me?

Abstract: In this dissertation I would like to guide the reader to the research on dialogue but more precisely the research I have conducted during my career since my PhD thesis. Starting from modular architectures with machine learning/deep learning and reinforcement learning to end-to-end deep neural networks. Besides my work as research associate, I also present the work I have supervised in the last years. I review briefly the state of the art and highlight the open research problems on conversational agents. Afterwards, I present my contribution to Task-Oriented Dialogues (TOD), both as research associate and as the industrial supervisor of CIFRE theses. I discuss conversational QA. Particularly, I present the work of two PhD candidates Thibault Cordier and Sebastien Montella; as well as the work of the young researcher Quentin Brabant. Finally, I present the scientific project, where I discuss about Large Language Models (LLMs) for Task-Oriented Dialogue and Multimodal Task-Oriented Dialogue.

Submitted 2 July, 2024; originally announced July 2024.

Comments: French Doctoral Habilitation HDR manuscript: https://hal.science/tel-04620199

arXiv:2406.12945 [pdf, other]

Under the Hood of Tabular Data Generation Models: the Strong Impact of Hyperparameter Tuning

Authors: G. Charbel N. Kindji, Lina Maria Rojas-Barahona, Elisa Fromont, Tanguy Urvoy

Abstract: We investigate the impact of dataset-specific hyperparameter, feature encoding, and architecture tuning on five recent model families for tabular data generation through an extensive benchmark on 16 datasets. This study addresses the practical need for a unified evaluation of models that fully considers hyperparameter optimization. Additionally, we propose a reduced search space for each model that allows for quick optimization, achieving nearly equivalent performance at a significantly lower cost. Our benchmark demonstrates that, for most models, large-scale dataset-specific tuning substantially improves performance compared to the original configurations. Furthermore, we confirm that diffusion-based models generally outperform other models on tabular data. However, this advantage is not significant when the entire tuning and training process is restricted to the same GPU budget for all models.

Submitted 12 July, 2024; v1 submitted 18 June, 2024; originally announced June 2024.

arXiv:2404.07836 [pdf, ps, other]

Question Generation in Knowledge-Driven Dialog: Explainability and Evaluation

Authors: Juliette Faille, Quentin Brabant, Gwenole Lecorve, Lina M. Rojas-Barahona, Claire Gardent

Abstract: We explore question generation in the context of knowledge-grounded dialogs focusing on explainability and evaluation. Inspired by previous work on planning-based summarisation, we present a model which instead of directly generating a question, sequentially predicts first a fact then a question. We evaluate our approach on 37k test dialogs adapted from the KGConv dataset and we show that, although more demanding in terms of inference, our approach performs on par with a standard model which solely generates a question while allowing for a detailed referenceless evaluation of the model behaviour in terms of relevance, factuality and pronominalisation.

Submitted 11 April, 2024; originally announced April 2024.

arXiv:2401.03905 [pdf, other]

WEBDial, a Multi-domain, Multitask Statistical Dialogue Framework with RDF

Authors: Morgan Veyret, Jean-Baptiste Duchene, Kekeli Afonouvi, Quentin Brabant, Gwenole Lecorve, Lina M. Rojas-Barahona

Abstract: Typically available dialogue frameworks have adopted a semantic representation based on dialogue-acts and slot-value pairs. Despite its simplicity, this representation has disadvantages such as the lack of expressivity, scalability and explainability. We present WEBDial: a dialogue framework that relies on a graph formalism by using RDF triples instead of slot-value pairs. We describe its overall architecture and the graph-based semantic representation. We show its applicability from simple to complex applications, by varying the complexity of domains and tasks: from single domain and tasks to multiple domains and complex tasks.

Submitted 8 January, 2024; originally announced January 2024.

arXiv:2312.15098 [pdf, other]

doi 10.21437/Interspeech.2023-1929

Unsupervised Auditory and Semantic Entrainment Models with Deep Neural Networks

Authors: Jay Kejriwal, Stefan Benus, Lina M. Rojas-Barahona

Abstract: Speakers tend to engage in adaptive behavior, known as entrainment, when they become similar to their interlocutor in various aspects of speaking. We present an unsupervised deep learning framework that derives meaningful representation from textual features for developing semantic entrainment. We investigate the model's performance by extracting features using different variations of the BERT model (DistilBERT and XLM-RoBERTa) and Google's universal sentence encoder (USE) embeddings on two human-human (HH) corpora (The Fisher Corpus English Part 1, Columbia games corpus) and one human-machine (HM) corpus (Voice Assistant Conversation Corpus (VACC)). In addition to semantic features we also trained DNN-based models utilizing two auditory embeddings (TRIpLet Loss network (TRILL) vectors, Low-level descriptors (LLD) features) and two units of analysis (Inter pausal unit and Turn). The results show that semantic entrainment can be assessed with our model, that models can distinguish between HH and HM interactions and that the two units of analysis for extracting acoustic features provide comparable findings.

Submitted 22 December, 2023; originally announced December 2023.

Comments: Interspeech2023

arXiv:2311.04922 [pdf, other]

Are cascade dialogue state tracking models speaking out of turn in spoken dialogues?

Authors: Lucas Druart, Léo Jacqmin, Benoît Favre, Lina Maria Rojas-Barahona, Valentin Vielzeuf

Abstract: In Task-Oriented Dialogue (TOD) systems, correctly updating the system's understanding of the user's needs is key to a smooth interaction. Traditionally TOD systems are composed of several modules that interact with one another. While each of these components is the focus of active research communities, their behavior in interaction can be overlooked. This paper proposes a comprehensive analysis of the errors of state of the art systems in complex settings such as Dialogue State Tracking which highly depends on the dialogue context. Based on spoken MultiWoz, we identify that errors on non-categorical slots' values are essential to address in order to bridge the gap between spoken and chat-based dialogue systems. We explore potential solutions to improve transcriptions and help dialogue state tracking generative models correct such errors.

Submitted 3 November, 2023; originally announced November 2023.

Comments: Submitted to IEEE ICASSP 2024. © 2023 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works.

arXiv:2308.15298 [pdf, other]

KGConv, a Conversational Corpus grounded in Wikidata

Authors: Quentin Brabant, Gwenole Lecorve, Lina M. Rojas-Barahona, Claire Gardent

Abstract: We present KGConv, a large, conversational corpus of 71k conversations where each question-answer pair is grounded in a Wikidata fact. Conversations contain on average 8.6 questions and for each Wikidata fact, we provide multiple variants (12 on average) of the corresponding question using templates, human annotations, hand-crafted rules and a question rewriting neural model. We provide baselines for the task of Knowledge-Based, Conversational Question Generation. KGConv can further be used for other generation and analysis tasks such as single-turn question generation from Wikidata triples, question rewriting, question answering from conversation or from knowledge graphs and quiz generation.

Submitted 29 August, 2023; originally announced August 2023.

arXiv:2304.14986 [pdf, other]

doi 10.3389/frai.2023.1220476

Interpreting Vision and Language Generative Models with Semantic Visual Priors

Authors: Michele Cafagna, Lina M. Rojas-Barahona, Kees van Deemter, Albert Gatt

Abstract: When applied to Image-to-text models, interpretability methods often provide token-by-token explanations namely, they compute a visual explanation for each token of the generated sequence. Those explanations are expensive to compute and unable to comprehensively explain the model's output. Therefore, these models often require some sort of approximation that eventually leads to misleading explanations. We develop a framework based on SHAP, that allows for generating comprehensive, meaningful explanations leveraging the meaning representation of the output sequence as a whole. Moreover, by exploiting semantic priors in the visual backbone, we extract an arbitrary number of features that allows the efficient computation of Shapley values on large-scale models, generating at the same time highly meaningful visual explanations. We demonstrate that our method generates semantically more expressive explanations than traditional methods at a lower compute cost and that it can be generalized over other explainability methods.

Submitted 4 May, 2023; v1 submitted 28 April, 2023; originally announced April 2023.

arXiv:2304.11073 [pdf, other]

OLISIA: a Cascade System for Spoken Dialogue State Tracking

Authors: Léo Jacqmin, Lucas Druart, Yannick Estève, Benoît Favre, Lina Maria Rojas-Barahona, Valentin Vielzeuf

Abstract: Though Dialogue State Tracking (DST) is a core component of spoken dialogue systems, recent work on this task mostly deals with chat corpora, disregarding the discrepancies between spoken and written language. In this paper, we propose OLISIA, a cascade system which integrates an Automatic Speech Recognition (ASR) model and a DST model. We introduce several adaptations in the ASR and DST modules to improve integration and robustness to spoken conversations. With these adaptations, our system ranked first in DSTC11 Track 3, a benchmark to evaluate spoken DST. We conduct an in-depth analysis of the results and find that normalizing the ASR outputs and adapting the DST inputs through data augmentation, along with increasing the pre-trained models size all play an important role in reducing the performance discrepancy between written and spoken conversations.

Submitted 31 August, 2023; v1 submitted 20 April, 2023; originally announced April 2023.

arXiv:2302.11199 [pdf, other]

Few-Shot Structured Policy Learning for Multi-Domain and Multi-Task Dialogues

Authors: Thibault Cordier, Tanguy Urvoy, Fabrice Lefevre, Lina M. Rojas-Barahona

Abstract: Reinforcement learning has been widely adopted to model dialogue managers in task-oriented dialogues. However, the user simulator provided by state-of-the-art dialogue frameworks are only rough approximations of human behaviour. The ability to learn from a small number of human interactions is hence crucial, especially on multi-domain and multi-task environments where the action space is large. We therefore propose to use structured policies to improve sample efficiency when learning on these kinds of environments. We also evaluate the impact of learning from human vs simulated experts. Among the different levels of structure that we tested, the graph neural networks (GNNs) show a remarkable superiority by reaching a success rate above 80% with only 50 dialogues, when learning from simulated experts. They also show superiority when learning from human experts, although a performance drop was observed, indicating a possible difficulty in capturing the variability of human strategies. We therefore suggest to concentrate future research efforts on bridging the gap between human data, simulators and automatic evaluators in dialogue frameworks.

Submitted 22 February, 2023; originally announced February 2023.

Comments: 8 pages, at the EACL2023 conference (Findings)

arXiv:2302.05900 [pdf, other]

Investigating the Effect of Relative Positional Embeddings on AMR-to-Text Generation with Structural Adapters

Authors: Sebastien Montella, Alexis Nasr, Johannes Heinecke, Frederic Bechet, Lina M. Rojas-Barahona

Abstract: Text generation from Abstract Meaning Representation (AMR) has substantially benefited from the popularized Pretrained Language Models (PLMs). Myriad approaches have linearized the input graph as a sequence of tokens to fit the PLM tokenization requirements. Nevertheless, this transformation jeopardizes the structural integrity of the graph and is therefore detrimental to its resulting representation. To overcome this issue, Ribeiro et al. have recently proposed StructAdapt, a structure-aware adapter which injects the input graph connectivity within PLMs using Graph Neural Networks (GNNs). In this paper, we investigate the influence of Relative Position Embeddings (RPE) on AMR-to-Text, and, in parallel, we examine the robustness of StructAdapt. Through ablation studies, graph attack and link prediction, we reveal that RPE might be partially encoding input graphs. We suggest further research regarding the role of RPE will provide valuable insights for Graph-to-Text generation.

Submitted 12 February, 2023; originally announced February 2023.

Comments: Accepted to EACL 2023 (Main)

arXiv:2210.05252 [pdf, other]

Graph Neural Network Policies and Imitation Learning for Multi-Domain Task-Oriented Dialogues

Authors: Thibault Cordier, Tanguy Urvoy, Fabrice Lefèvre, Lina M. Rojas-Barahona

Abstract: Task-oriented dialogue systems are designed to achieve specific goals while conversing with humans. In practice, they may have to handle simultaneously several domains and tasks. The dialogue manager must therefore be able to take into account domain changes and plan over different domains/tasks in order to deal with multidomain dialogues. However, learning with reinforcement in such context becomes difficult because the state-action dimension is larger while the reward signal remains scarce. Our experimental results suggest that structured policies based on graph neural networks combined with different degrees of imitation learning can effectively handle multi-domain dialogues. The reported experiments underline the benefit of structured policies over standard policies.

Submitted 11 October, 2022; originally announced October 2022.

Journal ref: SIGDIAL 2022

arXiv:2207.14627 [pdf, other]

"Do you follow me?": A Survey of Recent Approaches in Dialogue State Tracking

Authors: Léo Jacqmin, Lina M. Rojas-Barahona, Benoit Favre

Abstract: While communicating with a user, a task-oriented dialogue system has to track the user's needs at each turn according to the conversation history. This process called dialogue state tracking (DST) is crucial because it directly informs the downstream dialogue policy. DST has received a lot of interest in recent years with the text-to-text paradigm emerging as the favored approach. In this review paper, we first present the task and its associated datasets. Then, considering a large number of recent publications, we identify highlights and advances of research in 2021-2022. Although neural approaches have enabled significant progress, we argue that some critical aspects of dialogue systems such as generalizability are still underexplored. To motivate future studies, we propose several research avenues.

Submitted 29 July, 2022; originally announced July 2022.

Comments: SIGDIAL 2022

arXiv:2207.03240 [pdf, ps, other]

CoQAR: Question Rewriting on CoQA

Authors: Quentin Brabant, Gwenole Lecorve, Lina M. Rojas-Barahona

Abstract: Questions asked by humans during a conversation often contain contextual dependencies, i.e., explicit or implicit references to previous dialogue turns. These dependencies take the form of coreferences (e.g., via pronoun use) or ellipses, and can make the understanding difficult for automated systems. One way to facilitate the understanding and subsequent treatments of a question is to rewrite it into an out-of-context form, i.e., a form that can be understood without the conversational context. We propose CoQAR, a corpus containing 4.5K conversations from the Conversational Question-Answering dataset CoQA, for a total of 53K follow-up question-answer pairs. Each original question was manually annotated with at least 2 at most 3 out-of-context rewritings. CoQAR can be used in the supervised learning of three tasks: question paraphrasing, question rewriting and conversational question answering. In order to assess the quality of CoQAR's rewritings, we conduct several experiments consisting in training and evaluating models for these three tasks. Our results support the idea that question rewriting can be used as a preprocessing step for question answering models, thereby increasing their performances.

Submitted 7 July, 2022; originally announced July 2022.

Comments: Published in LREC2022

arXiv:2207.03145 [pdf, other]

Active Learning and Multi-label Classification for Ellipsis and Coreference Detection in Conversational Question-Answering

Authors: Quentin Brabant, Lina Maria Rojas-Barahona, Claire Gardent

Abstract: In human conversations, ellipsis and coreference are commonly occurring linguistic phenomena. Although these phenomena are a mean of making human-machine conversations more fluent and natural, only few dialogue corpora contain explicit indications on which turns contain ellipses and/or coreferences. In this paper we address the task of automatically detecting ellipsis and coreferences in conversational question answering. We propose to use a multi-label classifier based on DistilBERT. Multi-label classification and active learning are employed to compensate the limited amount of labeled data. We show that these methods greatly enhance the performance of the classifier for detecting these phenomena on a manually labeled dataset.

Submitted 7 July, 2022; originally announced July 2022.

Comments: Published in IWSDS 2021

arXiv:2101.05004 [pdf, other]

Is the User Enjoying the Conversation? A Case Study on the Impact on the Reward Function

Authors: Lina M. Rojas-Barahona

Abstract: The impact of user satisfaction in policy learning task-oriented dialogue systems has long been a subject of research interest. Most current models for estimating the user satisfaction either (i) treat out-of-context short-texts, such as product reviews, or (ii) rely on turn features instead of on distributed semantic representations. In this work we adopt deep neural networks that use distributed semantic representation learning for estimating the user satisfaction in conversations. We evaluate the impact of modelling context length in these networks. Moreover, we show that the proposed hierarchical network outperforms state-of-the-art quality estimators. Furthermore, we show that applying these networks to infer the reward function in a Partial Observable Markov Decision Process (POMDP) yields to a great improvement in the task success rate.

Submitted 13 January, 2021; originally announced January 2021.

Comments: Accepted at the Human in the Loop Dialogue Systems, 34th Conference on Neural Information Processing Systems (NeurIPS 2020). Paper updated with minor changes

arXiv:2012.04687 [pdf, ps, other]

Diluted Near-Optimal Expert Demonstrations for Guiding Dialogue Stochastic Policy Optimisation

Authors: Thibault Cordier, Tanguy Urvoy, Lina M. Rojas-Barahona, Fabrice Lefèvre

Abstract: A learning dialogue agent can infer its behaviour from interactions with the users. These interactions can be taken from either human-to-human or human-machine conversations. However, human interactions are scarce and costly, making learning from few interactions essential. One solution to speedup the learning process is to guide the agent's exploration with the help of an expert. We present in this paper several imitation learning strategies for dialogue policy where the guiding expert is a near-optimal handcrafted policy. We incorporate these strategies with state-of-the-art reinforcement learning methods based on Q-learning and actor-critic. We notably propose a randomised exploration policy which allows for a seamless hybridisation of the learned policy and the expert. Our experiments show that our hybridisation strategy outperforms several baselines, and that it can accelerate the learning when facing real humans.

Submitted 25 November, 2020; originally announced December 2020.

Comments: 8 pages, Accepted at Human in the Loop Dialogue Systems Workshop, NeurIPS 2020

arXiv:1909.11980 [pdf, other]

Spoken Conversational Search for General Knowledge

Authors: Lina M. Rojas-Barahona, Pascal Bellec, Benoit Besset, Martinho Dos-Santos, Johannes Heinecke, Munshi Asadullah, Olivier Le-Blouch, Jean Y. Lancien, Géraldine Damnati, Emmanuel Mory, Frédéric Herledan

Abstract: We present a spoken conversational question answering proof of concept that is able to answer questions about general knowledge from Wikidata. The dialogue component does not only orchestrate various components but also solve coreferences and ellipsis.

Submitted 26 September, 2019; originally announced September 2019.

Comments: SIGDial2019

arXiv:1806.05484 [pdf, other]

Nearly Zero-Shot Learning for Semantic Decoding in Spoken Dialogue Systems

Authors: Lina M. Rojas-Barahona, Stefan Ultes, Pawel Budzianowski, Iñigo Casanueva, Milica Gasic, Bo-Hsiang Tseng, Steve Young

Abstract: This paper presents two ways of dealing with scarce data in semantic decoding using N-Best speech recognition hypotheses. First, we learn features by using a deep learning architecture in which the weights for the unknown and known categories are jointly optimised. Second, an unsupervised method is used for further tuning the weights. Sharing weights injects prior knowledge to unknown categories. The unsupervised tuning (i.e. the risk minimisation) improves the F-Measure when recognising nearly zero-shot data on the DSTC3 corpus. This unsupervised method can be applied subject to two assumptions: the rank of the class marginal is assumed to be known and the class-conditional scores of the classifier are assumed to follow a Gaussian distribution.

Submitted 21 June, 2018; v1 submitted 14 June, 2018; originally announced June 2018.

arXiv:1609.02846 [pdf, other]

Dialogue manager domain adaptation using Gaussian process reinforcement learning

Authors: Milica Gasic, Nikola Mrksic, Lina M. Rojas-Barahona, Pei-Hao Su, Stefan Ultes, David Vandyke, Tsung-Hsien Wen, Steve Young

Abstract: Spoken dialogue systems allow humans to interact with machines using natural speech. As such, they have many benefits. By using speech as the primary communication medium, a computer interface can facilitate swift, human-like acquisition of information. In recent years, speech interfaces have become ever more popular, as is evident from the rise of personal assistants such as Siri, Google Now, Cortana and Amazon Alexa. Recently, data-driven machine learning methods have been applied to dialogue modelling and the results achieved for limited-domain applications are comparable to or outperform traditional approaches. Methods based on Gaussian processes are particularly effective as they enable good models to be estimated from limited training data. Furthermore, they provide an explicit estimate of the uncertainty which is particularly useful for reinforcement learning. This article explores the additional steps that are necessary to extend these methods to model multiple dialogue domains. We show that Gaussian process reinforcement learning is an elegant framework that naturally supports a range of methods, including prior knowledge, Bayesian committee machines and multi-agent learning, for facilitating extensible and adaptable dialogue systems.

Submitted 9 September, 2016; originally announced September 2016.

Comments: accepted for publication in Computer Speech and Language

arXiv:1606.03352 [pdf, other]

Conditional Generation and Snapshot Learning in Neural Dialogue Systems

Authors: Tsung-Hsien Wen, Milica Gasic, Nikola Mrksic, Lina M. Rojas-Barahona, Pei-Hao Su, Stefan Ultes, David Vandyke, Steve Young

Abstract: Recently a variety of LSTM-based conditional language models (LM) have been applied across a range of language generation tasks. In this work we study various model architectures and different ways to represent and aggregate the source information in an end-to-end neural dialogue system framework. A method called snapshot learning is also proposed to facilitate learning from supervised sequential signals by applying a companion cross-entropy objective function to the conditioning vector. The experimental and analytical results demonstrate firstly that competition occurs between the conditioning vector and the LM, and the differing architectures provide different trade-offs between the two. Secondly, the discriminative power and transparency of the conditioning vector is key to providing both model interpretability and better performance. Thirdly, snapshot learning leads to consistent performance improvements independent of which architecture is used.

Submitted 10 June, 2016; originally announced June 2016.

arXiv:1604.04562 [pdf, other]

A Network-based End-to-End Trainable Task-oriented Dialogue System

Authors: Tsung-Hsien Wen, David Vandyke, Nikola Mrksic, Milica Gasic, Lina M. Rojas-Barahona, Pei-Hao Su, Stefan Ultes, Steve Young

Abstract: Teaching machines to accomplish tasks by conversing naturally with humans is challenging. Currently, developing task-oriented dialogue systems requires creating multiple components and typically this involves either a large amount of handcrafting, or acquiring costly labelled datasets to solve a statistical learning problem for each component. In this work we introduce a neural network-based text-in, text-out end-to-end trainable goal-oriented dialogue system along with a new way of collecting dialogue data based on a novel pipe-lined Wizard-of-Oz framework. This approach allows us to develop dialogue systems easily and without making too many assumptions about the task at hand. The results show that the model can converse with human subjects naturally whilst helping them to accomplish tasks in a restaurant search domain.

Submitted 24 April, 2017; v1 submitted 15 April, 2016; originally announced April 2016.

Comments: published at EACL 2017

arXiv:1603.01232 [pdf, other]

Multi-domain Neural Network Language Generation for Spoken Dialogue Systems

Authors: Tsung-Hsien Wen, Milica Gasic, Nikola Mrksic, Lina M. Rojas-Barahona, Pei-Hao Su, David Vandyke, Steve Young

Abstract: Moving from limited-domain natural language generation (NLG) to open domain is difficult because the number of semantic input combinations grows exponentially with the number of domains. Therefore, it is important to leverage existing resources and exploit similarities between domains to facilitate domain adaptation. In this paper, we propose a procedure to train multi-domain, Recurrent Neural Network-based (RNN) language generators via multiple adaptation steps. In this procedure, a model is first trained on counterfeited data synthesised from an out-of-domain dataset, and then fine tuned on a small set of in-domain utterances with a discriminative objective function. Corpus-based evaluation results show that the proposed procedure can achieve competitive performance in terms of BLEU score and slot error rate while significantly reducing the data needed to train generators in new, unseen domains. In subjective testing, human judges confirm that the procedure greatly improves generator performance when only a small amount of data is available in the domain.

Submitted 3 March, 2016; originally announced March 2016.

Comments: Accepted as a long paper in NAACL-HLT 2016

Showing 1–23 of 23 results for author: Rojas-Barahona, L M