Showing 1–45 of 45 results for author: Sordoni, A

  1. arXiv:2405.15589  [pdf, other]

    cs.LG cs.CR

    Efficient Adversarial Training in LLMs with Continuous Attacks

    Authors: Sophie Xhonneux, Alessandro Sordoni, Stephan Günnemann, Gauthier Gidel, Leo Schwinn

    Abstract: Large language models (LLMs) are vulnerable to adversarial attacks that can bypass their safety guardrails. In many domains, adversarial training has proven to be one of the most promising methods to reliably improve robustness against such attacks. Yet, in the context of LLMs, current methods for adversarial training are hindered by the high computational costs required to perform discrete advers…

    Submitted 21 June, 2024; v1 submitted 24 May, 2024; originally announced May 2024.

    Comments: 19 pages, 4 figures

  2. arXiv:2405.11157  [pdf, other]

    cs.LG cs.CL

    Towards Modular LLMs by Building and Reusing a Library of LoRAs

    Authors: Oleksiy Ostapenko, Zhan Su, Edoardo Maria Ponti, Laurent Charlin, Nicolas Le Roux, Matheus Pereira, Lucas Caccia, Alessandro Sordoni

    Abstract: The growing number of parameter-efficient adaptations of a base large language model (LLM) calls for studying whether we can reuse such trained adapters to improve performance for new tasks. We study how to best build a library of adapters given multi-task data and devise techniques for both zero-shot and supervised task generalization through routing in such library. We benchmark existing approac…

    Submitted 17 May, 2024; originally announced May 2024.

  3. arXiv:2402.06457  [pdf, other]

    cs.LG cs.AI cs.CL

    V-STaR: Training Verifiers for Self-Taught Reasoners

    Authors: Arian Hosseini, Xingdi Yuan, Nikolay Malkin, Aaron Courville, Alessandro Sordoni, Rishabh Agarwal

    Abstract: Common self-improvement approaches for large language models (LLMs), such as STaR (Zelikman et al., 2022), iteratively fine-tune LLMs on self-generated solutions to improve their problem-solving ability. However, these approaches discard the large amounts of incorrect solutions generated during this process, potentially neglecting valuable information in such solutions. To address this shortcoming…

    Submitted 9 February, 2024; originally announced February 2024.

  4. arXiv:2310.05707  [pdf, other]

    cs.CL cs.AI cs.LG

    Guiding Language Model Math Reasoning with Planning Tokens

    Authors: Xinyi Wang, Lucas Caccia, Oleksiy Ostapenko, Xingdi Yuan, William Yang Wang, Alessandro Sordoni

    Abstract: Large language models (LLMs) have recently attracted considerable interest for their ability to perform complex reasoning tasks, such as chain-of-thought reasoning. However, most of the existing approaches to enhance this ability rely heavily on data-driven methods, while neglecting the structural aspects of the model's reasoning capacity. We find that while LLMs can manage individual reasoning st…

    Submitted 5 February, 2024; v1 submitted 9 October, 2023; originally announced October 2023.

  5. arXiv:2306.12509  [pdf, other]

    cs.CL cs.LG

    Joint Prompt Optimization of Stacked LLMs using Variational Inference

    Authors: Alessandro Sordoni, Xingdi Yuan, Marc-Alexandre Côté, Matheus Pereira, Adam Trischler, Ziang Xiao, Arian Hosseini, Friederike Niedtner, Nicolas Le Roux

    Abstract: Large language models (LLMs) can be seen as atomic units of computation mapping sequences to a distribution over sequences. Thus, they can be seen as stochastic language layers in a language network, where the learnable parameters are the natural language prompts at each layer. By stacking two such layers and feeding the output of one layer to the next, we obtain a Deep Language Network (DLN). We…

    Submitted 4 December, 2023; v1 submitted 21 June, 2023; originally announced June 2023.

    Comments: NeurIPS 2023

  6. arXiv:2211.08473  [pdf, other]

    cs.CL cs.LG

    On the Compositional Generalization Gap of In-Context Learning

    Authors: Arian Hosseini, Ankit Vani, Dzmitry Bahdanau, Alessandro Sordoni, Aaron Courville

    Abstract: Pretrained large generative language models have shown great performance on many tasks, but exhibit low compositional generalization abilities. Scaling such models has been shown to improve their performance on various NLP tasks even just by conditioning them on a few examples to solve the task without any fine-tuning (also known as in-context learning). In this work, we look at the gap between th…

    Submitted 15 November, 2022; originally announced November 2022.

  7. arXiv:2211.03831  [pdf, other]

    cs.AI

    Multi-Head Adapter Routing for Cross-Task Generalization

    Authors: Lucas Caccia, Edoardo Ponti, Zhan Su, Matheus Pereira, Nicolas Le Roux, Alessandro Sordoni

    Abstract: Parameter-efficient fine-tuning (PEFT) for cross-task generalization consists in pre-training adapters on a multi-task training set before few-shot adaptation to test tasks. Polytropon [Ponti et al., 2023] ($\texttt{Poly}$) jointly learns an inventory of adapters and a routing function that selects a (variable-size) subset of adapters for each task during both pre-training and few-shot adaptation.…

    Submitted 13 November, 2023; v1 submitted 7 November, 2022; originally announced November 2022.

    Comments: Accepted at NeurIPS 2023. Code is available at https://github.com/microsoft/mttl

  8. arXiv:2206.01251  [pdf, other]

    cs.LG cs.AI cs.CV

    Using Representation Expressiveness and Learnability to Evaluate Self-Supervised Learning Methods

    Authors: Yuchen Lu, Zhen Liu, Aristide Baratin, Romain Laroche, Aaron Courville, Alessandro Sordoni

    Abstract: We address the problem of evaluating the quality of self-supervised learning (SSL) models without access to supervised labels, while being agnostic to the architecture, learning algorithm or data manipulation used during training. We argue that representations can be evaluated through the lens of expressiveness and learnability. We propose to use the Intrinsic Dimension (ID) to assess expressivene…

    Submitted 14 November, 2023; v1 submitted 2 June, 2022; originally announced June 2022.

    Journal ref: TMLR 2023 -- Transactions of Machine Learning Research, 11/2023

  9. arXiv:2203.12788  [pdf, other]

    cs.CL

    Evaluating Distributional Distortion in Neural Language Modeling

    Authors: Benjamin LeBrun, Alessandro Sordoni, Timothy J. O'Donnell

    Abstract: A fundamental characteristic of natural language is the high rate at which speakers produce novel expressions. Because of this novelty, a heavy-tail of rare events accounts for a significant amount of the total probability mass of distributions in language (Baayen, 2001). Standard language modeling metrics such as perplexity quantify the performance of language models (LM) in aggregate. As a resul…

    Submitted 23 March, 2022; originally announced March 2022.

    Journal ref: International Conference on Learning Representations. 2022

  10. arXiv:2203.10692  [pdf, other]

    cs.CL

    Better Language Model with Hypernym Class Prediction

    Authors: He Bai, Tong Wang, Alessandro Sordoni, Peng Shi

    Abstract: Class-based language models (LMs) have been long devised to address context sparsity in $n$-gram LMs. In this study, we revisit this approach in the context of neural LMs. We hypothesize that class-based prediction leads to an implicit context aggregation for similar words and thus can improve generalization for rare words. We map words that have a common WordNet hypernym to the same class and tra…

    Submitted 20 March, 2022; originally announced March 2022.

    Comments: ACL 2022

  11. arXiv:2202.13914  [pdf, other]

    cs.LG cs.CL

    Combining Modular Skills in Multitask Learning

    Authors: Edoardo M. Ponti, Alessandro Sordoni, Yoshua Bengio, Siva Reddy

    Abstract: A modular design encourages neural models to disentangle and recombine different facets of knowledge to generalise more systematically to new tasks. In this work, we assume that each task is associated with a subset of latent discrete skills from a (potentially small) inventory. In turn, skills correspond to parameter-efficient (sparse / low-rank) model parameterisations. By jointly learning these…

    Submitted 1 March, 2022; v1 submitted 28 February, 2022; originally announced February 2022.

  12. arXiv:2112.08583  [pdf, other]

    cs.CL

    Does Pre-training Induce Systematic Inference? How Masked Language Models Acquire Commonsense Knowledge

    Authors: Ian Porada, Alessandro Sordoni, Jackie Chi Kit Cheung

    Abstract: Transformer models pre-trained with a masked-language-modeling objective (e.g., BERT) encode commonsense knowledge as evidenced by behavioral probes; however, the extent to which this knowledge is acquired by systematic inference over the semantics of the pre-training corpora is an open question. To answer this question, we selectively inject verbalized knowledge into the minibatches of a BERT mod…

    Submitted 15 December, 2021; originally announced December 2021.

  13. arXiv:2109.08259  [pdf, other]

    cs.CL

    Self-training with Few-shot Rationalization: Teacher Explanations Aid Student in Few-shot NLU

    Authors: Meghana Moorthy Bhat, Alessandro Sordoni, Subhabrata Mukherjee

    Abstract: While pre-trained language models have obtained state-of-the-art performance for several natural language understanding tasks, they are quite opaque in terms of their decision-making process. While some recent works focus on rationalizing neural predictions by highlighting salient concepts in the text as justifications or rationales, they rely on thousands of labeled training examples for both tas…

    Submitted 16 September, 2021; originally announced September 2021.

    Comments: To Appear in EMNLP 2021

  14. arXiv:2109.06232  [pdf, other]

    cs.CL cs.IT cs.NE

    The Emergence of the Shape Bias Results from Communicative Efficiency

    Authors: Eva Portelance, Michael C. Frank, Dan Jurafsky, Alessandro Sordoni, Romain Laroche

    Abstract: By the age of two, children tend to assume that new word categories are based on objects' shape, rather than their color or texture; this assumption is called the shape bias. They are thought to learn this bias by observing that their caregiver's language is biased towards shape based categories. This presents a chicken and egg problem: if the shape bias must be present in the language in order fo…

    Submitted 14 September, 2021; v1 submitted 13 September, 2021; originally announced September 2021.

    Comments: Accepted at CoNLL 2021

  15. arXiv:2106.13401  [pdf, other]

    cs.LG cs.AI

    Decomposed Mutual Information Estimation for Contrastive Representation Learning

    Authors: Alessandro Sordoni, Nouha Dziri, Hannes Schulz, Geoff Gordon, Phil Bachman, Remi Tachet

    Abstract: Recent contrastive representation learning methods rely on estimating mutual information (MI) between multiple views of an underlying context. E.g., we can derive multiple views of a given image by applying data augmentation, or we can split a sequence into views comprising the past and future of some step in the sequence. Contrastive lower bounds on MI are easy to optimize, but have a strong unde…

    Submitted 24 June, 2021; originally announced June 2021.

    Comments: ICML 2021

  16. arXiv:2105.03519  [pdf, other]

    cs.CL

    Understanding by Understanding Not: Modeling Negation in Language Models

    Authors: Arian Hosseini, Siva Reddy, Dzmitry Bahdanau, R Devon Hjelm, Alessandro Sordoni, Aaron Courville

    Abstract: Negation is a core construction in natural language. Despite being very successful on many tasks, state-of-the-art pre-trained language models often handle negation incorrectly. To improve language models in this regard, we propose to augment the language modeling objective with an unlikelihood objective that is based on negated generic sentences from a raw text corpus. By training BERT with the r…

    Submitted 7 May, 2021; originally announced May 2021.

  17. Linguistic Dependencies and Statistical Dependence

    Authors: Jacob Louis Hoover, Alessandro Sordoni, Wenyu Du, Timothy J. O'Donnell

    Abstract: Are pairs of words that tend to occur together also likely to stand in a linguistic dependency? This empirical question is motivated by a long history of literature in cognitive science, psycholinguistics, and NLP. In this work we contribute an extensive analysis of the relationship between linguistic dependencies and statistical dependence between words. Improving on previous work, we introduce t…

    Submitted 29 April, 2022; v1 submitted 17 April, 2021; originally announced April 2021.

    Comments: EMNLP2021 camera-ready version. 9 pages, plus references and appendices

    Report number: 2021.emnlp-main.234

    Journal ref: Proceedings EMNLP (2021), 2941--2963

  18. arXiv:2011.07960  [pdf, other]

    cs.CL cs.LG

    Explicitly Modeling Syntax in Language Models with Incremental Parsing and a Dynamic Oracle

    Authors: Yikang Shen, Shawn Tan, Alessandro Sordoni, Siva Reddy, Aaron Courville

    Abstract: Syntax is fundamental to our thinking about language. Failing to capture the structure of input language could lead to generalization problems and over-parametrization. In the present work, we propose a new syntax-aware language model: Syntactic Ordered Memory (SOM). The model explicitly models the structure with an incremental parser and maintains the conditional probability setting of a standard…

    Submitted 10 May, 2021; v1 submitted 21 October, 2020; originally announced November 2020.

    Comments: 12 pages, 10 figures

    Journal ref: NAACL 2021

  19. arXiv:2010.04704  [pdf, other]

    cs.CL cs.LG

    Recursive Top-Down Production for Sentence Generation with Latent Trees

    Authors: Shawn Tan, Yikang Shen, Timothy J. O'Donnell, Alessandro Sordoni, Aaron Courville

    Abstract: We model the recursive production property of context-free grammars for natural and synthetic languages. To this end, we present a dynamic programming algorithm that marginalises over latent binary tree structures with $N$ leaves, allowing us to compute the likelihood of a sequence of $N$ tokens under a latent tree model, which we maximise to train a recursive neural function. We demonstrate perfo…

    Submitted 9 October, 2020; originally announced October 2020.

  20. arXiv:2005.00770  [pdf, other]

    cs.CL

    Exploring and Predicting Transferability across NLP Tasks

    Authors: Tu Vu, Tong Wang, Tsendsuren Munkhdalai, Alessandro Sordoni, Adam Trischler, Andrew Mattarella-Micke, Subhransu Maji, Mohit Iyyer

    Abstract: Recent advances in NLP demonstrate the effectiveness of training large-scale language models and transferring them to downstream tasks. Can fine-tuning these models on tasks other than language modeling further improve performance? In this paper, we conduct an extensive study of the transferability between 33 NLP tasks across three broad classes of problems (text classification, question answering…

    Submitted 6 October, 2020; v1 submitted 2 May, 2020; originally announced May 2020.

    Comments: Accepted as a conference paper at EMNLP 2020, 45 pages, 3 figures, 34 tables

  21. arXiv:2003.01680  [pdf, other]

    cs.CL

    Hybrid Generative-Retrieval Transformers for Dialogue Domain Adaptation

    Authors: Igor Shalyminov, Alessandro Sordoni, Adam Atkinson, Hannes Schulz

    Abstract: Domain adaptation has recently become a key problem in dialogue systems research. Deep learning, while being the preferred technique for modeling such systems, works best given massive training data. However, in the real-world scenario, such resources aren't available for every new domain, so the ability to train with a few dialogue examples can be considered essential. Pre-training on large data…

    Submitted 6 March, 2020; v1 submitted 3 March, 2020; originally announced March 2020.

    Comments: Presented at DSTC8@AAAI 2020

    ACM Class: I.2.7

  22. arXiv:1912.09910  [pdf, other]

    cs.IR

    Report on the First HIPstIR Workshop on the Future of Information Retrieval

    Authors: Laura Dietz, Bhaskar Mitra, Jeremy Pickens, Hana Anber, Sandeep Avula, Asia Biega, Adrian Boteanu, Shubham Chatterjee, Jeff Dalton, Shiri Dori-Hacohen, John Foley, Henry Feild, Ben Gamari, Rosie Jones, Pallika Kanani, Sumanta Kashyapi, Widad Machmouchi, Matthew Mitsui, Steve Nole, Alexandre Tachard Passos, Jordan Ramsdell, Adam Roegiest, David Smith, Alessandro Sordoni

    Abstract: The vision of HIPstIR is that early stage information retrieval (IR) researchers get together to develop a future for non-mainstream ideas and research agendas in IR. The first iteration of this vision materialized in the form of a three day workshop in Portsmouth, New Hampshire attended by 24 researchers across academia and industry. Attendees pre-submitted one or more topics that they want to pi…

    Submitted 20 December, 2019; originally announced December 2019.

  23. arXiv:1911.03861  [pdf, other]

    cs.CL cs.LG

    Increasing Robustness to Spurious Correlations using Forgettable Examples

    Authors: Yadollah Yaghoobzadeh, Soroush Mehri, Remi Tachet, T. J. Hazen, Alessandro Sordoni

    Abstract: Neural NLP models tend to rely on spurious correlations between labels and input features to perform their tasks. Minority examples, i.e., examples that contradict the spurious correlations present in the majority of data points, have been shown to increase the out-of-distribution generalization of pre-trained language models. In this paper, we first propose using example forgetting to find minori…

    Submitted 1 February, 2021; v1 submitted 10 November, 2019; originally announced November 2019.

    Comments: 14 pages, Accepted at EACL2021

  24. arXiv:1910.13466  [pdf, other]

    cs.LG cs.CL

    Ordered Memory

    Authors: Yikang Shen, Shawn Tan, Arian Hosseini, Zhouhan Lin, Alessandro Sordoni, Aaron Courville

    Abstract: Stack-augmented recurrent neural networks (RNNs) have been of interest to the deep learning community for some time. However, the difficulty of training memory models remains a problem obstructing the widespread use of such models. In this paper, we propose the Ordered Memory architecture. Inspired by Ordered Neurons (Shen et al., 2018), we introduce a new attention-based mechanism and use its cum…

    Submitted 3 November, 2019; v1 submitted 29 October, 2019; originally announced October 2019.

    Comments: Published in NeurIPS 2019

  25. arXiv:1907.09720  [pdf, other]

    cs.NE cs.LG stat.ML

    Metalearned Neural Memory

    Authors: Tsendsuren Munkhdalai, Alessandro Sordoni, Tong Wang, Adam Trischler

    Abstract: We augment recurrent neural networks with an external memory mechanism that builds upon recent progress in metalearning. We conceptualize this memory as a rapidly adaptable function that we parameterize as a deep neural network. Reading from the neural memory function amounts to pushing an input (the key vector) through the function to produce an output (the value vector). Writing to memory means…

    Submitted 3 December, 2019; v1 submitted 23 July, 2019; originally announced July 2019.

    Comments: NeurIPS 2019

  26. arXiv:1812.05159  [pdf, other]

    cs.LG stat.ML

    An Empirical Study of Example Forgetting during Deep Neural Network Learning

    Authors: Mariya Toneva, Alessandro Sordoni, Remi Tachet des Combes, Adam Trischler, Yoshua Bengio, Geoffrey J. Gordon

    Abstract: Inspired by the phenomenon of catastrophic forgetting, we investigate the learning dynamics of neural networks as they train on single classification tasks. Our goal is to understand whether a related phenomenon occurs when data does not undergo a clear distributional shift. We define a `forgetting event' to have occurred when an individual training example transitions from being classified correc…

    Submitted 15 November, 2019; v1 submitted 12 December, 2018; originally announced December 2018.

    Comments: ICLR 2019

  27. arXiv:1810.09536  [pdf, other]

    cs.CL cs.LG

    Ordered Neurons: Integrating Tree Structures into Recurrent Neural Networks

    Authors: Yikang Shen, Shawn Tan, Alessandro Sordoni, Aaron Courville

    Abstract: Natural language is hierarchically structured: smaller units (e.g., phrases) are nested within larger units (e.g., clauses). When a larger constituent ends, all of the smaller constituents that are nested within it must also be closed. While the standard LSTM architecture allows different neurons to track information at different time scales, it does not have an explicit bias towards modeling a hi…

    Submitted 8 May, 2019; v1 submitted 22 October, 2018; originally announced October 2018.

    Comments: Published as a conference paper at ICLR 2019

  28. arXiv:1807.04106  [pdf, other]

    cs.LG stat.ML

    VFunc: a Deep Generative Model for Functions

    Authors: Philip Bachman, Riashat Islam, Alessandro Sordoni, Zafarali Ahmed

    Abstract: We introduce a deep generative model for functions. Our model provides a joint distribution p(f, z) over functions f and latent variables z which lets us efficiently sample from the marginal p(f) and maximize a variational lower bound on the entropy H(f). We can thus maximize objectives of the form E_{f~p(f)}[R(f)] + c*H(f), where R(f) denotes, e.g., a data log-likelihood term or an expected rewar…

    Submitted 11 July, 2018; originally announced July 2018.

    Comments: To be presented at the ICML 2018 workshop on Prediction and Generative Modeling in Reinforcement Learning

  29. arXiv:1806.11525  [pdf, other]

    cs.CL cs.LG

    Counting to Explore and Generalize in Text-based Games

    Authors: Xingdi Yuan, Marc-Alexandre Côté, Alessandro Sordoni, Romain Laroche, Remi Tachet des Combes, Matthew Hausknecht, Adam Trischler

    Abstract: We propose a recurrent RL agent with an episodic exploration mechanism that helps discovering good policies in text-based game environments. We show promising results on a set of generated text-based games of varying difficulty where the goal is to collect a coin located at the end of a chain of rooms. In contrast to previous text-based RL approaches, we observe that our agent learns policies that…

    Submitted 6 March, 2019; v1 submitted 29 June, 2018; originally announced June 2018.

  30. arXiv:1806.04342  [pdf, other]

    stat.ML cs.LG

    Focused Hierarchical RNNs for Conditional Sequence Processing

    Authors: Nan Rosemary Ke, Konrad Zolna, Alessandro Sordoni, Zhouhan Lin, Adam Trischler, Yoshua Bengio, Joelle Pineau, Laurent Charlin, Chris Pal

    Abstract: Recurrent Neural Networks (RNNs) with attention mechanisms have obtained state-of-the-art results for many sequence processing tasks. Most of these models use a simple form of encoder with attention that looks over the entire sequence and assigns a weight to each token independently. We present a mechanism for focusing RNN encoders for sequence modelling tasks which allows them to attend to key pa…

    Submitted 12 June, 2018; originally announced June 2018.

    Comments: To appear at ICML 2018

  31. arXiv:1806.04168  [pdf, other]

    cs.CL cs.AI cs.LG

    Straight to the Tree: Constituency Parsing with Neural Syntactic Distance

    Authors: Yikang Shen, Zhouhan Lin, Athul Paul Jacob, Alessandro Sordoni, Aaron Courville, Yoshua Bengio

    Abstract: In this work, we propose a novel constituency parsing scheme. The model predicts a vector of real-valued scalars, named syntactic distances, for each split position in the input sentence. The syntactic distances specify the order in which the split points will be selected, recursively partitioning the input, in a top-down fashion. Compared to traditional shift-reduce parsing schemes, our approach…

    Submitted 11 June, 2018; originally announced June 2018.

    Comments: Published at ACL2018

  32. arXiv:1802.10151  [pdf, other]

    cs.LG

    Augmented CycleGAN: Learning Many-to-Many Mappings from Unpaired Data

    Authors: Amjad Almahairi, Sai Rajeswar, Alessandro Sordoni, Philip Bachman, Aaron Courville

    Abstract: Learning inter-domain mappings from unpaired data can improve performance in structured prediction tasks, such as image segmentation, by reducing the need for paired data. CycleGAN was recently proposed for this problem, but critically assumes the underlying inter-domain mapping is approximately deterministic and one-to-one. This assumption renders the model ineffective for tasks requiring flexibl…

    Submitted 18 June, 2018; v1 submitted 27 February, 2018; originally announced February 2018.

    Comments: ICML 2018

  33. arXiv:1711.05411  [pdf, other]

    stat.ML cs.LG

    Z-Forcing: Training Stochastic Recurrent Networks

    Authors: Anirudh Goyal, Alessandro Sordoni, Marc-Alexandre Côté, Nan Rosemary Ke, Yoshua Bengio

    Abstract: Many efforts have been devoted to training generative latent variable models with autoregressive decoders, such as recurrent neural networks (RNN). Stochastic recurrent models have been successful in capturing the variability observed in natural sequential data such as speech. We unify successful ideas from recently proposed architectures into a stochastic recurrent model: each step in the sequenc…

    Submitted 16 November, 2017; v1 submitted 15 November, 2017; originally announced November 2017.

    Comments: To appear in NIPS'17

  34. arXiv:1708.06742  [pdf, other]

    cs.LG stat.ML

    Twin Networks: Matching the Future for Sequence Generation

    Authors: Dmitriy Serdyuk, Nan Rosemary Ke, Alessandro Sordoni, Adam Trischler, Chris Pal, Yoshua Bengio

    Abstract: We propose a simple technique for encouraging generative RNNs to plan ahead. We train a "backward" recurrent network to generate a given sequence in reverse order, and we encourage states of the forward model to predict cotemporal states of the backward model. The backward network is used only during training, and plays no role during sampling or inference. We hypothesize that our approach eases m…

    Submitted 23 February, 2018; v1 submitted 22 August, 2017; originally announced August 2017.

    Comments: 12 pages, 3 figures, published at ICLR 2018

  35. arXiv:1708.00088  [pdf, other]

    cs.LG

    Learning Algorithms for Active Learning

    Authors: Philip Bachman, Alessandro Sordoni, Adam Trischler

    Abstract: We introduce a model that learns active learning algorithms via metalearning. For a distribution of related tasks, our model jointly learns: a data representation, an item selection heuristic, and a method for constructing prediction functions from labeled training sets. Our model uses the item selection heuristic to gather labeled training sets from which to construct prediction functions. Using…

    Submitted 31 July, 2017; originally announced August 2017.

    Comments: Accepted for publication at ICML 2017

  36. arXiv:1705.02012  [pdf, ps, other]

    cs.CL

    Machine Comprehension by Text-to-Text Neural Question Generation

    Authors: Xingdi Yuan, Tong Wang, Caglar Gulcehre, Alessandro Sordoni, Philip Bachman, Sandeep Subramanian, Saizheng Zhang, Adam Trischler

    Abstract: We propose a recurrent neural model that generates natural-language questions from documents, conditioned on answers. We show how to train the model using a combination of supervised and reinforcement learning. After teacher forcing for standard maximum likelihood training, we fine-tune the model using policy gradient techniques to maximize several rewards that measure question quality. Most notab…

    Submitted 15 May, 2017; v1 submitted 4 May, 2017; originally announced May 2017.

  37. arXiv:1612.02605  [pdf, other]

    cs.LG

    Towards Information-Seeking Agents

    Authors: Philip Bachman, Alessandro Sordoni, Adam Trischler

    Abstract: We develop a general problem setting for training and testing the ability of agents to gather information efficiently. Specifically, we present a collection of tasks in which success requires searching through a partially-observed environment, for fragments of information which can be pieced together to accomplish various goals. We combine deep architectures with techniques from reinforcement lear…

    Submitted 8 December, 2016; originally announced December 2016.

    Comments: Under review for ICLR 2017

  38. arXiv:1611.09830  [pdf, other]

    cs.CL cs.AI

    NewsQA: A Machine Comprehension Dataset

    Authors: Adam Trischler, Tong Wang, Xingdi Yuan, Justin Harris, Alessandro Sordoni, Philip Bachman, Kaheer Suleman

    Abstract: We present NewsQA, a challenging machine comprehension dataset of over 100,000 human-generated question-answer pairs. Crowdworkers supply questions and answers based on a set of over 10,000 news articles from CNN, with answers consisting of spans of text from the corresponding articles. We collect this dataset through a four-stage process designed to solicit exploratory questions that require reas…

    Submitted 7 February, 2017; v1 submitted 29 November, 2016; originally announced November 2016.

  39. arXiv:1606.02245  [pdf, other]

    cs.CL cs.NE

    Iterative Alternating Neural Attention for Machine Reading

    Authors: Alessandro Sordoni, Philip Bachman, Adam Trischler, Yoshua Bengio

    Abstract: We propose a novel neural attention architecture to tackle machine comprehension tasks, such as answering Cloze-style queries with respect to a document. Unlike previous models, we do not collapse the query into a single vector, instead we deploy an iterative alternating attention mechanism that allows a fine-grained exploration of both the query and the document. Our model outperforms state-of-th…

    Submitted 9 November, 2016; v1 submitted 7 June, 2016; originally announced June 2016.

  40. arXiv:1605.06069  [pdf, other]

    cs.CL cs.AI cs.LG cs.NE

    A Hierarchical Latent Variable Encoder-Decoder Model for Generating Dialogues

    Authors: Iulian Vlad Serban, Alessandro Sordoni, Ryan Lowe, Laurent Charlin, Joelle Pineau, Aaron Courville, Yoshua Bengio

    Abstract: Sequential data often possesses a hierarchical structure with complex dependencies between subsequences, such as found between the utterances in a dialogue. In an effort to model this kind of generative process, we propose a neural network-based generative architecture, with latent stochastic variables that span a variable number of time steps. We apply the proposed model to the task of dialogue r…

    Submitted 13 June, 2016; v1 submitted 19 May, 2016; originally announced May 2016.

    Comments: 15 pages, 5 tables, 4 figures

    ACM Class: I.5.1; I.2.7

  41. arXiv:1507.04808  [pdf, other]

    cs.CL cs.AI cs.LG cs.NE

    Building End-To-End Dialogue Systems Using Generative Hierarchical Neural Network Models

    Authors: Iulian V. Serban, Alessandro Sordoni, Yoshua Bengio, Aaron Courville, Joelle Pineau

    Abstract: We investigate the task of building open domain, conversational dialogue systems based on large dialogue corpora using generative models. Generative models produce system responses that are autonomously generated word-by-word, opening up the possibility for realistic, flexible interactions. In support of this goal, we extend the recently proposed hierarchical recurrent encoder-decoder neural netwo…

    Submitted 6 April, 2016; v1 submitted 16 July, 2015; originally announced July 2015.

    Comments: 8 pages with references; Published in AAAI 2016 (Special Track on Cognitive Systems)

    ACM Class: I.5.1; I.2.7

  42. arXiv:1507.02221  [pdf, other]

    cs.NE cs.IR

    A Hierarchical Recurrent Encoder-Decoder For Generative Context-Aware Query Suggestion

    Authors: Alessandro Sordoni, Yoshua Bengio, Hossein Vahabi, Christina Lioma, Jakob G. Simonsen, Jian-Yun Nie

    Abstract: Users may strive to formulate an adequate textual query for their information need. Search engines assist the users by presenting query suggestions. To preserve the original search intent, suggestions should be context-aware and account for the previous queries issued by the user. Achieving context awareness is challenging due to data sparsity. We present a probabilistic suggestion model that is a…

    Submitted 8 July, 2015; originally announced July 2015.

    Comments: To appear in Conference of Information Knowledge and Management (CIKM) 2015

  43. arXiv:1506.06863  [pdf, other]

    cs.CL

    deltaBLEU: A Discriminative Metric for Generation Tasks with Intrinsically Diverse Targets

    Authors: Michel Galley, Chris Brockett, Alessandro Sordoni, Yangfeng Ji, Michael Auli, Chris Quirk, Margaret Mitchell, Jianfeng Gao, Bill Dolan

    Abstract: We introduce Discriminative BLEU (deltaBLEU), a novel metric for intrinsic evaluation of generated text in tasks that admit a diverse range of possible outputs. Reference strings are scored for quality by human raters on a scale of [-1, +1] to weight multi-reference BLEU. In tasks involving generation of conversational responses, deltaBLEU correlates reasonably with human judgments and outperforms…

    Submitted 23 June, 2015; v1 submitted 23 June, 2015; originally announced June 2015.

    Comments: 6 pages, to appear at ACL 2015

  44. arXiv:1506.06714  [pdf, other]

    cs.CL cs.AI cs.LG cs.NE

    A Neural Network Approach to Context-Sensitive Generation of Conversational Responses

    Authors: Alessandro Sordoni, Michel Galley, Michael Auli, Chris Brockett, Yangfeng Ji, Margaret Mitchell, Jian-Yun Nie, Jianfeng Gao, Bill Dolan

    Abstract: We present a novel response generation system that can be trained end to end on large quantities of unstructured Twitter conversations. A neural network architecture is used to address sparsity issues that arise when integrating contextual information into classic statistical models, allowing the system to take into account previous dialog utterances. Our dynamic-context generative models show con…

    Submitted 22 June, 2015; originally announced June 2015.

    Comments: A. Sordoni, M. Galley, M. Auli, C. Brockett, Y. Ji, M. Mitchell, J.-Y. Nie, J. Gao, B. Dolan. 2015. A Neural Network Approach to Context-Sensitive Generation of Conversational Responses. In Proc. of NAACL-HLT. Pages 196-205

  45. arXiv:1401.1732  [pdf, ps, other]

    cs.IR

    Looking at Vector Space and Language Models for IR using Density Matrices

    Authors: Alessandro Sordoni, Jian-Yun Nie

    Abstract: In this work, we conduct a joint analysis of both Vector Space and Language Models for IR using the mathematical framework of Quantum Theory. We shed light on how both models allocate the space of density matrices. A density matrix is shown to be a general representational tool capable of leveraging capabilities of both VSM and LM representations thus paving the way for a new generation of retriev…

    Submitted 8 January, 2014; originally announced January 2014.

    Comments: In Proceedings of Quantum Interaction 2013