Showing 1–26 of 26 results for author: Shieber, S

  1. arXiv:2405.14838

    cs.CL cs.AI cs.LG

    From Explicit CoT to Implicit CoT: Learning to Internalize CoT Step by Step

    Authors: Yuntian Deng, Yejin Choi, Stuart Shieber

    Abstract: When leveraging language models for reasoning tasks, generating explicit chain-of-thought (CoT) steps often proves essential for achieving high accuracy in final outputs. In this paper, we investigate if models can be taught to internalize these CoT steps. To this end, we propose a simple yet effective method for internalizing CoT steps: starting with a model trained for explicit CoT reasoning, we…

    Submitted 23 May, 2024; originally announced May 2024.

  2. arXiv:2311.01460

    cs.CL cs.AI cs.LG

    Implicit Chain of Thought Reasoning via Knowledge Distillation

    Authors: Yuntian Deng, Kiran Prasad, Roland Fernandez, Paul Smolensky, Vishrav Chaudhary, Stuart Shieber

    Abstract: To augment language models with the ability to reason, researchers usually prompt or finetune them to produce chain of thought reasoning steps before producing the final answer. However, although people use natural language to reason effectively, it may be that LMs could reason more effectively with some intermediate computation that is not in natural language. In this work, we explore an alternat…

    Submitted 2 November, 2023; originally announced November 2023.

  3. arXiv:2304.14395

    cs.CL cs.DL

    string2string: A Modern Python Library for String-to-String Algorithms

    Authors: Mirac Suzgun, Stuart M. Shieber, Dan Jurafsky

    Abstract: We introduce string2string, an open-source library that offers a comprehensive suite of efficient algorithms for a broad range of string-to-string problems. It includes traditional algorithmic solutions as well as recent advanced neural approaches to tackle various problems in string alignment, distance measurement, lexical and semantic search, and similarity analysis -- along with several helpful…

    Submitted 27 April, 2023; originally announced April 2023.

    Comments: GitHub: https://github.com/stanfordnlp/string2string; Documentation: http://string2string.readthedocs.io/

  4. arXiv:2207.04043

    cs.CL cs.CY cs.LG

    The Harvard USPTO Patent Dataset: A Large-Scale, Well-Structured, and Multi-Purpose Corpus of Patent Applications

    Authors: Mirac Suzgun, Luke Melas-Kyriazi, Suproteem K. Sarkar, Scott Duke Kominers, Stuart M. Shieber

    Abstract: Innovation is a major driver of economic and social development, and information about many kinds of innovation is embedded in semi-structured data from patents and patent applications. Although the impact and novelty of innovations expressed in patent data are difficult to measure through traditional means, ML offers a promising set of techniques for evaluating novelty, summarizing contributions,…

    Submitted 8 July, 2022; originally announced July 2022.

    Comments: Website: https://patentdataset.org/, GitHub Repository: https://github.com/suzgunmirac/hupd, Hugging Face Datasets: https://huggingface.co/datasets/HUPD/hupd

  5. arXiv:2206.04615

    cs.CL cs.AI cs.CY cs.LG stat.ML

    Beyond the Imitation Game: Quantifying and extrapolating the capabilities of language models

    Authors: Aarohi Srivastava, Abhinav Rastogi, Abhishek Rao, Abu Awal Md Shoeb, Abubakar Abid, Adam Fisch, Adam R. Brown, Adam Santoro, Aditya Gupta, Adrià Garriga-Alonso, Agnieszka Kluska, Aitor Lewkowycz, Akshat Agarwal, Alethea Power, Alex Ray, Alex Warstadt, Alexander W. Kocurek, Ali Safaya, Ali Tazarv, Alice Xiang, Alicia Parrish, Allen Nie, Aman Hussain, Amanda Askell, Amanda Dsouza, et al. (426 additional authors not shown)

    Abstract: Language models demonstrate both quantitative improvement and new qualitative capabilities with increasing scale. Despite their potentially transformative impact, these new capabilities are as yet poorly characterized. In order to inform future research, prepare for disruptive new model capabilities, and ameliorate socially harmful effects, it is vital that we understand the present and near-futur…

    Submitted 12 June, 2023; v1 submitted 9 June, 2022; originally announced June 2022.

    Comments: 27 pages, 17 figures + references and appendices, repo: https://github.com/google/BIG-bench

    Journal ref: Transactions on Machine Learning Research, May/2022, https://openreview.net/forum?id=uyTL5Bvosj

  6. arXiv:2106.06087

    cs.CL

    Causal Analysis of Syntactic Agreement Mechanisms in Neural Language Models

    Authors: Matthew Finlayson, Aaron Mueller, Sebastian Gehrmann, Stuart Shieber, Tal Linzen, Yonatan Belinkov

    Abstract: Targeted syntactic evaluations have demonstrated the ability of language models to perform subject-verb agreement given difficult contexts. To elucidate the mechanisms by which the models accomplish this behavior, this study applies causal mediation analysis to pre-trained neural language models. We investigate the magnitude of models' preferences for grammatical inflections, as well as whether ne…

    Submitted 22 June, 2021; v1 submitted 10 June, 2021; originally announced June 2021.

    Comments: Accepted to ACL-IJCNLP 2021

    MSC Class: 68T50

    ACM Class: I.2.7

  7. arXiv:2006.08331

    cs.CL cs.AI cs.LG stat.ML

    Probing Neural Dialog Models for Conversational Understanding

    Authors: Abdelrhman Saleh, Tovly Deutsch, Stephen Casper, Yonatan Belinkov, Stuart Shieber

    Abstract: The predominant approach to open-domain dialog generation relies on end-to-end training of neural models on chat datasets. However, this approach provides little insight as to what these models learn (or do not learn) about engaging in dialog. In this study, we analyze the internal representations learned by neural open-domain dialog systems and evaluate the quality of these representations for le…

    Submitted 7 June, 2020; originally announced June 2020.

  8. Linguistic Features for Readability Assessment

    Authors: Tovly Deutsch, Masoud Jasbi, Stuart Shieber

    Abstract: Readability assessment aims to automatically classify text by the level appropriate for learning readers. Traditional approaches to this task utilize a variety of linguistically motivated features paired with simple machine learning models. More recent methods have improved performance by discarding these features and utilizing deep learning models. However, it is unknown whether augmenting deep l…

    Submitted 30 May, 2020; originally announced June 2020.

    Comments: To be published in ACL BEA workshop (15th Workshop on Innovative Use of NLP for Building Educational Applications)

  9. arXiv:2004.12265

    cs.CL

    Causal Mediation Analysis for Interpreting Neural NLP: The Case of Gender Bias

    Authors: Jesse Vig, Sebastian Gehrmann, Yonatan Belinkov, Sharon Qian, Daniel Nevo, Simas Sakenis, Jason Huang, Yaron Singer, Stuart Shieber

    Abstract: Common methods for interpreting neural models in natural language processing typically examine either their structure or their behavior, but not both. We propose a methodology grounded in the theory of causal mediation analysis for interpreting which parts of a model are causally implicated in its behavior. It enables us to analyze the mechanisms by which information flows from input to output thr…

    Submitted 22 November, 2020; v1 submitted 25 April, 2020; originally announced April 2020.

    Comments: Expanded version

    MSC Class: 68T50

    ACM Class: I.2.7

  10. arXiv:1911.03329

    cs.CL cs.LG cs.NE

    Memory-Augmented Recurrent Neural Networks Can Learn Generalized Dyck Languages

    Authors: Mirac Suzgun, Sebastian Gehrmann, Yonatan Belinkov, Stuart M. Shieber

    Abstract: We introduce three memory-augmented Recurrent Neural Networks (MARNNs) and explore their capabilities on a series of simple language modeling tasks whose solutions require stack-based mechanisms. We provide the first demonstration of neural networks recognizing the generalized Dyck languages, which express the core of what it means to be a language with hierarchical structure. Our memory-augmented…

    Submitted 8 November, 2019; originally announced November 2019.

  11. arXiv:1907.04389

    cs.CL

    On Adversarial Removal of Hypothesis-only Bias in Natural Language Inference

    Authors: Yonatan Belinkov, Adam Poliak, Stuart M. Shieber, Benjamin Van Durme, Alexander M. Rush

    Abstract: Popular Natural Language Inference (NLI) datasets have been shown to be tainted by hypothesis-only biases. Adversarial learning may help models ignore sensitive biases and spurious correlations in data. We evaluate whether adversarial learning can be used in NLI to encourage models to learn representations free of hypothesis-only biases. Our analyses indicate that the representations learned via a…

    Submitted 9 July, 2019; originally announced July 2019.

    Comments: StarSem 2019 - The Eighth Joint Conference on Lexical and Computational Semantics

  12. arXiv:1907.04380

    cs.CL

    Don't Take the Premise for Granted: Mitigating Artifacts in Natural Language Inference

    Authors: Yonatan Belinkov, Adam Poliak, Stuart M. Shieber, Benjamin Van Durme, Alexander M. Rush

    Abstract: Natural Language Inference (NLI) datasets often contain hypothesis-only biases---artifacts that allow models to achieve non-trivial performance without learning whether a premise entails a hypothesis. We propose two probabilistic methods to build models that are more robust to such biases and better transfer across datasets. In contrast to standard approaches to NLI, our methods predict the probab…

    Submitted 9 July, 2019; originally announced July 2019.

    Comments: ACL 2019

  13. arXiv:1906.03648

    cs.CL cs.FL cs.LG

    LSTM Networks Can Perform Dynamic Counting

    Authors: Mirac Suzgun, Sebastian Gehrmann, Yonatan Belinkov, Stuart M. Shieber

    Abstract: In this paper, we systematically assess the ability of standard recurrent networks to perform dynamic counting and to encode hierarchical representations. All the neural models in our experiments are designed to be small-sized networks both to prevent them from memorizing the training sets and to visualize and interpret their behaviour at test time. Our results demonstrate that the Long Short-Term…

    Submitted 9 June, 2019; originally announced June 2019.

    Comments: ACL 2019 Workshop on Deep Learning and Formal Languages

    ACM Class: F.4.3; I.2.6; I.2.7

  14. arXiv:1811.01001

    cs.CL cs.AI cs.LG

    On Evaluating the Generalization of LSTM Models in Formal Languages

    Authors: Mirac Suzgun, Yonatan Belinkov, Stuart M. Shieber

    Abstract: Recurrent Neural Networks (RNNs) are theoretically Turing-complete and established themselves as a dominant model for language processing. Yet, there still remains an uncertainty regarding their language learning capabilities. In this paper, we empirically evaluate the inductive learning capabilities of Long Short-Term Memory networks, a popular extension of simple RNNs, to learn simple formal lan…

    Submitted 2 November, 2018; originally announced November 2018.

    Comments: Proceedings of the Society for Computation in Linguistics (SCiL) 2019

    ACM Class: I.2.7; I.2.6; F.4.3

  15. arXiv:1808.10122

    cs.CL cs.LG

    Learning Neural Templates for Text Generation

    Authors: Sam Wiseman, Stuart M. Shieber, Alexander M. Rush

    Abstract: While neural, encoder-decoder models have had significant empirical success in text generation, there remain several unaddressed problems with this style of generation. Encoder-decoder models are largely (a) uninterpretable, and (b) difficult to control in terms of their phrasing or content. This work proposes a neural generation system using a hidden semi-markov model (HSMM) decoder, which learns…

    Submitted 17 June, 2019; v1 submitted 30 August, 2018; originally announced August 2018.

    Comments: EMNLP 2018; purity calculations updated

  16. arXiv:1707.09067

    cs.CL

    Adapting Sequence Models for Sentence Correction

    Authors: Allen Schmaltz, Yoon Kim, Alexander M. Rush, Stuart M. Shieber

    Abstract: In a controlled experiment of sequence-to-sequence approaches for the task of sentence correction, we find that character-based models are generally more effective than word-based models and models that encode subword information via convolutions, and that modeling the output data as a series of diffs improves effectiveness over standard approaches. Our strongest sequence-to-sequence model improve…

    Submitted 27 July, 2017; originally announced July 2017.

    Comments: EMNLP 2017

  17. arXiv:1707.08052

    cs.CL

    Challenges in Data-to-Document Generation

    Authors: Sam Wiseman, Stuart M. Shieber, Alexander M. Rush

    Abstract: Recent neural models have shown significant progress on the problem of generating short descriptive texts conditioned on a small number of database records. In this work, we suggest a slightly more difficult data-to-text generation task, and investigate how effective current approaches are on this task. In particular, we introduce a new, large-scale corpus of data records paired with descriptive d…

    Submitted 25 July, 2017; originally announced July 2017.

    Comments: EMNLP 2017

  18. arXiv:1604.08633

    cs.CL

    Word Ordering Without Syntax

    Authors: Allen Schmaltz, Alexander M. Rush, Stuart M. Shieber

    Abstract: Recent work on word ordering has argued that syntactic structure is important, or even required, for effectively recovering the order of a sentence. We find that, in fact, an n-gram language model with a simple heuristic gives strong results on this task. Furthermore, we show that a long short-term memory (LSTM) language model is even more effective at recovering order, with our basic model outper…

    Submitted 23 September, 2016; v1 submitted 28 April, 2016; originally announced April 2016.

    Comments: EMNLP 2016

  19. arXiv:1604.04677

    cs.CL

    Sentence-Level Grammatical Error Identification as Sequence-to-Sequence Correction

    Authors: Allen Schmaltz, Yoon Kim, Alexander M. Rush, Stuart M. Shieber

    Abstract: We demonstrate that an attention-based encoder-decoder model can be used for sentence-level grammatical error identification for the Automated Evaluation of Scientific Writing (AESW) Shared Task 2016. The attention-based encoder-decoder models can be used for the generation of corrections, in addition to error identification, which is of interest for certain end-user applications. We show that a c…

    Submitted 15 April, 2016; originally announced April 2016.

    Comments: To appear at BEA11, as part of the AESW 2016 Shared Task

  20. arXiv:1604.03035

    cs.CL

    Learning Global Features for Coreference Resolution

    Authors: Sam Wiseman, Alexander M. Rush, Stuart M. Shieber

    Abstract: There is compelling evidence that coreference prediction would benefit from modeling global information about entity-clusters. Yet, state-of-the-art performance can be achieved with systems treating each mention prediction independently, which we attribute to the inherent difficulty of crafting informative cluster-level features. We instead propose to use recurrent neural networks (RNNs) to learn…

    Submitted 11 April, 2016; originally announced April 2016.

    Comments: Accepted to NAACL 2016

  21. Recognizing Uncertainty in Speech

    Authors: Heather Pon-Barry, Stuart M. Shieber

    Abstract: We address the problem of inferring a speaker's level of certainty based on prosodic information in the speech signal, which has application in speech-based dialogue systems. We show that using phrase-level prosodic features centered around the phrases causing uncertainty, in addition to utterance-level prosodic features, improves our model's level of certainty classification. In addition, our mod…

    Submitted 9 March, 2011; originally announced March 2011.

    Comments: 11 pages

    Journal ref: EURASIP Journal on Advances in Signal Processing, Volume 2011, Article ID 251753, 11 pages

  22. Ellipsis and Higher-Order Unification

    Authors: Mary Dalrymple, Stuart M. Shieber, Fernando C. N. Pereira

    Abstract: We present a new method for characterizing the interpretive possibilities generated by elliptical constructions in natural language. Unlike previous analyses, which postulate ambiguity of interpretation or derivation in the full clause source of the ellipsis, our analysis requires no such hidden ambiguity. Further, the analysis follows relatively directly from an abstract statement of the ellipsis…

    Submitted 8 March, 1995; originally announced March 1995.

    Comments: 54 pages

    Report number: CSLI-19-91 and Xerox SSL-91-105

    Journal ref: Linguistics and Philosophy 14(4):399-452

  23. Principles and Implementation of Deductive Parsing

    Authors: Stuart M. Shieber, Yves Schabes, Fernando C. N. Pereira

    Abstract: We present a system for generating parsers based directly on the metaphor of parsing as deduction. Parsing algorithms can be represented directly as deduction systems, and a single deduction engine can interpret such deduction systems so as to implement the corresponding parser. The method generalizes easily to parsers for augmented phrase structure formalisms, such as definite-clause grammars a…

    Submitted 26 April, 1994; originally announced April 1994.

    Comments: 69 pages, includes full Prolog code

    Report number: CRCT TR-11-94 (Computer Science Department, Harvard University)

  24. Restricting the Weak-Generative Capacity of Synchronous Tree-Adjoining Grammars

    Authors: Stuart M. Shieber

    Abstract: The formalism of synchronous tree-adjoining grammars, a variant of standard tree-adjoining grammars (TAG), was intended to allow the use of TAGs for language transduction in addition to language specification. In previous work, the definition of the transduction relation defined by a synchronous TAG was given by appeal to an iterative rewriting process. The rewriting definition of derivation is…

    Submitted 30 August, 1994; v1 submitted 3 April, 1994; originally announced April 1994.

    Comments: 21 pages, uses lingmacros.sty, psfig.sty, fullname.sty; minor typographical changes only

    Journal ref: Computational Intelligence 10(4):371-385, November 1994

  25. Lessons from a Restricted Turing Test

    Authors: Stuart M. Shieber

    Abstract: We report on the recent Loebner prize competition inspired by Turing's test of intelligent behavior. The presentation covers the structure of the competition and the outcome of its first instantiation in an actual event, and an analysis of the purpose, design, and appropriateness of such a competition. We argue that the competition has no clear purpose, that its design prevents any useful outcom…

    Submitted 3 April, 1994; originally announced April 1994.

    Comments: 20 pages

    Report number: CRCT TR-19-92

  26. An Alternative Conception of Tree-Adjoining Derivation

    Authors: Yves Schabes, Stuart M. Shieber

    Abstract: The precise formulation of derivation for tree-adjoining grammars has important ramifications for a wide variety of uses of the formalism, from syntactic analysis to semantic interpretation and statistical language modeling. We argue that the definition of tree-adjoining derivation must be reformulated in order to manifest the proper linguistic dependencies in derivations. The particular proposa…

    Submitted 3 April, 1994; originally announced April 1994.

    Comments: 33 pages

    Report number: CRCT TR-08-92

    Journal ref: Computational Linguistics 20(1):91-124
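
Entry 3 (string2string: A Modern Python Library for String-to-String Algorithms) lists distance measurement among the string-to-string problems the library covers. As a hedged illustration of that problem class only, the sketch below computes the classic Levenshtein edit distance with the textbook dynamic program, assuming unit costs for insertion, deletion, and substitution. It is not the string2string API; the function name `levenshtein` is hypothetical.

```python
def levenshtein(source: str, target: str) -> int:
    """Minimum number of unit-cost insertions, deletions, and substitutions
    needed to turn `source` into `target` (illustrative sketch, not string2string's API)."""
    # At the start of iteration i, prev[j] is the distance between source[:i-1] and target[:j].
    prev = list(range(len(target) + 1))
    for i, s_char in enumerate(source, start=1):
        curr = [i] + [0] * len(target)
        for j, t_char in enumerate(target, start=1):
            curr[j] = min(
                prev[j] + 1,                       # delete s_char
                curr[j - 1] + 1,                   # insert t_char
                prev[j - 1] + (s_char != t_char),  # substitute (free when characters match)
            )
        prev = curr
    return prev[-1]


if __name__ == "__main__":
    print(levenshtein("kitten", "sitting"))  # 3
```

Keeping only two rows reduces memory to O(len(target)) while the running time stays O(len(source) * len(target)).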
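
Entry 10 (Memory-Augmented Recurrent Neural Networks Can Learn Generalized Dyck Languages) describes tasks whose solutions require stack-based mechanisms. A minimal sketch of such a mechanism, assuming a hypothetical three-pair bracket alphabet (`()`, `[]`, `{}`), is an explicit stack recognizer for well-nested strings. It shows the symbolic procedure a memory-augmented network would need to emulate; it is not one of the paper's MARNN models.

```python
# Hypothetical bracket alphabet: maps each closing bracket to its opener.
PAIRS = {")": "(", "]": "[", "}": "{"}


def is_dyck(word: str, pairs: dict[str, str] = PAIRS) -> bool:
    """Return True iff `word` is well nested over the given bracket pairs."""
    openers = set(pairs.values())
    stack: list[str] = []
    for symbol in word:
        if symbol in openers:
            stack.append(symbol)  # remember which opener is pending
        elif symbol in pairs:
            # A closer must match the most recent unmatched opener.
            if not stack or stack.pop() != pairs[symbol]:
                return False
        else:
            raise ValueError(f"symbol {symbol!r} is not in the bracket alphabet")
    return not stack  # every opener must have been closed


if __name__ == "__main__":
    print(is_dyck("([]{})"), is_dyck("([)]"))  # True False
```

Per-type counters cannot replace the stack here: `([)]` and `([])` open and close each bracket type the same number of times, and only the order of pending openers tells them apart.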
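
Entry 13 (LSTM Networks Can Perform Dynamic Counting) concerns the ability of recurrent networks to perform dynamic counting. As a hedged illustration of what that capability amounts to symbolically, the sketch below recognizes Dyck-1 (balanced strings over a single bracket pair) with one integer counter that is incremented and decremented. Choosing Dyck-1 as the target language is an assumption for illustration, since the abstract above is truncated before listing the languages studied; the function name `is_dyck1` is hypothetical.

```python
def is_dyck1(word: str, open_sym: str = "(", close_sym: str = ")") -> bool:
    """Recognize Dyck-1 with a single counter instead of a stack."""
    depth = 0
    for symbol in word:
        if symbol == open_sym:
            depth += 1  # one more pending opener
        elif symbol == close_sym:
            depth -= 1
            if depth < 0:  # a closer arrived with no pending opener
                return False
        else:
            raise ValueError(f"unexpected symbol {symbol!r}")
    return depth == 0  # accept only if every opener was closed
```

With a single bracket type, the depth is the only information needed, so a counter suffices where a general stack would otherwise be required.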
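
Entry 23 (Principles and Implementation of Deductive Parsing) describes parsing as deduction: a parsing algorithm is written as a set of inference rules over items, and a single generic engine interprets those rules. The paper's own code is in Prolog; the Python sketch below is an illustrative reconstruction of the general idea under stated assumptions, not the authors' implementation. The engine `deduce` keeps a chart of proven items and an agenda of items still to be combined. The CKY-style rule set, the toy grammar (`LEXICON`, `BINARY`), and all function names are hypothetical.

```python
from collections import deque
from collections.abc import Callable, Hashable, Iterable

Item = Hashable


def deduce(axioms: Iterable[Item],
           consequences: Callable[[Item, set], Iterable[Item]]) -> set:
    """Generic agenda-driven engine: close `axioms` under the inference rules."""
    chart: set = set()
    agenda = deque(axioms)
    while agenda:
        trigger = agenda.popleft()
        if trigger in chart:
            continue  # already proven
        chart.add(trigger)
        for new_item in consequences(trigger, chart):
            if new_item not in chart:
                agenda.append(new_item)
    return chart


# Hypothetical toy grammar in Chomsky normal form.
LEXICON = {"she": {"NP"}, "eats": {"V"}, "fish": {"NP", "V"}}
BINARY = {("NP", "VP"): {"S"}, ("V", "NP"): {"VP"}}


def cky_axioms(words):
    # Item (A, i, j) asserts that nonterminal A derives words[i:j].
    return [(cat, i, i + 1) for i, w in enumerate(words) for cat in LEXICON.get(w, ())]


def cky_consequences(trigger, chart):
    label, i, j = trigger
    for other, start, end in list(chart):
        if start == j:  # trigger is the left child: [B,i,j] [C,j,end] => [A,i,end]
            for parent in BINARY.get((label, other), ()):
                yield (parent, i, end)
        if end == i:  # trigger is the right child: [B,start,i] [C,i,j] => [A,start,j]
            for parent in BINARY.get((other, label), ()):
                yield (parent, start, j)


def recognize(words, start="S"):
    # Goal item: the start symbol spans the whole input.
    return (start, 0, len(words)) in deduce(cky_axioms(words), cky_consequences)


if __name__ == "__main__":
    print(recognize("she eats fish".split()))  # True
```

Because `deduce` knows nothing about grammars, a different rule set would in principle reuse the same engine unchanged, which is the separation between deduction system and interpreter that the abstract describes.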