-
Persuasiveness of Generated Free-Text Rationales in Subjective Decisions: A Case Study on Pairwise Argument Ranking
Authors:
Mohamed Elaraby,
Diane Litman,
Xiang Lorraine Li,
Ahmed Magooda
Abstract:
Generating free-text rationales is among the emergent capabilities of Large Language Models (LLMs). These rationales have been found to enhance LLM performance across various NLP tasks. Recently, there has been growing interest in using these rationales to provide insights for various important downstream tasks. In this paper, we analyze generated free-text rationales in tasks with subjective answ…
▽ More
Generating free-text rationales is among the emergent capabilities of Large Language Models (LLMs). These rationales have been found to enhance LLM performance across various NLP tasks. Recently, there has been growing interest in using these rationales to provide insights for various important downstream tasks. In this paper, we analyze generated free-text rationales in tasks with subjective answers, emphasizing the importance of rationalization in such scenarios. We focus on pairwise argument ranking, a highly subjective task with significant potential for real-world applications, such as debate assistance. We evaluate the persuasiveness of rationales generated by nine LLMs to support their subjective choices. Our findings suggest that open-source LLMs, particularly Llama2-70B-chat, are capable of providing highly persuasive rationalizations, surpassing even GPT models. Additionally, our experiments show that rationale persuasiveness can be improved by controlling its parameters through prompting or through self-refinement.
△ Less
Submitted 19 June, 2024;
originally announced June 2024.
-
Analyzing Large Language Models for Classroom Discussion Assessment
Authors:
Nhat Tran,
Benjamin Pierce,
Diane Litman,
Richard Correnti,
Lindsay Clare Matsumura
Abstract:
Automatically assessing classroom discussion quality is becoming increasingly feasible with the help of new NLP advancements such as large language models (LLMs). In this work, we examine how the assessment performance of 2 LLMs interacts with 3 factors that may affect performance: task formulation, context length, and few-shot examples. We also explore the computational efficiency and predictive…
▽ More
Automatically assessing classroom discussion quality is becoming increasingly feasible with the help of new NLP advancements such as large language models (LLMs). In this work, we examine how the assessment performance of 2 LLMs interacts with 3 factors that may affect performance: task formulation, context length, and few-shot examples. We also explore the computational efficiency and predictive consistency of the 2 LLMs. Our results suggest that the 3 aforementioned factors do affect the performance of the tested LLMs and there is a relation between consistency and performance. We recommend a LLM-based assessment approach that has a good balance in terms of predictive performance, computational efficiency, and consistency.
△ Less
Submitted 12 June, 2024;
originally announced June 2024.
-
What metrics of participation balance predict outcomes of collaborative learning with a robot?
Authors:
Yuya Asano,
Diane Litman,
Quentin King-Shepard,
Tristan Maidment,
Tyree Langley,
Teresa Davison,
Timothy Nokes-Malach,
Adriana Kovashka,
Erin Walker
Abstract:
One of the keys to the success of collaborative learning is balanced participation by all learners, but this does not always happen naturally. Pedagogical robots have the potential to facilitate balance. However, it remains unclear what participation balance robots should aim at; various metrics have been proposed, but it is still an open question whether we should balance human participation in h…
▽ More
One of the keys to the success of collaborative learning is balanced participation by all learners, but this does not always happen naturally. Pedagogical robots have the potential to facilitate balance. However, it remains unclear what participation balance robots should aim at; various metrics have been proposed, but it is still an open question whether we should balance human participation in human-human interactions (HHI) or human-robot interactions (HRI) and whether we should consider robots' participation in collaborative learning involving multiple humans and a robot. This paper examines collaborative learning between a pair of students and a teachable robot that acts as a peer tutee to answer the aforementioned question. Through an exploratory study, we hypothesize which balance metrics in the literature and which portions of dialogues (including vs. excluding robots' participation and human participation in HHI vs. HRI) will better predict learning as a group. We test the hypotheses with another study and replicate them with automatically obtained units of participation to simulate the information available to robots when they adaptively fix imbalances in real-time. Finally, we discuss recommendations on which metrics learning science researchers should choose when trying to understand how to facilitate collaboration.
△ Less
Submitted 17 May, 2024;
originally announced May 2024.
-
Enhancing Knowledge Retrieval with Topic Modeling for Knowledge-Grounded Dialogue
Authors:
Nhat Tran,
Diane Litman
Abstract:
Knowledge retrieval is one of the major challenges in building a knowledge-grounded dialogue system. A common method is to use a neural retriever with a distributed approximate nearest-neighbor database to quickly find the relevant knowledge sentences. In this work, we propose an approach that utilizes topic modeling on the knowledge base to further improve retrieval accuracy and as a result, impr…
▽ More
Knowledge retrieval is one of the major challenges in building a knowledge-grounded dialogue system. A common method is to use a neural retriever with a distributed approximate nearest-neighbor database to quickly find the relevant knowledge sentences. In this work, we propose an approach that utilizes topic modeling on the knowledge base to further improve retrieval accuracy and as a result, improve response generation. Additionally, we experiment with a large language model, ChatGPT, to take advantage of the improved retrieval performance to further improve the generation results. Experimental results on two datasets show that our approach can increase retrieval and generation performance. The results also indicate that ChatGPT is a better response generator for knowledge-grounded dialogue when relevant knowledge is provided.
△ Less
Submitted 7 May, 2024;
originally announced May 2024.
-
Dialogue with Robots: Proposals for Broadening Participation and Research in the SLIVAR Community
Authors:
Casey Kennington,
Malihe Alikhani,
Heather Pon-Barry,
Katherine Atwell,
Yonatan Bisk,
Daniel Fried,
Felix Gervits,
Zhao Han,
Mert Inan,
Michael Johnston,
Raj Korpan,
Diane Litman,
Matthew Marge,
Cynthia Matuszek,
Ross Mead,
Shiwali Mohan,
Raymond Mooney,
Natalie Parde,
Jivko Sinapov,
Angela Stewart,
Matthew Stone,
Stefanie Tellex,
Tom Williams
Abstract:
The ability to interact with machines using natural human language is becoming not just commonplace, but expected. The next step is not just text interfaces, but speech interfaces and not just with computers, but with all machines including robots. In this paper, we chronicle the recent history of this growing field of spoken dialogue with robots and offer the community three proposals, the first…
▽ More
The ability to interact with machines using natural human language is becoming not just commonplace, but expected. The next step is not just text interfaces, but speech interfaces and not just with computers, but with all machines including robots. In this paper, we chronicle the recent history of this growing field of spoken dialogue with robots and offer the community three proposals, the first focused on education, the second on benchmarks, and the third on the modeling of language when it comes to spoken interaction with robots. The three proposals should act as white papers for any researcher to take and build upon.
△ Less
Submitted 1 April, 2024;
originally announced April 2024.
-
ReflectSumm: A Benchmark for Course Reflection Summarization
Authors:
Yang Zhong,
Mohamed Elaraby,
Diane Litman,
Ahmed Ashraf Butt,
Muhsin Menekse
Abstract:
This paper introduces ReflectSumm, a novel summarization dataset specifically designed for summarizing students' reflective writing. The goal of ReflectSumm is to facilitate developing and evaluating novel summarization techniques tailored to real-world scenarios with little training data, %practical tasks with potential implications in the opinion summarization domain in general and the education…
▽ More
This paper introduces ReflectSumm, a novel summarization dataset specifically designed for summarizing students' reflective writing. The goal of ReflectSumm is to facilitate developing and evaluating novel summarization techniques tailored to real-world scenarios with little training data, %practical tasks with potential implications in the opinion summarization domain in general and the educational domain in particular. The dataset encompasses a diverse range of summarization tasks and includes comprehensive metadata, enabling the exploration of various research questions and supporting different applications. To showcase its utility, we conducted extensive evaluations using multiple state-of-the-art baselines. The results provide benchmarks for facilitating further research in this area.
△ Less
Submitted 22 April, 2024; v1 submitted 27 March, 2024;
originally announced March 2024.
-
Overview of ImageArg-2023: The First Shared Task in Multimodal Argument Mining
Authors:
Zhexiong Liu,
Mohamed Elaraby,
Yang Zhong,
Diane Litman
Abstract:
This paper presents an overview of the ImageArg shared task, the first multimodal Argument Mining shared task co-located with the 10th Workshop on Argument Mining at EMNLP 2023. The shared task comprises two classification subtasks - (1) Subtask-A: Argument Stance Classification; (2) Subtask-B: Image Persuasiveness Classification. The former determines the stance of a tweet containing an image and…
▽ More
This paper presents an overview of the ImageArg shared task, the first multimodal Argument Mining shared task co-located with the 10th Workshop on Argument Mining at EMNLP 2023. The shared task comprises two classification subtasks - (1) Subtask-A: Argument Stance Classification; (2) Subtask-B: Image Persuasiveness Classification. The former determines the stance of a tweet containing an image and a piece of text toward a controversial topic (e.g., gun control and abortion). The latter determines whether the image makes the tweet text more persuasive. The shared task received 31 submissions for Subtask-A and 21 submissions for Subtask-B from 9 different teams across 6 countries. The top submission in Subtask-A achieved an F1-score of 0.8647 while the best submission in Subtask-B achieved an F1-score of 0.5561.
△ Less
Submitted 24 October, 2023; v1 submitted 15 October, 2023;
originally announced October 2023.
-
STRONG -- Structure Controllable Legal Opinion Summary Generation
Authors:
Yang Zhong,
Diane Litman
Abstract:
We propose an approach for the structure controllable summarization of long legal opinions that considers the argument structure of the document. Our approach involves using predicted argument role information to guide the model in generating coherent summaries that follow a provided structure pattern. We demonstrate the effectiveness of our approach on a dataset of legal opinions and show that it…
▽ More
We propose an approach for the structure controllable summarization of long legal opinions that considers the argument structure of the document. Our approach involves using predicted argument role information to guide the model in generating coherent summaries that follow a provided structure pattern. We demonstrate the effectiveness of our approach on a dataset of legal opinions and show that it outperforms several strong baselines with respect to ROUGE, BERTScore, and structure similarity.
△ Less
Submitted 29 September, 2023;
originally announced September 2023.
-
Learning from Auxiliary Sources in Argumentative Revision Classification
Authors:
Tazin Afrin,
Diane Litman
Abstract:
We develop models to classify desirable reasoning revisions in argumentative writing. We explore two approaches -- multi-task learning and transfer learning -- to take advantage of auxiliary sources of revision data for similar tasks. Results of intrinsic and extrinsic evaluations show that both approaches can indeed improve classifier performance over baselines. While multi-task learning shows th…
▽ More
We develop models to classify desirable reasoning revisions in argumentative writing. We explore two approaches -- multi-task learning and transfer learning -- to take advantage of auxiliary sources of revision data for similar tasks. Results of intrinsic and extrinsic evaluations show that both approaches can indeed improve classifier performance over baselines. While multi-task learning shows that training on different sources of data at the same time may improve performance, transfer-learning better represents the relationship between the data.
△ Less
Submitted 13 September, 2023;
originally announced September 2023.
-
Utilizing Natural Language Processing for Automated Assessment of Classroom Discussion
Authors:
Nhat Tran,
Benjamin Pierce,
Diane Litman,
Richard Correnti,
Lindsay Clare Matsumura
Abstract:
Rigorous and interactive class discussions that support students to engage in high-level thinking and reasoning are essential to learning and are a central component of most teaching interventions. However, formally assessing discussion quality 'at scale' is expensive and infeasible for most researchers. In this work, we experimented with various modern natural language processing (NLP) techniques…
▽ More
Rigorous and interactive class discussions that support students to engage in high-level thinking and reasoning are essential to learning and are a central component of most teaching interventions. However, formally assessing discussion quality 'at scale' is expensive and infeasible for most researchers. In this work, we experimented with various modern natural language processing (NLP) techniques to automatically generate rubric scores for individual dimensions of classroom text discussion quality. Specifically, we worked on a dataset of 90 classroom discussion transcripts consisting of over 18000 turns annotated with fine-grained Analyzing Teaching Moves (ATM) codes and focused on four Instructional Quality Assessment (IQA) rubrics. Despite the limited amount of data, our work shows encouraging results in some of the rubrics while suggesting that there is room for improvement in the others. We also found that certain NLP approaches work better for certain rubrics.
△ Less
Submitted 21 June, 2023;
originally announced June 2023.
-
Impact of Experiencing Misrecognition by Teachable Agents on Learning and Rapport
Authors:
Yuya Asano,
Diane Litman,
Mingzhi Yu,
Nikki Lobczowski,
Timothy Nokes-Malach,
Adriana Kovashka,
Erin Walker
Abstract:
While speech-enabled teachable agents have some advantages over typing-based ones, they are vulnerable to errors stemming from misrecognition by automatic speech recognition (ASR). These errors may propagate, resulting in unexpected changes in the flow of conversation. We analyzed how such changes are linked with learning gains and learners' rapport with the agents. Our results show they are not r…
▽ More
While speech-enabled teachable agents have some advantages over typing-based ones, they are vulnerable to errors stemming from misrecognition by automatic speech recognition (ASR). These errors may propagate, resulting in unexpected changes in the flow of conversation. We analyzed how such changes are linked with learning gains and learners' rapport with the agents. Our results show they are not related to learning gains or rapport, regardless of the types of responses the agents should have returned given the correct input from learners without ASR errors. We also discuss the implications for optimal error-recovery policies for teachable agents that can be drawn from these findings.
△ Less
Submitted 11 June, 2023;
originally announced June 2023.
-
Towards Argument-Aware Abstractive Summarization of Long Legal Opinions with Summary Reranking
Authors:
Mohamed Elaraby,
Yang Zhong,
Diane Litman
Abstract:
We propose a simple approach for the abstractive summarization of long legal opinions that considers the argument structure of the document. Legal opinions often contain complex and nuanced argumentation, making it challenging to generate a concise summary that accurately captures the main points of the legal opinion. Our approach involves using argument role information to generate multiple candi…
▽ More
We propose a simple approach for the abstractive summarization of long legal opinions that considers the argument structure of the document. Legal opinions often contain complex and nuanced argumentation, making it challenging to generate a concise summary that accurately captures the main points of the legal opinion. Our approach involves using argument role information to generate multiple candidate summaries, then reranking these candidates based on alignment with the document's argument structure. We demonstrate the effectiveness of our approach on a dataset of long legal opinions and show that it outperforms several strong baselines.
△ Less
Submitted 1 June, 2023;
originally announced June 2023.
-
Predicting the Quality of Revisions in Argumentative Writing
Authors:
Zhexiong Liu,
Diane Litman,
Elaine Wang,
Lindsay Matsumura,
Richard Correnti
Abstract:
The ability to revise in response to feedback is critical to students' writing success. In the case of argument writing in specific, identifying whether an argument revision (AR) is successful or not is a complex problem because AR quality is dependent on the overall content of an argument. For example, adding the same evidence sentence could strengthen or weaken existing claims in different argum…
▽ More
The ability to revise in response to feedback is critical to students' writing success. In the case of argument writing in specific, identifying whether an argument revision (AR) is successful or not is a complex problem because AR quality is dependent on the overall content of an argument. For example, adding the same evidence sentence could strengthen or weaken existing claims in different argument contexts (ACs). To address this issue we developed Chain-of-Thought prompts to facilitate ChatGPT-generated ACs for AR quality predictions. The experiments on two corpora, our annotated elementary essays and existing college essays benchmark, demonstrate the superiority of the proposed ACs over baselines.
△ Less
Submitted 1 June, 2023;
originally announced June 2023.
-
Predicting Desirable Revisions of Evidence and Reasoning in Argumentative Writing
Authors:
Tazin Afrin,
Diane Litman
Abstract:
We develop models to classify desirable evidence and desirable reasoning revisions in student argumentative writing. We explore two ways to improve classifier performance - using the essay context of the revision, and using the feedback students received before the revision. We perform both intrinsic and extrinsic evaluation for each of our models and report a qualitative analysis. Our results sho…
▽ More
We develop models to classify desirable evidence and desirable reasoning revisions in student argumentative writing. We explore two ways to improve classifier performance - using the essay context of the revision, and using the feedback students received before the revision. We perform both intrinsic and extrinsic evaluation for each of our models and report a qualitative analysis. Our results show that while a model using feedback information improves over a baseline model, models utilizing context - either alone or with feedback - are the most successful in identifying desirable revisions.
△ Less
Submitted 9 February, 2023;
originally announced February 2023.
-
Computing and Exploiting Document Structure to Improve Unsupervised Extractive Summarization of Legal Case Decisions
Authors:
Yang Zhong,
Diane Litman
Abstract:
Though many algorithms can be used to automatically summarize legal case decisions, most fail to incorporate domain knowledge about how important sentences in a legal decision relate to a representation of its document structure. For example, analysis of a legal case summarization dataset demonstrates that sentences serving different types of argumentative roles in the decision appear in different…
▽ More
Though many algorithms can be used to automatically summarize legal case decisions, most fail to incorporate domain knowledge about how important sentences in a legal decision relate to a representation of its document structure. For example, analysis of a legal case summarization dataset demonstrates that sentences serving different types of argumentative roles in the decision appear in different sections of the document. In this work, we propose an unsupervised graph-based ranking model that uses a reweighting algorithm to exploit properties of the document structure of legal case decisions. We also explore the impact of using different methods to compute the document structure. Results on the Canadian Legal Case Law dataset show that our proposed method outperforms several strong baselines.
△ Less
Submitted 6 November, 2022;
originally announced November 2022.
-
Comparison of Lexical Alignment with a Teachable Robot in Human-Robot and Human-Human-Robot Interactions
Authors:
Yuya Asano,
Diane Litman,
Mingzhi Yu,
Nikki Lobczowski,
Timothy Nokes-Malach,
Adriana Kovashka,
Erin Walker
Abstract:
Speakers build rapport in the process of aligning conversational behaviors with each other. Rapport engendered with a teachable agent while instructing domain material has been shown to promote learning. Past work on lexical alignment in the field of education suffers from limitations in both the measures used to quantify alignment and the types of interactions in which alignment with agents has b…
▽ More
Speakers build rapport in the process of aligning conversational behaviors with each other. Rapport engendered with a teachable agent while instructing domain material has been shown to promote learning. Past work on lexical alignment in the field of education suffers from limitations in both the measures used to quantify alignment and the types of interactions in which alignment with agents has been studied. In this paper, we apply alignment measures based on a data-driven notion of shared expressions (possibly composed of multiple words) and compare alignment in one-on-one human-robot (H-R) interactions with the H-R portions of collaborative human-human-robot (H-H-R) interactions. We find that students in the H-R setting align with a teachable robot more than in the H-H-R setting and that the relationship between lexical alignment and rapport is more complex than what is predicted by previous theoretical and empirical work.
△ Less
Submitted 23 September, 2022;
originally announced September 2022.
-
ImageArg: A Multi-modal Tweet Dataset for Image Persuasiveness Mining
Authors:
Zhexiong Liu,
Meiqi Guo,
Yue Dai,
Diane Litman
Abstract:
The growing interest in developing corpora of persuasive texts has promoted applications in automated systems, e.g., debating and essay scoring systems; however, there is little prior work mining image persuasiveness from an argumentative perspective. To expand persuasiveness mining into a multi-modal realm, we present a multi-modal dataset, ImageArg, consisting of annotations of image persuasiven…
▽ More
The growing interest in developing corpora of persuasive texts has promoted applications in automated systems, e.g., debating and essay scoring systems; however, there is little prior work mining image persuasiveness from an argumentative perspective. To expand persuasiveness mining into a multi-modal realm, we present a multi-modal dataset, ImageArg, consisting of annotations of image persuasiveness in tweets. The annotations are based on a persuasion taxonomy we developed to explore image functionalities and the means of persuasion. We benchmark image persuasiveness tasks on ImageArg using widely-used multi-modal learning methods. The experimental results show that our dataset offers a useful resource for this rich and challenging topic, and there is ample room for modeling improvement.
△ Less
Submitted 14 September, 2022;
originally announced September 2022.
-
ArgLegalSumm: Improving Abstractive Summarization of Legal Documents with Argument Mining
Authors:
Mohamed Elaraby,
Diane Litman
Abstract:
A challenging task when generating summaries of legal documents is the ability to address their argumentative nature. We introduce a simple technique to capture the argumentative structure of legal documents by integrating argument role labeling into the summarization process. Experiments with pretrained language models show that our proposed approach improves performance over strong baselines
A challenging task when generating summaries of legal documents is the ability to address their argumentative nature. We introduce a simple technique to capture the argumentative structure of legal documents by integrating argument role labeling into the summarization process. Experiments with pretrained language models show that our proposed approach improves performance over strong baselines
△ Less
Submitted 20 September, 2022; v1 submitted 4 September, 2022;
originally announced September 2022.
-
ArgRewrite V.2: an Annotated Argumentative Revisions Corpus
Authors:
Omid Kashefi,
Tazin Afrin,
Meghan Dale,
Christopher Olshefski,
Amanda Godley,
Diane Litman,
Rebecca Hwa
Abstract:
Analyzing how humans revise their writings is an interesting research question, not only from an educational perspective but also in terms of artificial intelligence. Better understanding of this process could facilitate many NLP applications, from intelligent tutoring systems to supportive and collaborative writing environments. Developing these applications, however, requires revision corpora, w…
▽ More
Analyzing how humans revise their writings is an interesting research question, not only from an educational perspective but also in terms of artificial intelligence. Better understanding of this process could facilitate many NLP applications, from intelligent tutoring systems to supportive and collaborative writing environments. Developing these applications, however, requires revision corpora, which are not widely available. In this work, we present ArgRewrite V.2, a corpus of annotated argumentative revisions, collected from two cycles of revisions to argumentative essays about self-driving cars. Annotations are provided at different levels of purpose granularity (coarse and fine) and scope (sentential and subsentential). In addition, the corpus includes the revision goal given to each writer, essay scores, annotation verification, pre- and post-study surveys collected from participants as meta-data. The variety of revision unit scope and purpose granularity levels in ArgRewrite, along with the inclusion of new types of meta-data, can make it a useful resource for research and applications that involve revision analysis. We demonstrate some potential applications of ArgRewrite V.2 in the development of automatic revision purpose predictors, as a training source and benchmark.
△ Less
Submitted 3 June, 2022;
originally announced June 2022.
-
Mitigating Data Scarceness through Data Synthesis, Augmentation and Curriculum for Abstractive Summarization
Authors:
Ahmed Magooda,
Diane Litman
Abstract:
This paper explores three simple data manipulation techniques (synthesis, augmentation, curriculum) for improving abstractive summarization models without the need for any additional data. We introduce a method of data synthesis with paraphrasing, a data augmentation technique with sample mixing, and curriculum learning with two new difficulty metrics based on specificity and abstractiveness. We c…
▽ More
This paper explores three simple data manipulation techniques (synthesis, augmentation, curriculum) for improving abstractive summarization models without the need for any additional data. We introduce a method of data synthesis with paraphrasing, a data augmentation technique with sample mixing, and curriculum learning with two new difficulty metrics based on specificity and abstractiveness. We conduct experiments to show that these three techniques can help improve abstractive summarization across two summarization models and two different small datasets. Furthermore, we show that these techniques can improve performance when applied in isolation and when combined.
△ Less
Submitted 17 September, 2021;
originally announced September 2021.
-
Exploring Multitask Learning for Low-Resource AbstractiveSummarization
Authors:
Ahmed Magooda,
Mohamed Elaraby,
Diane Litman
Abstract:
This paper explores the effect of using multitask learning for abstractive summarization in the context of small training corpora. In particular, we incorporate four different tasks (extractive summarization, language modeling, concept detection, and paraphrase detection) both individually and in combination, with the goal of enhancing the target task of abstractive summarization via multitask lea…
▽ More
This paper explores the effect of using multitask learning for abstractive summarization in the context of small training corpora. In particular, we incorporate four different tasks (extractive summarization, language modeling, concept detection, and paraphrase detection) both individually and in combination, with the goal of enhancing the target task of abstractive summarization via multitask learning. We show that for many task combinations, a model trained in a multitask setting outperforms a model trained only for abstractive summarization, with no additional summarization data introduced. Additionally, we do a comprehensive search and find that certain tasks (e.g. paraphrase detection) consistently benefit abstractive summarization, not only when combined with other tasks but also when using different architectures and training corpora.
△ Less
Submitted 17 September, 2021;
originally announced September 2021.
-
A Neural Network-Based Linguistic Similarity Measure for Entrainment in Conversations
Authors:
Mingzhi Yu,
Diane Litman,
Shuang Ma,
Jian Wu
Abstract:
Linguistic entrainment is a phenomenon where people tend to mimic each other in conversation. The core instrument to quantify entrainment is a linguistic similarity measure between conversational partners. Most of the current similarity measures are based on bag-of-words approaches that rely on linguistic markers, ignoring the overall language structure and dialogue context. To address this issue,…
▽ More
Linguistic entrainment is a phenomenon where people tend to mimic each other in conversation. The core instrument to quantify entrainment is a linguistic similarity measure between conversational partners. Most of the current similarity measures are based on bag-of-words approaches that rely on linguistic markers, ignoring the overall language structure and dialogue context. To address this issue, we propose to use a neural network model to perform the similarity measure for entrainment. Our model is context-aware, and it further leverages a novel component to learn the shared high-level linguistic features across dialogues. We first investigate the effectiveness of our novel component. Then we use the model to perform similarity measure in a corpus-based entrainment analysis. We observe promising results for both evaluation tasks.
△ Less
Submitted 4 September, 2021;
originally announced September 2021.
-
Effective Interfaces for Student-Driven Revision Sessions for Argumentative Writing
Authors:
Tazin Afrin,
Omid Kashefi,
Christopher Olshefski,
Diane Litman,
Rebecca Hwa,
Amanda Godley
Abstract:
We present the design and evaluation of a web-based intelligent writing assistant that helps students recognize their revisions of argumentative essays. To understand how our revision assistant can best support students, we have implemented four versions of our system with differences in the unit span (sentence versus sub-sentence) of revision analysis and the level of feedback provided (none, bin…
▽ More
We present the design and evaluation of a web-based intelligent writing assistant that helps students recognize their revisions of argumentative essays. To understand how our revision assistant can best support students, we have implemented four versions of our system with differences in the unit span (sentence versus sub-sentence) of revision analysis and the level of feedback provided (none, binary, or detailed revision purpose categorization). We first discuss the design decisions behind relevant components of the system, then analyze the efficacy of the different versions through a Wizard of Oz study with university students. Our results show that while a simple interface with no revision feedback is easier to use, an interface that provides a detailed categorization of sentence-level revisions is the most helpful based on user survey data, as well as the most effective based on improvement in writing outcomes.
△ Less
Submitted 14 July, 2021;
originally announced July 2021.
-
Annotation and Classification of Evidence and Reasoning Revisions in Argumentative Writing
Authors:
Tazin Afrin,
Elaine Wang,
Diane Litman,
Lindsay C. Matsumura,
Richard Correnti
Abstract:
Automated writing evaluation systems can improve students' writing insofar as students attend to the feedback provided and revise their essay drafts in ways aligned with such feedback. Existing research on revision of argumentative writing in such systems, however, has focused on the types of revisions students make (e.g., surface vs. content) rather than the extent to which revisions actually res…
▽ More
Automated writing evaluation systems can improve students' writing insofar as students attend to the feedback provided and revise their essay drafts in ways aligned with such feedback. Existing research on revision of argumentative writing in such systems, however, has focused on the types of revisions students make (e.g., surface vs. content) rather than the extent to which revisions actually respond to the feedback provided and improve the essay. We introduce an annotation scheme to capture the nature of sentence-level revisions of evidence use and reasoning (the `RER' scheme) and apply it to 5th- and 6th-grade students' argumentative essays. We show that reliable manual annotation can be achieved and that revision annotations correlate with a holistic assessment of essay improvement in line with the feedback provided. Furthermore, we explore the feasibility of automatically classifying revisions according to our scheme.
△ Less
Submitted 14 July, 2021;
originally announced July 2021.
-
Leveraging Linguistic Coordination in Reranking N-Best Candidates For End-to-End Response Selection Using BERT
Authors:
Mingzhi Yu,
Diane Litman
Abstract:
Retrieval-based dialogue systems select the best response from many candidates. Although many state-of-the-art models have shown promising performance in dialogue response selection tasks, there is still quite a gap between R@1 and R@10 performance. To address this, we propose to leverage linguistic coordination (a phenomenon that individuals tend to develop similar linguistic behaviors in convers…
▽ More
Retrieval-based dialogue systems select the best response from many candidates. Although many state-of-the-art models have shown promising performance in dialogue response selection tasks, there is still quite a gap between R@1 and R@10 performance. To address this, we propose to leverage linguistic coordination (a phenomenon that individuals tend to develop similar linguistic behaviors in conversation) to rerank the N-best candidates produced by BERT, a state-of-the-art pre-trained language model. Our results show an improvement in R@1 compared to BERT baselines, demonstrating the utility of repairing machine-generated outputs by leveraging a linguistic theory.
△ Less
Submitted 27 May, 2021;
originally announced May 2021.
-
Discussion Tracker: Supporting Teacher Learning about Students' Collaborative Argumentation in High School Classrooms
Authors:
Luca Lugini,
Christopher Olshefski,
Ravneet Singh,
Diane Litman,
Amanda Godley
Abstract:
Teaching collaborative argumentation is an advanced skill that many K-12 teachers struggle to develop. To address this, we have developed Discussion Tracker, a classroom discussion analytics system based on novel algorithms for classifying argument moves, specificity, and collaboration. Results from a classroom deployment indicate that teachers found the analytics useful, and that the underlying c…
▽ More
Teaching collaborative argumentation is an advanced skill that many K-12 teachers struggle to develop. To address this, we have developed Discussion Tracker, a classroom discussion analytics system based on novel algorithms for classifying argument moves, specificity, and collaboration. Results from a classroom deployment indicate that teachers found the analytics useful, and that the underlying classifiers perform with moderate to substantial agreement with humans.
△ Less
Submitted 20 February, 2021;
originally announced February 2021.
-
Contextual Argument Component Classification for Class Discussions
Authors:
Luca Lugini,
Diane Litman
Abstract:
Argument mining systems often consider contextual information, i.e. information outside of an argumentative discourse unit, when trained to accomplish tasks such as argument component identification, classification, and relation extraction. However, prior work has not carefully analyzed the utility of different contextual properties in context-aware models. In this work, we show how two different…
▽ More
Argument mining systems often consider contextual information, i.e. information outside of an argumentative discourse unit, when trained to accomplish tasks such as argument component identification, classification, and relation extraction. However, prior work has not carefully analyzed the utility of different contextual properties in context-aware models. In this work, we show how two different types of contextual information, local discourse context and speaker context, can be incorporated into a computational model for classifying argument components in multi-party classroom discussions. We find that both context types can improve performance, although the improvements are dependent on context size and position.
△ Less
Submitted 20 February, 2021;
originally announced February 2021.
-
Automated Topical Component Extraction Using Neural Network Attention Scores from Source-based Essay Scoring
Authors:
Haoran Zhang,
Diane Litman
Abstract:
While automated essay scoring (AES) can reliably grade essays at scale, automated writing evaluation (AWE) additionally provides formative feedback to guide essay revision. However, a neural AES typically does not provide useful feature representations for supporting AWE. This paper presents a method for linking AWE and neural AES, by extracting Topical Components (TCs) representing evidence from…
▽ More
While automated essay scoring (AES) can reliably grade essays at scale, automated writing evaluation (AWE) additionally provides formative feedback to guide essay revision. However, a neural AES typically does not provide useful feature representations for supporting AWE. This paper presents a method for linking AWE and neural AES, by extracting Topical Components (TCs) representing evidence from a source text using the intermediate output of attention layers. We evaluate performance using a feature-based AES requiring TCs. Results show that performance is comparable whether using automatically or manually constructed TCs for 1) representing essays as rubric-based features, 2) grading essays.
△ Less
Submitted 4 August, 2020;
originally announced August 2020.
-
The Discussion Tracker Corpus of Collaborative Argumentation
Authors:
Christopher Olshefski,
Luca Lugini,
Ravneet Singh,
Diane Litman,
Amanda Godley
Abstract:
Although Natural Language Processing (NLP) research on argument mining has advanced considerably in recent years, most studies draw on corpora of asynchronous and written texts, often produced by individuals. Few published corpora of synchronous, multi-party argumentation are available. The Discussion Tracker corpus, collected in American high school English classes, is an annotated dataset of tra…
▽ More
Although Natural Language Processing (NLP) research on argument mining has advanced considerably in recent years, most studies draw on corpora of asynchronous and written texts, often produced by individuals. Few published corpora of synchronous, multi-party argumentation are available. The Discussion Tracker corpus, collected in American high school English classes, is an annotated dataset of transcripts of spoken, multi-party argumentation. The corpus consists of 29 multi-party discussions of English literature transcribed from 985 minutes of audio. The transcripts were annotated for three dimensions of collaborative argumentation: argument moves (claims, evidence, and explanations), specificity (low, medium, high) and collaboration (e.g., extensions of and disagreements about others' ideas). In addition to providing descriptive statistics on the corpus, we provide performance benchmarks and associated code for predicting each dimension separately, illustrate the use of the multiple annotations in the corpus to improve performance via multi-task learning, and finally discuss other ways the corpus might be used to further NLP research.
△ Less
Submitted 22 May, 2020;
originally announced May 2020.
-
Abstractive Summarization for Low Resource Data using Domain Transfer and Data Synthesis
Authors:
Ahmed Magooda,
Diane Litman
Abstract:
Training abstractive summarization models typically requires large amounts of data, which can be a limitation for many domains. In this paper we explore using domain transfer and data synthesis to improve the performance of recent abstractive summarization methods when applied to small corpora of student reflections. First, we explored whether tuning state of the art model trained on newspaper dat…
▽ More
Training abstractive summarization models typically requires large amounts of data, which can be a limitation for many domains. In this paper we explore using domain transfer and data synthesis to improve the performance of recent abstractive summarization methods when applied to small corpora of student reflections. First, we explored whether tuning state of the art model trained on newspaper data could boost performance on student reflection data. Evaluations demonstrated that summaries produced by the tuned model achieved higher ROUGE scores compared to model trained on just student reflection data or just newspaper data. The tuned model also achieved higher scores compared to extractive summarization baselines, and additionally was judged to produce more coherent and readable summaries in human evaluations. Second, we explored whether synthesizing summaries of student data could additionally boost performance. We proposed a template-based model to synthesize new data, which when incorporated into training further increased ROUGE scores. Finally, we showed that combining data synthesis with domain transfer achieved higher ROUGE scores compared to only using one of the two approaches.
△ Less
Submitted 9 February, 2020;
originally announced February 2020.
-
Annotation and Classification of Sentence-level Revision Improvement
Authors:
Tazin Afrin,
Diane Litman
Abstract:
Studies of writing revisions rarely focus on revision quality. To address this issue, we introduce a corpus of between-draft revisions of student argumentative essays, annotated as to whether each revision improves essay quality. We demonstrate a potential usage of our annotations by developing a machine learning model to predict revision improvement. With the goal of expanding training data, we a…
▽ More
Studies of writing revisions rarely focus on revision quality. To address this issue, we introduce a corpus of between-draft revisions of student argumentative essays, annotated as to whether each revision improves essay quality. We demonstrate a potential usage of our annotations by developing a machine learning model to predict revision improvement. With the goal of expanding training data, we also extract revisions from a dataset edited by expert proofreaders. Our results indicate that blending expert and non-expert revisions increases model performance, with expert data particularly important for predicting low-quality revisions.
△ Less
Submitted 3 September, 2019;
originally announced September 2019.
-
Identifying Editor Roles in Argumentative Writing from Student Revision Histories
Authors:
Tazin Afrin,
Diane Litman
Abstract:
We present a method for identifying editor roles from students' revision behaviors during argumentative writing. We first develop a method for applying a topic modeling algorithm to identify a set of editor roles from a vocabulary capturing three aspects of student revision behaviors: operation, purpose, and position. We validate the identified roles by showing that modeling the editor roles that…
▽ More
We present a method for identifying editor roles from students' revision behaviors during argumentative writing. We first develop a method for applying a topic modeling algorithm to identify a set of editor roles from a vocabulary capturing three aspects of student revision behaviors: operation, purpose, and position. We validate the identified roles by showing that modeling the editor roles that students take when revising a paper not only accounts for the variance in revision purposes in our data, but also relates to writing improvement.
△ Less
Submitted 3 September, 2019;
originally announced September 2019.
-
Annotating Student Talk in Text-based Classroom Discussions
Authors:
Luca Lugini,
Diane Litman,
Amanda Godley,
Christopher Olshefski
Abstract:
Classroom discussions in English Language Arts have a positive effect on students' reading, writing and reasoning skills. Although prior work has largely focused on teacher talk and student-teacher interactions, we focus on three theoretically-motivated aspects of high-quality student talk: argumentation, specificity, and knowledge domain. We introduce an annotation scheme, then show that the sche…
▽ More
Classroom discussions in English Language Arts have a positive effect on students' reading, writing and reasoning skills. Although prior work has largely focused on teacher talk and student-teacher interactions, we focus on three theoretically-motivated aspects of high-quality student talk: argumentation, specificity, and knowledge domain. We introduce an annotation scheme, then show that the scheme can be used to produce reliable annotations and that the annotations are predictive of discussion quality. We also highlight opportunities provided by our scheme for education and natural language processing research.
△ Less
Submitted 6 September, 2019;
originally announced September 2019.
-
Argument Component Classification for Classroom Discussions
Authors:
Luca Lugini,
Diane Litman
Abstract:
This paper focuses on argument component classification for transcribed spoken classroom discussions, with the goal of automatically classifying student utterances into claims, evidence, and warrants. We show that an existing method for argument component classification developed for another educationally-oriented domain performs poorly on our dataset. We then show that feature sets from prior wor…
▽ More
This paper focuses on argument component classification for transcribed spoken classroom discussions, with the goal of automatically classifying student utterances into claims, evidence, and warrants. We show that an existing method for argument component classification developed for another educationally-oriented domain performs poorly on our dataset. We then show that feature sets from prior work on argument mining for student essays and online dialogues can be used to improve performance considerably. We also provide a comparison between convolutional neural networks and recurrent neural networks when trained under different conditions to classify argument components in classroom discussions. While neural network models are not always able to outperform a logistic regression model, we were able to gain some useful insights: convolutional networks are more robust than recurrent networks both at the character and at the word level, and specificity information can help boost performance in multi-task training.
△ Less
Submitted 6 September, 2019;
originally announced September 2019.
-
Predicting Specificity in Classroom Discussion
Authors:
Luca Lugini,
Diane Litman
Abstract:
High quality classroom discussion is important to student development, enhancing abilities to express claims, reason about other students' claims, and retain information for longer periods of time. Previous small-scale studies have shown that one indicator of classroom discussion quality is specificity. In this paper we tackle the problem of predicting specificity for classroom discussions. We pro…
▽ More
High quality classroom discussion is important to student development, enhancing abilities to express claims, reason about other students' claims, and retain information for longer periods of time. Previous small-scale studies have shown that one indicator of classroom discussion quality is specificity. In this paper we tackle the problem of predicting specificity for classroom discussions. We propose several methods and feature sets capable of outperforming the state of the art in specificity prediction. Additionally, we provide a set of meaningful, interpretable features that can be used to analyze classroom discussions at a pedagogical level.
△ Less
Submitted 3 September, 2019;
originally announced September 2019.
-
Identifying Personality Traits Using Overlap Dynamics in Multiparty Dialogue
Authors:
Mingzhi Yu,
Emer Gilmartin,
Diane Litman
Abstract:
Research on human spoken language has shown that speech plays an important role in identifying speaker personality traits. In this work, we propose an approach for identifying speaker personality traits using overlap dynamics in multiparty spoken dialogues. We first define a set of novel features representing the overlap dynamics of each speaker. We then investigate the impact of speaker personali…
▽ More
Research on human spoken language has shown that speech plays an important role in identifying speaker personality traits. In this work, we propose an approach for identifying speaker personality traits using overlap dynamics in multiparty spoken dialogues. We first define a set of novel features representing the overlap dynamics of each speaker. We then investigate the impact of speaker personality traits on these features using ANOVA tests. We find that features of overlap dynamics significantly vary for speakers with different levels of both Extraversion and Conscientiousness. Finally, we find that classifiers using only overlap dynamics features outperform random guessing in identifying Extraversion and Agreeableness, and that the improvements are statistically significant.
△ Less
Submitted 2 September, 2019;
originally announced September 2019.
-
Investigating the Relationship between Multi-Party Linguistic Entrainment, Team Characteristics, and the Perception of Team Social Outcomes
Authors:
Mingzhi Yu,
Diane Litman,
Susannah Paletz
Abstract:
Multi-party linguistic entrainment refers to the phenomenon that speakers tend to speak more similarly during conversation. We first developed new measures of multi-party entrainment on features describing linguistic style, and then examined the relationship between entrainment and team characteristics in terms of gender composition, team size, and diversity. Next, we predicted the perception of t…
▽ More
Multi-party linguistic entrainment refers to the phenomenon that speakers tend to speak more similarly during conversation. We first developed new measures of multi-party entrainment on features describing linguistic style, and then examined the relationship between entrainment and team characteristics in terms of gender composition, team size, and diversity. Next, we predicted the perception of team social outcomes using multi-party linguistic entrainment and team characteristics with a hierarchical regression model. We found that teams with greater gender diversity had higher minimum convergence than teams with less gender diversity. Entrainment contributed significantly to predicting perceived team social outcomes both alone and controlling for team characteristics.
△ Less
Submitted 2 September, 2019;
originally announced September 2019.
-
Co-Attention Based Neural Network for Source-Dependent Essay Scoring
Authors:
Haoran Zhang,
Diane Litman
Abstract:
This paper presents an investigation of using a co-attention based neural network for source-dependent essay scoring. We use a co-attention mechanism to help the model learn the importance of each part of the essay more accurately. Also, this paper shows that the co-attention based neural network model provides reliable score prediction of source-dependent responses. We evaluate our model on two s…
▽ More
This paper presents an investigation of using a co-attention based neural network for source-dependent essay scoring. We use a co-attention mechanism to help the model learn the importance of each part of the essay more accurately. Also, this paper shows that the co-attention based neural network model provides reliable score prediction of source-dependent responses. We evaluate our model on two source-dependent response corpora. Results show that our model outperforms the baseline on both corpora. We also show that the attention of the model is similar to the expert opinions with examples.
△ Less
Submitted 6 August, 2019;
originally announced August 2019.
-
eRevise: Using Natural Language Processing to Provide Formative Feedback on Text Evidence Usage in Student Writing
Authors:
Haoran Zhang,
Ahmed Magooda,
Diane Litman,
Richard Correnti,
Elaine Wang,
Lindsay Clare Matsumura,
Emily Howe,
Rafael Quintana
Abstract:
Writing a good essay typically involves students revising an initial paper draft after receiving feedback. We present eRevise, a web-based writing and revising environment that uses natural language processing features generated for rubric-based essay scoring to trigger formative feedback messages regarding students' use of evidence in response-to-text writing. By helping students understand the c…
▽ More
Writing a good essay typically involves students revising an initial paper draft after receiving feedback. We present eRevise, a web-based writing and revising environment that uses natural language processing features generated for rubric-based essay scoring to trigger formative feedback messages regarding students' use of evidence in response-to-text writing. By helping students understand the criteria for using text evidence during writing, eRevise empowers students to better revise their paper drafts. In a pilot deployment of eRevise in 7 classrooms spanning grades 5 and 6, the quality of text evidence usage in writing improved after students received formative feedback then engaged in paper revision.
△ Less
Submitted 6 August, 2019;
originally announced August 2019.
-
Word Embedding for Response-To-Text Assessment of Evidence
Authors:
Haoran Zhang,
Diane Litman
Abstract:
Manually grading the Response to Text Assessment (RTA) is labor intensive. Therefore, an automatic method is being developed for scoring analytical writing when the RTA is administered in large numbers of classrooms. Our long-term goal is to also use this scoring method to provide formative feedback to students and teachers about students' writing quality. As a first step towards this goal, interp…
▽ More
Manually grading the Response to Text Assessment (RTA) is labor intensive. Therefore, an automatic method is being developed for scoring analytical writing when the RTA is administered in large numbers of classrooms. Our long-term goal is to also use this scoring method to provide formative feedback to students and teachers about students' writing quality. As a first step towards this goal, interpretable features for automatically scoring the evidence rubric of the RTA have been developed. In this paper, we present a simple but promising method for improving evidence scoring by employing the word embedding model. We evaluate our method on corpora of responses written by upper elementary students.
△ Less
Submitted 6 August, 2019;
originally announced August 2019.
-
A Novel ILP Framework for Summarizing Content with High Lexical Variety
Authors:
Wencan Luo,
Fei Liu,
Zitao Liu,
Diane Litman
Abstract:
Summarizing content contributed by individuals can be challenging, because people make different lexical choices even when describing the same events. However, there remains a significant need to summarize such content. Examples include the student responses to post-class reflective questions, product reviews, and news articles published by different news agencies related to the same events. High…
▽ More
Summarizing content contributed by individuals can be challenging, because people make different lexical choices even when describing the same events. However, there remains a significant need to summarize such content. Examples include the student responses to post-class reflective questions, product reviews, and news articles published by different news agencies related to the same events. High lexical diversity of these documents hinders the system's ability to effectively identify salient content and reduce summary redundancy. In this paper, we overcome this issue by introducing an integer linear programming-based summarization framework. It incorporates a low-rank approximation to the sentence-word co-occurrence matrix to intrinsically group semantically-similar lexical items. We conduct extensive experiments on datasets of student responses, product reviews, and news documents. Our approach compares favorably to a number of extractive baselines as well as a neural abstractive summarization system. The paper finally sheds light on when and why the proposed framework is effective at summarizing content with high lexical variety.
△ Less
Submitted 25 July, 2018;
originally announced July 2018.
-
An Improved Phrase-based Approach to Annotating and Summarizing Student Course Responses
Authors:
Wencan Luo,
Fei Liu,
Diane Litman
Abstract:
Teaching large classes remains a great challenge, primarily because it is difficult to attend to all the student needs in a timely manner. Automatic text summarization systems can be leveraged to summarize the student feedback, submitted immediately after each lecture, but it is left to be discovered what makes a good summary for student responses. In this work we explore a new methodology that ef…
▽ More
Teaching large classes remains a great challenge, primarily because it is difficult to attend to all the student needs in a timely manner. Automatic text summarization systems can be leveraged to summarize the student feedback, submitted immediately after each lecture, but it is left to be discovered what makes a good summary for student responses. In this work we explore a new methodology that effectively extracts summary phrases from the student responses. Each phrase is tagged with the number of students who raise the issue. The phrases are evaluated along two dimensions: with respect to text content, they should be informative and well-formed, measured by the ROUGE metric; additionally, they shall attend to the most pressing student needs, measured by a newly proposed metric. This work is enabled by a phrase-based annotation and highlighting scheme, which is new to the summarization task. The phrase-based framework allows us to summarize the student responses into a set of bullet points and present to the instructor promptly.
△ Less
Submitted 25 May, 2018;
originally announced May 2018.
-
Automatic Summarization of Student Course Feedback
Authors:
Wencan Luo,
Fei Liu,
Zitao Liu,
Diane Litman
Abstract:
Student course feedback is generated daily in both classrooms and online course discussion forums. Traditionally, instructors manually analyze these responses in a costly manner. In this work, we propose a new approach to summarizing student course feedback based on the integer linear programming (ILP) framework. Our approach allows different student responses to share co-occurrence statistics and…
▽ More
Student course feedback is generated daily in both classrooms and online course discussion forums. Traditionally, instructors manually analyze these responses in a costly manner. In this work, we propose a new approach to summarizing student course feedback based on the integer linear programming (ILP) framework. Our approach allows different student responses to share co-occurrence statistics and alleviates sparsity issues. Experimental results on a student feedback corpus show that our approach outperforms a range of baselines in terms of both ROUGE scores and human evaluation.
△ Less
Submitted 25 May, 2018;
originally announced May 2018.
-
A Joint Identification Approach for Argumentative Writing Revisions
Authors:
Fan Zhang,
Diane Litman
Abstract:
Prior work on revision identification typically uses a pipeline method: revision extraction is first conducted to identify the locations of revisions and revision classification is then conducted on the identified revisions. Such a setting propagates the errors of the revision extraction step to the revision classification step. This paper proposes an approach that identifies the revision location…
▽ More
Prior work on revision identification typically uses a pipeline method: revision extraction is first conducted to identify the locations of revisions and revision classification is then conducted on the identified revisions. Such a setting propagates the errors of the revision extraction step to the revision classification step. This paper proposes an approach that identifies the revision location and the revision type jointly to solve the issue of error propagation. It utilizes a sequence representation of revisions and conducts sequence labeling for revision identification. A mutation-based approach is utilized to update identification sequences. Results demonstrate that our proposed approach yields better performance on both revision location extraction and revision type classification compared to a pipeline baseline.
△ Less
Submitted 28 February, 2017;
originally announced March 2017.
-
Using Discourse Signals for Robust Instructor Intervention Prediction
Authors:
Muthu Kumar Chandrasekaran,
Carrie Demmans Epp,
Min-Yen Kan,
Diane Litman
Abstract:
We tackle the prediction of instructor intervention in student posts from discussion forums in Massive Open Online Courses (MOOCs). Our key finding is that using automatically obtained discourse relations improves the prediction of when instructors intervene in student discussions, when compared with a state-of-the-art, feature-rich baseline. Our supervised classifier makes use of an automatic dis…
▽ More
We tackle the prediction of instructor intervention in student posts from discussion forums in Massive Open Online Courses (MOOCs). Our key finding is that using automatically obtained discourse relations improves the prediction of when instructors intervene in student discussions, when compared with a state-of-the-art, feature-rich baseline. Our supervised classifier makes use of an automatic discourse parser which outputs Penn Discourse Treebank (PDTB) tags that represent in-post discourse features. We show PDTB relation-based features increase the robustness of the classifier and complement baseline features in recalling more diverse instructor intervention patterns. In comprehensive experiments over 14 MOOC offerings from several disciplines, the PDTB discourse features improve performance on average. The resultant models are less dependent on domain-specific vocabulary, allowing them to better generalize to new courses.
△ Less
Submitted 3 December, 2016;
originally announced December 2016.
-
Optimizing Dialogue Management with Reinforcement Learning: Experiments with the NJFun System
Authors:
M. Kearns,
D. Litman,
S. Singh,
M. Walker
Abstract:
Designing the dialogue policy of a spoken dialogue system involves many nontrivial choices. This paper presents a reinforcement learning approach for automatically optimizing a dialogue policy, which addresses the technical challenges in applying reinforcement learning to a working dialogue system with human users. We report on the design, construction and empirical evaluation of…
▽ More
Designing the dialogue policy of a spoken dialogue system involves many nontrivial choices. This paper presents a reinforcement learning approach for automatically optimizing a dialogue policy, which addresses the technical challenges in applying reinforcement learning to a working dialogue system with human users. We report on the design, construction and empirical evaluation of NJFun, an experimental spoken dialogue system that provides users with access to information about fun things to do in New Jersey. Our results show that by optimizing its performance via reinforcement learning, NJFun measurably improves system performance.
△ Less
Submitted 3 June, 2011;
originally announced June 2011.
-
Empirically Evaluating an Adaptable Spoken Dialogue System
Authors:
Diane J. Litman,
Shimei Pan
Abstract:
Recent technological advances have made it possible to build real-time, interactive spoken dialogue systems for a wide variety of applications. However, when users do not respect the limitations of such systems, performance typically degrades. Although users differ with respect to their knowledge of system limitations, and although different dialogue strategies make system limitations more appar…
▽ More
Recent technological advances have made it possible to build real-time, interactive spoken dialogue systems for a wide variety of applications. However, when users do not respect the limitations of such systems, performance typically degrades. Although users differ with respect to their knowledge of system limitations, and although different dialogue strategies make system limitations more apparent to users, most current systems do not try to improve performance by adapting dialogue behavior to individual users. This paper presents an empirical evaluation of TOOT, an adaptable spoken dialogue system for retrieving train schedules on the web. We conduct an experiment in which 20 users carry out 4 tasks with both adaptable and non-adaptable versions of TOOT, resulting in a corpus of 80 dialogues. The values for a wide range of evaluation measures are then extracted from this corpus. Our results show that adaptable TOOT generally outperforms non-adaptable TOOT, and that the utility of adaptation depends on TOOT's initial dialogue strategies.
△ Less
Submitted 5 March, 1999;
originally announced March 1999.
-
PARADISE: A Framework for Evaluating Spoken Dialogue Agents
Authors:
Marilyn A. Walker,
Diane J. Litman,
Candace A. Kamm,
Alicia Abella
Abstract:
This paper presents PARADISE (PARAdigm for DIalogue System Evaluation), a general framework for evaluating spoken dialogue agents. The framework decouples task requirements from an agent's dialogue behaviors, supports comparisons among dialogue strategies, enables the calculation of performance over subdialogues and whole dialogues, specifies the relative contribution of various factors to perfo…
▽ More
This paper presents PARADISE (PARAdigm for DIalogue System Evaluation), a general framework for evaluating spoken dialogue agents. The framework decouples task requirements from an agent's dialogue behaviors, supports comparisons among dialogue strategies, enables the calculation of performance over subdialogues and whole dialogues, specifies the relative contribution of various factors to performance, and makes it possible to compare agents performing different tasks by normalizing for task complexity.
△ Less
Submitted 15 April, 1997;
originally announced April 1997.
-
Cue Phrase Classification Using Machine Learning
Authors:
D. J. Litman
Abstract:
Cue phrases may be used in a discourse sense to explicitly signal discourse structure, but also in a sentential sense to convey semantic rather than structural information. Correctly classifying cue phrases as discourse or sentential is critical in natural language processing systems that exploit discourse structure, e.g., for performing tasks such as anaphora resolution and plan recognition. Th…
▽ More
Cue phrases may be used in a discourse sense to explicitly signal discourse structure, but also in a sentential sense to convey semantic rather than structural information. Correctly classifying cue phrases as discourse or sentential is critical in natural language processing systems that exploit discourse structure, e.g., for performing tasks such as anaphora resolution and plan recognition. This paper explores the use of machine learning for classifying cue phrases as discourse or sentential. Two machine learning programs (Cgrendel and C4.5) are used to induce classification models from sets of pre-classified cue phrases and their features in text and speech. Machine learning is shown to be an effective technique for not only automating the generation of classification models, but also for improving upon previous results. When compared to manually derived classification models already in the literature, the learned models often perform with higher accuracy and contain new linguistic insights into the data. In addition, the ability to automatically construct classification models makes it easier to comparatively analyze the utility of alternative feature representations of the data. Finally, the ease of retraining makes the learning approach more scalable and flexible than manual methods.
△ Less
Submitted 31 August, 1996;
originally announced September 1996.
-
Cue Phrase Classification Using Machine Learning
Authors:
Diane J. Litman
Abstract:
Cue phrases may be used in a discourse sense to explicitly signal discourse structure, but also in a sentential sense to convey semantic rather than structural information. Correctly classifying cue phrases as discourse or sentential is critical in natural language processing systems that exploit discourse structure, e.g., for performing tasks such as anaphora resolution and plan recognition. Th…
▽ More
Cue phrases may be used in a discourse sense to explicitly signal discourse structure, but also in a sentential sense to convey semantic rather than structural information. Correctly classifying cue phrases as discourse or sentential is critical in natural language processing systems that exploit discourse structure, e.g., for performing tasks such as anaphora resolution and plan recognition. This paper explores the use of machine learning for classifying cue phrases as discourse or sentential. Two machine learning programs (Cgrendel and C4.5) are used to induce classification models from sets of pre-classified cue phrases and their features in text and speech. Machine learning is shown to be an effective technique for not only automating the generation of classification models, but also for improving upon previous results. When compared to manually derived classification models already in the literature, the learned models often perform with higher accuracy and contain new linguistic insights into the data. In addition, the ability to automatically construct classification models makes it easier to comparatively analyze the utility of alternative feature representations of the data. Finally, the ease of retraining makes the learning approach more scalable and flexible than manual methods.
△ Less
Submitted 9 September, 1996;
originally announced September 1996.