\xpatchcmd\IEEEkeywords

—-

Sequence Graph Network for Online Debate Analysis

Quan Mai, 1 Susan Gauch, 1 Douglas Adams, 2 Miaoqing Huang 1 1Department of Electrical Engineering and Computer Science, 2Department of Sociology and Criminology
University of Arkansas
Fayetteville, Arkansas, USA
{quanmai, sgauch, djadams, mqhuang}@uark.edu
Abstract

Online debates involve a dynamic exchange of ideas over time, where participants need to actively consider their opponents’ arguments, respond with counterarguments, reinforce their own points, and introduce more compelling arguments as the discussion unfolds. Modeling such a complex process is not a simple task, as it necessitates the incorporation of both sequential characteristics and the capability to capture interactions effectively. To address this challenge, we employ a sequence-graph approach. Building the conversation as a graph allows us to effectively model interactions between participants through directed edges. Simultaneously, the propagation of information along these edges in a sequential manner enables us to capture a more comprehensive representation of context. We also introduce a Sequence Graph Attention layer to illustrate the proposed information update scheme. The experimental results show that sequence graph networks achieve superior results to existing methods in online debates.

Keywords:
Graph neural networks; dialog modeling; sequence graph network; online debates.

I Introduction

Online debate has become an integral part of our digital age, transforming the way we engage in discourse and exchange ideas. In social media platforms (e.g., Facebook, Twitter (currently X), etc.), individuals from diverse backgrounds and geographical locations converge to discuss and deliberate on a wide array of topics, ranging from politics and ethics to music and science. Debating with a wide range of debaters requires participants to research and present well-informed arguments, encourages critical thinking, and challenges preconceived notions.

Like other forms of debate, online discussions are contingent on the flow of time (temporal dependency); each subsequent comment relies on the content of the previous comment it responds to. Participants interactively promote their point while countering the opponent’s [4]. Within a turn, debaters employ a variety of strategies, each of which plays a crucial role in determining the outcome of the debate. These strategies involve either directly addressing the opponent’s argument, presenting their own viewpoint, or skillfully combining both tactics. The latter approach often appears to be the most effective, allowing the debater to simultaneously achieve both objectives during their turn. However, one cannot always adopt that strategy as it depends on their position in the debate. For instance, if a debater is the first speaker in a debate, their primary task is to present their own ideas coherently and logically, as they do not have the opportunity to directly counter their opponent’s arguments at this stage. In such a scenario, the debater’s effectiveness lies in the clarity and persuasiveness of their presentation, making it challenging for the opposing side to refute their position. These strategies are also discussed in  [4], which examined the dynamics of information flow within online debates.

Refer to caption

Figure 1: A “what-should-we-mention” information flow scheme that mimics the interaction process of a debater. At each time step t𝑡titalic_t, the node features are updated by considering their peer nodes from the same turn and the connected nodes from previous turns, using Directed Graph Attention Network layers. Nodes associated with different debaters are colored differently. Each type of edge (colored arrows) contributes a corresponding representation, collectively forming 𝐡isubscript𝐡𝑖\mathbf{h}_{i}bold_h start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT. The node’s utterance embedding 𝐡𝐡\mathbf{h}bold_h and the interaction representation 𝐡isubscript𝐡𝑖\mathbf{h}_{i}bold_h start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT are used to update the node feature 𝐡superscript𝐡\mathbf{h}^{\prime}bold_h start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT.

As the argument process is temporally dependent, Recurrent Neural Networks (RNNs), such as Long Short Term Memory (LSTM) [9] and Gated Recurrent Unit (GRU) [13], have been one of the most widely used techniques in argument-winning research as well as dialog extraction. Several studies employ RNNs as the encoder for utterances [5][7][10], leveraging their capacity to capture sequential dependencies and relationships within textual data. In addition to encoding individual utterances, sequence networks are employed to encode entire conversations by sequentially processing the arguments [11].

In a debate, however, participants engage in interactive turn-by-turn rebuttals to counter their opponents’ arguments, and sequencing the entire conversation fails to capture this dynamic interaction. In order to model the process of dialogical argumentation, [10] use a co-attention network to capture the interaction between the participants and achieve a promising performance on the prediction task. The focus of [7] is placed on identifying connections between the sentences of debaters. This approach is instrumental in capturing critical argumentative components, making it a pivotal factor for predicting the winner. The aforementioned studies compute “attention scores” for each pair of sentences belonging to two participants in order to assess the relevance of one sentence to another.

An alternative method for capturing these interaction dynamics is through the use of graphs. Graphs are an effective way to represent relationships and dependencies among entities, making them suitable for a wide range of applications, including social networks and recommendation systems [17, 18, 19]. The connection between two components of an argument can be effectively represented by a link (or edge) within the graph. Graphs can also serve as input to Graph Neural Networks (GNNs) for capturing the contextual information within the conversation. In their work, [12] employs a heterogeneous graph to represent the relationships among entities discussed in multi-party dialogues. In order to model the relationships between argument pairs, [5] incorporate intra-passage and cross-passage links to interconnect sentence nodes. Subsequently, they employ a Graph Convolutional Network (GCN) [15] for efficient information propagation.

Traditional GNNs, including GCNs and Graph Attention Networks (GAT) [3]), may not effectively capture the temporal dynamics within a conversation, particularly in a debate scenario in which participants engage in interactive exchanges to counter arguments or defend their own viewpoints. To tackle this challenge, we integrate the strengths of both RNNs and GNNs within a unified framework. In this framework, we conceptualize the debate as a graph, where argument components are depicted as nodes, and their features undergo sequential updates, according to the turn to which they correspond. We introduce the Sequence Graph Attention (SGA) cell, which resembles the traditional RNN-cell, to capture long-range dependencies in the debate (which is treated as a sequence of subgraphs). The experimental results demonstrate that our approach can capture the interaction between debaters and outperforms state-of-the-art models in accurately predicting the winner in several online debate datasets. The code and models are available at https://github.com/quanmai/SGA

The structure of the remainder of this paper is organized as follows: Section II describes the process of constructing a graph from a debate. In Section III, we introduce our proposed framework. The effectiveness of this method is evaluated in Section IV. Section V reviews some relevant literature. Finally, Section VI provides a summary of our findings and discusses potential avenues for future work.

II Preliminary

Refer to caption
Figure 2: Graph Construction from Debate: Nodes establish connections through three distinct edge types, indicated by colored arrows. Intra-argument edges (blue) link nodes within the same turn, reinforcement edges (green) connect nodes from the same debater across different turns, while countering edges (orange) connect nodes from a debater to their opponent’s, illustrating counter-argumentation. The sample debate is taken from data collected by [1].

Before describing the details of the proposed method, we first give a brief introduction to how we construct a graph for an online debate.

II-A Debate Format

Our primary focus lies in online debates wherein the victor emerges through the collective votes of an audience or a panel of judges. These debates adhere to the Oxford-style format, featuring two participants representing opposing viewpoints —one in favor of the claim (Pros) and the other in opposition (Cons) — who alternate in presenting their arguments on a given topic. After the debate, a winner is declared, unless a tie occurs. In this study, we define a turn as each instance when a debater presents their argument, and a round represents the stage in which opposing sides provide their arguments. Consequently, round 0 consists of turn 0 and turn 1, round 1 consists of turn 2 and turn 3, and so forth.

II-B Debate-to-Graph construction

Given a debate that contains a total of N𝑁Nitalic_N sentences, a directed, unweighted graph 𝒢=(𝒱,,)𝒢𝒱\mathcal{G}=(\mathcal{V},\mathcal{E},{\mathcal{H}})caligraphic_G = ( caligraphic_V , caligraphic_E , caligraphic_H ) is constructed based on sentences and their relationships (Figure  2). Sentences in the debate are represented by a set of nodes 𝒱𝒱\mathcal{V}caligraphic_V (|𝒱|=N𝒱𝑁|\mathcal{V}|=N| caligraphic_V | = italic_N), and a node attribute matrix RN×Dsuperscript𝑅𝑁𝐷\mathcal{H}\in R^{N\times D}caligraphic_H ∈ italic_R start_POSTSUPERSCRIPT italic_N × italic_D end_POSTSUPERSCRIPT, defined by D𝐷Ditalic_D-dimensional embedding vectors for each of the sentences. Sentences in the debate may be interconnected and these interconnections are represented by \mathcal{E}caligraphic_E, the set of edges in the graph.

Edge types: We define three different types of edges to elucidate the participants’ strategies throughout the debate. Each type is categorized based on the turn it corresponds to and the strategic role it plays. In Section III, we will delve into how each type contributes to node feature aggregation.

  1. 1.

    Logical and Coherent Edges: These edges emphasize the participants’ ability to construct logical and coherent arguments within their turn.

  2. 2.

    Reinforcement Edges: These edges serve to strengthen the points previously made by the debater in their previous rounds. We will interchangeably use the terms reinforcement edges and supporting edges.

  3. 3.

    Counterargument Edges: These edges highlight the participants’ skill in countering their opponents’ arguments effectively.

Intra-argument Links These edges connect sentences of the same turn. During a turn, edges are constructed based on the relative position among sentences. These Logical and Coherent edges capture coherency in an argument turn. Given two sentences, denoted as sitsubscriptsuperscript𝑠𝑡𝑖s^{t}_{i}italic_s start_POSTSUPERSCRIPT italic_t end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT and sjtsubscriptsuperscript𝑠𝑡𝑗s^{t}_{j}italic_s start_POSTSUPERSCRIPT italic_t end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT, both belonging to turn t𝑡titalic_t, we establish an edge eijintersubscriptsuperscript𝑒𝑖𝑛𝑡𝑒𝑟𝑖𝑗e^{inter}_{ij}italic_e start_POSTSUPERSCRIPT italic_i italic_n italic_t italic_e italic_r end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_i italic_j end_POSTSUBSCRIPT from sjtsubscriptsuperscript𝑠𝑡𝑗s^{t}_{j}italic_s start_POSTSUPERSCRIPT italic_t end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT to sitsubscriptsuperscript𝑠𝑡𝑖s^{t}_{i}italic_s start_POSTSUPERSCRIPT italic_t end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT if the positional difference 𝒟𝒟\mathcal{D}caligraphic_D between them is within a specified distance threshold d𝑑ditalic_d.

eijinter={1if 𝒟(sit,sjt)d0otherwisesubscriptsuperscript𝑒𝑖𝑛𝑡𝑒𝑟𝑖𝑗cases1if 𝒟subscriptsuperscript𝑠𝑡𝑖subscriptsuperscript𝑠𝑡𝑗𝑑0otherwisee^{inter}_{ij}=\begin{cases}1&\text{if }\mathcal{D}(s^{t}_{i},s^{t}_{j})\leq d% \\ 0&\text{otherwise}\end{cases}italic_e start_POSTSUPERSCRIPT italic_i italic_n italic_t italic_e italic_r end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_i italic_j end_POSTSUBSCRIPT = { start_ROW start_CELL 1 end_CELL start_CELL if caligraphic_D ( italic_s start_POSTSUPERSCRIPT italic_t end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT , italic_s start_POSTSUPERSCRIPT italic_t end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT ) ≤ italic_d end_CELL end_ROW start_ROW start_CELL 0 end_CELL start_CELL otherwise end_CELL end_ROW

Cross-argument Links These edges interconnect sentences that belong to different turns and are categorized into two types: Reinforcement and Counterargument edges. The former connects nodes belonging to the same debater whereas the latter connects nodes belonging to different debaters. For example, nodes in the 3rd turn are connected to nodes from the 1st turn through Reinforcement edges and are also linked with their opponent’s nodes from the 2nd turn. Unlike intra-argument edges that rely on the relative positions of sentences, cross-argument edges are established using semantic textual similarity between sentences. In this work, we use cosine similarity Scsubscript𝑆𝑐S_{c}italic_S start_POSTSUBSCRIPT italic_c end_POSTSUBSCRIPT to capture the semantic relationship of texts. An edge eijsubscript𝑒𝑖𝑗e_{ij}italic_e start_POSTSUBSCRIPT italic_i italic_j end_POSTSUBSCRIPT links 2 nodes visubscript𝑣𝑖v_{i}italic_v start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT and vjsubscript𝑣𝑗v_{j}italic_v start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT if their similarity score Sc(𝐡i,𝐡j)subscript𝑆𝑐subscript𝐡𝑖subscript𝐡𝑗S_{c}(\mathbf{h}_{i},\mathbf{h}_{j})italic_S start_POSTSUBSCRIPT italic_c end_POSTSUBSCRIPT ( bold_h start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT , bold_h start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT ) meets a threshold value Sthsubscript𝑆𝑡S_{th}italic_S start_POSTSUBSCRIPT italic_t italic_h end_POSTSUBSCRIPT

evi,vj={1if Sc(𝐡i,𝐡j)Sth0otherwisesubscript𝑒subscript𝑣𝑖subscript𝑣𝑗cases1if subscript𝑆𝑐subscript𝐡𝑖subscript𝐡𝑗subscript𝑆𝑡0otherwisee_{v_{i},v_{j}}=\begin{cases}1&\text{if }S_{c}(\mathbf{h}_{i},\mathbf{h}_{j})% \geq S_{th}\\ 0&\text{otherwise}\end{cases}italic_e start_POSTSUBSCRIPT italic_v start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT , italic_v start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT end_POSTSUBSCRIPT = { start_ROW start_CELL 1 end_CELL start_CELL if italic_S start_POSTSUBSCRIPT italic_c end_POSTSUBSCRIPT ( bold_h start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT , bold_h start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT ) ≥ italic_S start_POSTSUBSCRIPT italic_t italic_h end_POSTSUBSCRIPT end_CELL end_ROW start_ROW start_CELL 0 end_CELL start_CELL otherwise end_CELL end_ROW

where 𝐡isubscript𝐡𝑖\mathbf{h}_{i}bold_h start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT and 𝐡jsubscript𝐡𝑗\mathbf{h}_{j}bold_h start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT are ithsuperscript𝑖𝑡i^{th}italic_i start_POSTSUPERSCRIPT italic_t italic_h end_POSTSUPERSCRIPT and jthsuperscript𝑗𝑡j^{th}italic_j start_POSTSUPERSCRIPT italic_t italic_h end_POSTSUPERSCRIPT rows in \mathcal{H}caligraphic_H, representing embedding vectors of sentences visubscript𝑣𝑖v_{i}italic_v start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT and vjsubscript𝑣𝑗v_{j}italic_v start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT, respectively. Sthsubscript𝑆𝑡S_{th}italic_S start_POSTSUBSCRIPT italic_t italic_h end_POSTSUBSCRIPT serves as a crucial hyper-parameter for evaluating the influence of participant interactions on the debate’s outcome. An alternative approach is to employ the top k𝑘kitalic_k similarities, allowing each node to establish connections with up to k𝑘kitalic_k cross-argument nodes that possess the highest similarity scores. We will evaluate the effectiveness of each approach on the predictive performance in Section IV. It is important to note that cross-argument edges consistently flow from nodes in previous turns to nodes in subsequent turns; there is no reverse direction.

III Proposed method

III-A Utterance Encoder

We encode each sentence using pre-trained sentence embedding (Sentence Transformer (SBERT)) [2]. In preliminary work, we found that this approach works better than using GloVe [6] word embeddings and a bidirectional LSTM to encode semantic vectors for sentences. This step gives us the sentence embedding matrix \mathcal{H}caligraphic_H, in which each row 𝐡isubscript𝐡𝑖\mathbf{h}_{i}bold_h start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT is an embedding vector for sentence sisubscript𝑠𝑖s_{i}italic_s start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT.

Turn Embeddings: Participants employ distinct strategies during different debate turns. For instance, in the initial round consisting of two turns, the first participant presents their perspective on the topic while the second participant challenges their opponent’s arguments and introduces their own viewpoint. We incorporate the temporal turn information into the node features by concatenating it with the sentence embedding 𝐡isubscript𝐡𝑖\mathbf{h}_{i}bold_h start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT. We opt for a 30-dimensional embedding vector 𝐡itR30subscript𝐡𝑖𝑡superscript𝑅30\mathbf{h}_{it}\in R^{30}bold_h start_POSTSUBSCRIPT italic_i italic_t end_POSTSUBSCRIPT ∈ italic_R start_POSTSUPERSCRIPT 30 end_POSTSUPERSCRIPT to represent the turn information for each node.

𝐡i=𝐡i𝐡itsubscript𝐡𝑖conditionalsubscript𝐡𝑖subscript𝐡𝑖𝑡\mathbf{h}_{i}=\mathbf{h}_{i}\|\mathbf{h}_{it}bold_h start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT = bold_h start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT ∥ bold_h start_POSTSUBSCRIPT italic_i italic_t end_POSTSUBSCRIPT (1)

Let B𝐵Bitalic_B denote the number of dimensions of the embedding vector of a sentence from SBERT, then D=B+30𝐷𝐵30D=B+30italic_D = italic_B + 30.

III-B Information flow

Graph Attention Layer: We employ a Graph Attention Network (GAT) [3] layer to update the node representation. The attention mechanism allows GAT to focus on and weigh the importance of different neighbors when aggregating information for each node, called the “attention score”. We are motivated to use GAT in our model because, intuitively, not all sentences in the debate carry equal importance. One can detect the opponent’s argumentative “vulnerable region” [7] and effectively counter it to win the debate. This layer takes as input a set of A𝐴Aitalic_A (AN𝐴𝑁A\leq Nitalic_A ≤ italic_N) node features 𝐡A×D𝐡superscript𝐴𝐷\mathbf{h}\in\mathbb{R}^{A\times D}bold_h ∈ blackboard_R start_POSTSUPERSCRIPT italic_A × italic_D end_POSTSUPERSCRIPT and produces a new set of node features 𝐡A×Dsuperscript𝐡superscript𝐴superscript𝐷\mathbf{h^{\prime}}\in\mathbb{R}^{A\times D^{\prime}}bold_h start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT ∈ blackboard_R start_POSTSUPERSCRIPT italic_A × italic_D start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT end_POSTSUPERSCRIPT (D<Dsuperscript𝐷𝐷D^{\prime}<Ditalic_D start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT < italic_D). The attention score of sentence j𝑗jitalic_j to sentence i𝑖iitalic_i is computed as:

αij=exp(LeakyReLU(𝐚T[𝐖𝐡i||𝐖𝐡j]))k𝒩exp(LeakyReLU(𝐚T[𝐖𝐡i||𝐖𝐡k])\alpha_{ij}=\frac{\exp(\mbox{LeakyReLU}(\mathbf{a}^{T}[\mathbf{W}\mathbf{h}_{i% }||\mathbf{W}\mathbf{h}_{j}]))}{\sum_{k\in\mathcal{N}}\exp(\mbox{LeakyReLU}(% \mathbf{a}^{T}[\mathbf{W}\mathbf{h}_{i}||\mathbf{W}\mathbf{h}_{k}])}italic_α start_POSTSUBSCRIPT italic_i italic_j end_POSTSUBSCRIPT = divide start_ARG roman_exp ( LeakyReLU ( bold_a start_POSTSUPERSCRIPT italic_T end_POSTSUPERSCRIPT [ bold_Wh start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT | | bold_Wh start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT ] ) ) end_ARG start_ARG ∑ start_POSTSUBSCRIPT italic_k ∈ caligraphic_N end_POSTSUBSCRIPT roman_exp ( LeakyReLU ( bold_a start_POSTSUPERSCRIPT italic_T end_POSTSUPERSCRIPT [ bold_Wh start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT | | bold_Wh start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT ] ) end_ARG

where 𝐖D×D𝐖superscript𝐷superscript𝐷\mathbf{W}\in\mathbb{R}^{D\times D^{\prime}}bold_W ∈ blackboard_R start_POSTSUPERSCRIPT italic_D × italic_D start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT end_POSTSUPERSCRIPT and 𝐚2D𝐚superscript2superscript𝐷\mathbf{a}\in\mathbb{R}^{2D^{\prime}}bold_a ∈ blackboard_R start_POSTSUPERSCRIPT 2 italic_D start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT end_POSTSUPERSCRIPT are trainable weight matrix and vector of the layer. The output features of node i𝑖iitalic_i is the weighted sum of the features of its neighboring node set 𝒩isubscript𝒩𝑖\mathcal{N}_{i}caligraphic_N start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT:

𝐡i=j𝒩iαij𝐖𝐡𝐣subscriptsuperscript𝐡𝑖subscript𝑗subscript𝒩𝑖subscript𝛼𝑖𝑗subscript𝐖𝐡𝐣\mathbf{h}^{\prime}_{i}=\sum_{j\in\mathcal{N}_{i}}\alpha_{ij}\mathbf{W}\mathbf% {\mathbf{h}_{j}}bold_h start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT = ∑ start_POSTSUBSCRIPT italic_j ∈ caligraphic_N start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT end_POSTSUBSCRIPT italic_α start_POSTSUBSCRIPT italic_i italic_j end_POSTSUBSCRIPT bold_Wh start_POSTSUBSCRIPT bold_j end_POSTSUBSCRIPT

In this work, we employ three distinct GAT layers, each responsible for aggregating information from a specific type of edge. We refer to these layers as GATI (intra-argument edge), GATC (counterargument edge), and GATS (supporting edge). At each turn, the GAT layer processes a specific set of input node features and produces a new set of features, called interaction representation of each sentence:

𝐡Itsubscriptsuperscript𝐡𝑡𝐼\displaystyle\mathbf{h}^{t}_{I}bold_h start_POSTSUPERSCRIPT italic_t end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_I end_POSTSUBSCRIPT =GATI(𝐡t;𝐚I,𝐖I)absentGATIsubscript𝐡subscript𝑡superscript𝐚𝐼superscript𝐖𝐼\displaystyle=\text{GATI}(\mathbf{h}_{\mathcal{I}_{t}};\mathbf{a}^{I},\mathbf{% W}^{I})= GATI ( bold_h start_POSTSUBSCRIPT caligraphic_I start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT end_POSTSUBSCRIPT ; bold_a start_POSTSUPERSCRIPT italic_I end_POSTSUPERSCRIPT , bold_W start_POSTSUPERSCRIPT italic_I end_POSTSUPERSCRIPT ) (2)
𝐡Ctsubscriptsuperscript𝐡𝑡𝐶\displaystyle\mathbf{h}^{t}_{C}bold_h start_POSTSUPERSCRIPT italic_t end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_C end_POSTSUBSCRIPT =GATC(𝐡𝒥t;𝐚C,𝐖C)absentGATCsubscript𝐡subscript𝒥𝑡superscript𝐚𝐶superscript𝐖𝐶\displaystyle=\text{GATC}(\mathbf{h}_{\mathcal{J}_{t}};\mathbf{a}^{C},\mathbf{% W}^{C})= GATC ( bold_h start_POSTSUBSCRIPT caligraphic_J start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT end_POSTSUBSCRIPT ; bold_a start_POSTSUPERSCRIPT italic_C end_POSTSUPERSCRIPT , bold_W start_POSTSUPERSCRIPT italic_C end_POSTSUPERSCRIPT ) (3)
𝐡Stsubscriptsuperscript𝐡𝑡𝑆\displaystyle\mathbf{h}^{t}_{S}bold_h start_POSTSUPERSCRIPT italic_t end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_S end_POSTSUBSCRIPT =GATS(𝐡𝒦t;𝐚S,𝐖S)absentGATSsubscript𝐡subscript𝒦𝑡superscript𝐚𝑆superscript𝐖𝑆\displaystyle=\text{GATS}(\mathbf{h}_{\mathcal{K}_{t}};\mathbf{a}^{S},\mathbf{% W}^{S})= GATS ( bold_h start_POSTSUBSCRIPT caligraphic_K start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT end_POSTSUBSCRIPT ; bold_a start_POSTSUPERSCRIPT italic_S end_POSTSUPERSCRIPT , bold_W start_POSTSUPERSCRIPT italic_S end_POSTSUPERSCRIPT ) (4)

where 𝐚superscript𝐚\mathbf{a}^{*}bold_a start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT and 𝐖superscript𝐖\mathbf{W}^{*}bold_W start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT are vectors and matrices associated with each layer. Here, we have three sets of node features: 𝐡tsubscript𝐡subscript𝑡\mathbf{h}_{\mathcal{I}_{t}}bold_h start_POSTSUBSCRIPT caligraphic_I start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT end_POSTSUBSCRIPT, 𝐡𝒥tsubscript𝐡subscript𝒥𝑡\mathbf{h}_{\mathcal{J}_{t}}bold_h start_POSTSUBSCRIPT caligraphic_J start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT end_POSTSUBSCRIPT, and 𝐡𝒦tsubscript𝐡subscript𝒦𝑡\mathbf{h}_{\mathcal{K}_{t}}bold_h start_POSTSUBSCRIPT caligraphic_K start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT end_POSTSUBSCRIPT, each corresponding to distinct node sets:

  • tsubscript𝑡\mathcal{I}_{t}caligraphic_I start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT represents the set of nodes that pertain to the same time step, encompassing nodes within the current turn. 𝐡t={𝐡1t,𝐡2t,𝐡3t,}subscript𝐡subscript𝑡subscriptsuperscript𝐡𝑡1subscriptsuperscript𝐡𝑡2subscriptsuperscript𝐡𝑡3\mathbf{h}_{\mathcal{I}_{t}}=\{\mathbf{h}^{t}_{1},\mathbf{h}^{t}_{2},\mathbf{h% }^{t}_{3},...\}bold_h start_POSTSUBSCRIPT caligraphic_I start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT end_POSTSUBSCRIPT = { bold_h start_POSTSUPERSCRIPT italic_t end_POSTSUPERSCRIPT start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT , bold_h start_POSTSUPERSCRIPT italic_t end_POSTSUPERSCRIPT start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT , bold_h start_POSTSUPERSCRIPT italic_t end_POSTSUPERSCRIPT start_POSTSUBSCRIPT 3 end_POSTSUBSCRIPT , … } denotes features matrix of a set of nodes at time t𝑡titalic_t.

  • 𝒦tsubscript𝒦𝑡\mathcal{K}_{t}caligraphic_K start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT comprises nodes from time steps t2𝑡2t-2italic_t - 2 and t𝑡titalic_t, all originating from the same debater and exhibiting a supportive relationship. This set characterizes argumentative enhancement or promotion. Note that the set of node features at time t2𝑡2t-2italic_t - 2 are updated in turn t2𝑡2t-2italic_t - 2. Therefore, 𝐡𝒥t={𝐡1t2,𝐡2t2,,𝐡1t,𝐡2t,}subscript𝐡subscript𝒥𝑡subscriptsuperscript𝐡𝑡21subscriptsuperscript𝐡𝑡22subscriptsuperscript𝐡𝑡1subscriptsuperscript𝐡𝑡2\mathbf{h}_{\mathcal{J}_{t}}=\{\mathbf{h}^{\prime t-2}_{1},\mathbf{h}^{\prime t% -2}_{2},...,\mathbf{h}^{t}_{1},\mathbf{h}^{t}_{2},...\}bold_h start_POSTSUBSCRIPT caligraphic_J start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT end_POSTSUBSCRIPT = { bold_h start_POSTSUPERSCRIPT ′ italic_t - 2 end_POSTSUPERSCRIPT start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT , bold_h start_POSTSUPERSCRIPT ′ italic_t - 2 end_POSTSUPERSCRIPT start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT , … , bold_h start_POSTSUPERSCRIPT italic_t end_POSTSUPERSCRIPT start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT , bold_h start_POSTSUPERSCRIPT italic_t end_POSTSUPERSCRIPT start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT , … } denotes the updated features matrix of a set of nodes at times t1𝑡1t-1italic_t - 1 and utterance matrix of nodes at t𝑡titalic_t.

  • In contrast, 𝒥tsubscript𝒥𝑡\mathcal{J}_{t}caligraphic_J start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT encompasses nodes from time steps t1𝑡1t-1italic_t - 1 and t𝑡titalic_t and signifies an adversarial relation, capturing how a debater challenges an opponent’s position by considering nodes from the opponent’s previous turn (t1𝑡1t-1italic_t - 1). Because nodes feature at time t1𝑡1t-1italic_t - 1 are updated, 𝐡𝒦t={𝐡1t2,𝐡1t2,,𝐡1t,𝐡2t,}subscript𝐡subscript𝒦𝑡subscriptsuperscript𝐡𝑡21subscriptsuperscript𝐡𝑡21subscriptsuperscript𝐡𝑡1subscriptsuperscript𝐡𝑡2\mathbf{h}_{\mathcal{K}_{t}}=\{\mathbf{h}^{\prime t-2}_{1},\mathbf{h}^{\prime t% -2}_{1},...,\mathbf{h}^{t}_{1},\mathbf{h}^{t}_{2},...\}bold_h start_POSTSUBSCRIPT caligraphic_K start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT end_POSTSUBSCRIPT = { bold_h start_POSTSUPERSCRIPT ′ italic_t - 2 end_POSTSUPERSCRIPT start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT , bold_h start_POSTSUPERSCRIPT ′ italic_t - 2 end_POSTSUPERSCRIPT start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT , … , bold_h start_POSTSUPERSCRIPT italic_t end_POSTSUPERSCRIPT start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT , bold_h start_POSTSUPERSCRIPT italic_t end_POSTSUPERSCRIPT start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT , … }.

Sequential Update

The node features are updated sequentially using a temporal attention mechanism. Information propagation occurs along directed edges, and the features of nodes at time t𝑡titalic_t are updated based on their neighboring nodes from the same turn (via intra-argument edges) as well as nodes from previous turns (via cross-argument edges) (Figure 1). This information flow scheme illustrates the cognitive process of a debater during their turn, as they must consider the opponent’s previous arguments, formulate counterarguments, reinforce their own points, and even introduce new ideas. The node features updated at time t𝑡titalic_t serve as the input when updating node features at times t+τ𝑡𝜏t+\tauitalic_t + italic_τ (τ{1,2}𝜏12\tau\in\{1,2\}italic_τ ∈ { 1 , 2 }). This process shares similarities with traditional RNNs like LSTM and GRU. However, it is important to note that our work focuses on handling a specific subset of nodes at each timestep. This distinction sets us apart from Gated Graph Sequence Neural Networks [8] that process the entire graph as input at each timestep. Similar to an RNN-Cell, that operates on a single input element at each time step and generates output that serves as a hidden feature for subsequent times, we introduce the SGA layer to manage the processing of a specific subset of nodes at time t𝑡titalic_t. The entire debate graph is processed sequentially subgraph-by-subgraph.

Refer to caption
Figure 3: The proposed architecture consists of three key modules: (1) Information propagation is driven by the SGA layers, updating node features sequentially using a graph attention mechanism. (2) The readout layer identifies representative vectors associated with each debater, which are subsequently supplied as input to (3) an MLP classifier for predicting the debate winner.

Given a debate 𝒮𝒮\mathcal{S}caligraphic_S that has T𝑇Titalic_T turns: 𝕊={St;t[0,T1]}𝕊subscript𝑆𝑡𝑡0𝑇1\mathbb{S}=\{S_{t};t\in[0,T-1]\}blackboard_S = { italic_S start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT ; italic_t ∈ [ 0 , italic_T - 1 ] }, St={sjt;j[0,Mt1]}subscript𝑆𝑡superscriptsubscript𝑠𝑗𝑡𝑗0subscript𝑀𝑡1S_{t}=\{s_{j}^{t};j\in[0,M_{t}-1]\}italic_S start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT = { italic_s start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_t end_POSTSUPERSCRIPT ; italic_j ∈ [ 0 , italic_M start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT - 1 ] } denotes a debate turn consisting of Mtsubscript𝑀𝑡M_{t}italic_M start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT sentences sjtsuperscriptsubscript𝑠𝑗𝑡s_{j}^{t}italic_s start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_t end_POSTSUPERSCRIPT. It is noticeable that N=t=0T1Mt𝑁superscriptsubscript𝑡0𝑇1subscript𝑀𝑡N=\sum_{t=0}^{T-1}M_{t}italic_N = ∑ start_POSTSUBSCRIPT italic_t = 0 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_T - 1 end_POSTSUPERSCRIPT italic_M start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT. Let 𝐡jtsuperscriptsubscript𝐡𝑗𝑡\mathbf{h}_{j}^{t}bold_h start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_t end_POSTSUPERSCRIPT the utterance embedding of the sentence sjsubscript𝑠𝑗s_{j}italic_s start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT (from 1), the new node feature hjsubscriptsuperscript𝑗h^{\prime}_{j}italic_h start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT is calculated using the SGA layer which executes the following operations (we discard the superscript t𝑡titalic_t for readability):

hj=SGA(𝐡j,𝐡,𝐡𝒥,𝐡𝒦)=𝐡j𝐡jsubscriptsuperscript𝑗𝑆𝐺𝐴subscript𝐡𝑗subscript𝐡subscript𝐡𝒥subscript𝐡𝒦tensor-productsubscript𝐡𝑗subscriptsuperscript𝐡𝑗h^{\prime}_{j}=SGA(\mathbf{h}_{j},\mathbf{h}_{\mathcal{I}},\mathbf{h}_{% \mathcal{J}},\mathbf{h}_{\mathcal{K}})=\mathbf{h}_{j}\otimes\mathbf{h}^{\text{% X }}_{j}italic_h start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT = italic_S italic_G italic_A ( bold_h start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT , bold_h start_POSTSUBSCRIPT caligraphic_I end_POSTSUBSCRIPT , bold_h start_POSTSUBSCRIPT caligraphic_J end_POSTSUBSCRIPT , bold_h start_POSTSUBSCRIPT caligraphic_K end_POSTSUBSCRIPT ) = bold_h start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT ⊗ bold_h start_POSTSUPERSCRIPT X end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT (5)

where tensor-product\otimes is the update operator using GRU operations [13]. The 𝐡jsubscriptsuperscript𝐡𝑗\mathbf{h}^{\text{X }}_{j}bold_h start_POSTSUPERSCRIPT X end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT denotes the interaction representation feature at time t𝑡titalic_t, encompassing intra-argument coherency, counterarguments against the opponent’s points, and reinfordcement of the debater’s previous statements. It is calculated by concatenating the node features produced by three component GAT layers (equations 2, 3, 4):

𝐡j=𝐡jGATI  𝐡jGATC  𝐡jGATS subscriptsuperscript𝐡𝑗subscriptsuperscript𝐡GATI 𝑗norm subscriptsuperscript𝐡GATC 𝑗 subscriptsuperscript𝐡GATS 𝑗\mathbf{h}^{\text{X }}_{j}=\mathbf{h}^{\text{GATI }}_{j}||\text{ }\mathbf{h}^{% \text{GATC }}_{j}||\text{ }\mathbf{h}^{\text{GATS }}_{j}bold_h start_POSTSUPERSCRIPT X end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT = bold_h start_POSTSUPERSCRIPT GATI end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT | | bold_h start_POSTSUPERSCRIPT GATC end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT | | bold_h start_POSTSUPERSCRIPT GATS end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT (6)

It is important to observe that during the initial turn, denoted as t=0𝑡0t=0italic_t = 0, there are no counterarguments in the debater’s thoughts. As a result, we initialize 𝐡jGATC 0subscriptsuperscript𝐡subscriptGATC 0𝑗\mathbf{h}^{\text{GATC }_{0}}_{j}bold_h start_POSTSUPERSCRIPT GATC start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT to be equal to 0. Additionally, a debater does not introduce a reinforcing argument until their second round (or when t2𝑡2t\geq 2italic_t ≥ 2). Consequently, both 𝐡jGATS 0subscriptsuperscript𝐡subscriptGATS 0𝑗\mathbf{h}^{\text{GATS }_{0}}_{j}bold_h start_POSTSUPERSCRIPT GATS start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT and 𝐡jGATS 1subscriptsuperscript𝐡subscriptGATS 1𝑗\mathbf{h}^{\text{GATS }_{1}}_{j}bold_h start_POSTSUPERSCRIPT GATS start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT are set to 0 during this period. The updated node features hjsubscriptsuperscript𝑗h^{\prime}_{j}italic_h start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT are then employed to update the attributes of nodes in subsequent turns.

III-C Readout Layer

Once all the node features have been updated, we employ a readout layer to “summarize” the ideas presented by each participant during the debate. For each debater, we select a set of top r𝑟ritalic_r (e.g., r=3𝑟3r=3italic_r = 3) representatives, which are used as input for the prediction classifier. The process of selecting these representative nodes is determined by the highest ”attention scores” generated by each GATI, GATC, and GATS layers, denoted as αIsubscript𝛼𝐼\alpha_{I}italic_α start_POSTSUBSCRIPT italic_I end_POSTSUBSCRIPT, αCsubscript𝛼𝐶\alpha_{C}italic_α start_POSTSUBSCRIPT italic_C end_POSTSUBSCRIPT, and αSsubscript𝛼𝑆\alpha_{S}italic_α start_POSTSUBSCRIPT italic_S end_POSTSUBSCRIPT, respectively. During the feature update step, each node receives an attention score from its neighboring nodes. These scores emphasize the significance of a node in relation to others. The more significant a node is, the greater its contribution to a debater’s overall idea. The total attention received by each node is obtained by summing up its individual attention scores. Consider a node slsubscript𝑠𝑙s_{l}italic_s start_POSTSUBSCRIPT italic_l end_POSTSUBSCRIPT, its attention scores are:

αslI=iαi, αslC=j𝒥αj, αslS=k𝒦αkformulae-sequencesubscriptsuperscript𝛼𝐼subscript𝑠𝑙subscript𝑖subscript𝛼𝑖formulae-sequence subscriptsuperscript𝛼𝐶subscript𝑠𝑙subscript𝑗𝒥subscript𝛼𝑗 subscriptsuperscript𝛼𝑆subscript𝑠𝑙subscript𝑘𝒦subscript𝛼𝑘\alpha^{I}_{s_{l}}=\sum_{i\in\mathcal{I}}\alpha_{i},\text{ }\alpha^{C}_{s_{l}}% =\sum_{j\in\mathcal{J}}\alpha_{j},\text{ }\alpha^{S}_{s_{l}}=\sum_{k\in% \mathcal{K}}\alpha_{k}italic_α start_POSTSUPERSCRIPT italic_I end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_s start_POSTSUBSCRIPT italic_l end_POSTSUBSCRIPT end_POSTSUBSCRIPT = ∑ start_POSTSUBSCRIPT italic_i ∈ caligraphic_I end_POSTSUBSCRIPT italic_α start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT , italic_α start_POSTSUPERSCRIPT italic_C end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_s start_POSTSUBSCRIPT italic_l end_POSTSUBSCRIPT end_POSTSUBSCRIPT = ∑ start_POSTSUBSCRIPT italic_j ∈ caligraphic_J end_POSTSUBSCRIPT italic_α start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT , italic_α start_POSTSUPERSCRIPT italic_S end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_s start_POSTSUBSCRIPT italic_l end_POSTSUBSCRIPT end_POSTSUBSCRIPT = ∑ start_POSTSUBSCRIPT italic_k ∈ caligraphic_K end_POSTSUBSCRIPT italic_α start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT (7)

We opt to select the top r𝑟ritalic_r nodes with the highest scores for each type of attention. We then concatenate the feature vectors corresponding to these selected nodes to create a 3×r×D3𝑟superscript𝐷3\times r\times D^{\prime}3 × italic_r × italic_D start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT-dimensional vector, where Dsuperscript𝐷D^{\prime}italic_D start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT is the dimension of the node feature produced by SGA. The readout layer subsequently generates two “summary” vectors, each serving as a deep representation of each debater’s performance during the debate.

III-D Classification

The two vectors, 𝐐PROSsuperscript𝐐𝑃𝑅𝑂𝑆\mathbf{Q}^{PROS}bold_Q start_POSTSUPERSCRIPT italic_P italic_R italic_O italic_S end_POSTSUPERSCRIPT and 𝐐CONSsuperscript𝐐𝐶𝑂𝑁𝑆\mathbf{Q}^{CONS}bold_Q start_POSTSUPERSCRIPT italic_C italic_O italic_N italic_S end_POSTSUPERSCRIPT achieved by the readout layer are fed to the classifier to perform the prediction task. Each vector is mapped to a score value c1𝑐superscript1c\in\mathbb{R}^{1}italic_c ∈ blackboard_R start_POSTSUPERSCRIPT 1 end_POSTSUPERSCRIPT by linear transformation using a Fully Connected (FC) layer followed by an activation function (e.g., ReLU), Layer Norm (LN) [14] and dropout layer [24]. Let us denote a series of FC + ReLU + LN + Dropout an MLP, then

cPROSsuperscript𝑐𝑃𝑅𝑂𝑆\displaystyle c^{PROS}italic_c start_POSTSUPERSCRIPT italic_P italic_R italic_O italic_S end_POSTSUPERSCRIPT =MLP1(𝐐PROS)absentMLP1superscript𝐐𝑃𝑅𝑂𝑆\displaystyle=\textit{MLP1}(\mathbf{Q}^{PROS})= MLP1 ( bold_Q start_POSTSUPERSCRIPT italic_P italic_R italic_O italic_S end_POSTSUPERSCRIPT )
cCONSsuperscript𝑐𝐶𝑂𝑁𝑆\displaystyle c^{CONS}italic_c start_POSTSUPERSCRIPT italic_C italic_O italic_N italic_S end_POSTSUPERSCRIPT =MLP2(𝐐CONS)absentMLP2superscript𝐐𝐶𝑂𝑁𝑆\displaystyle=\textit{MLP2}(\mathbf{Q}^{CONS})= MLP2 ( bold_Q start_POSTSUPERSCRIPT italic_C italic_O italic_N italic_S end_POSTSUPERSCRIPT )

If the Pros side wins, we expect that cPROS>cCONSsuperscript𝑐𝑃𝑅𝑂𝑆superscript𝑐𝐶𝑂𝑁𝑆c^{PROS}>c^{CONS}italic_c start_POSTSUPERSCRIPT italic_P italic_R italic_O italic_S end_POSTSUPERSCRIPT > italic_c start_POSTSUPERSCRIPT italic_C italic_O italic_N italic_S end_POSTSUPERSCRIPT, and conversely when the Cons side wins. Here, we denote C+superscript𝐶C^{+}italic_C start_POSTSUPERSCRIPT + end_POSTSUPERSCRIPT and Csuperscript𝐶C^{-}italic_C start_POSTSUPERSCRIPT - end_POSTSUPERSCRIPT as the scores of the winner and loser, respectively. Our objective is to maximize the difference between C+superscript𝐶C^{+}italic_C start_POSTSUPERSCRIPT + end_POSTSUPERSCRIPT and Csuperscript𝐶C^{-}italic_C start_POSTSUPERSCRIPT - end_POSTSUPERSCRIPT as much as possible. To achieve this, we employ Pairwise Cross-Entropy (PCE) loss, that minimizes:

=PCE(C+,C)=log(1+exp(CC+))PCEsuperscript𝐶superscript𝐶1𝑒𝑥𝑝superscript𝐶superscript𝐶\mathcal{L}=\text{PCE}(C^{+},C^{-})=\log(1+exp(C^{-}-C^{+}))caligraphic_L = PCE ( italic_C start_POSTSUPERSCRIPT + end_POSTSUPERSCRIPT , italic_C start_POSTSUPERSCRIPT - end_POSTSUPERSCRIPT ) = roman_log ( 1 + italic_e italic_x italic_p ( italic_C start_POSTSUPERSCRIPT - end_POSTSUPERSCRIPT - italic_C start_POSTSUPERSCRIPT + end_POSTSUPERSCRIPT ) ) (8)

The network architecture is illustrated in Figure 3.

IV Evaluation

IV-A Dataset

Our study is conducted on the debate.org dataset collected by [1]. The dataset contains 78,376 debates on controversial topics, including abortion, death penalty, gay marriage, and affirmative action. Each debate consists of multiple rounds in which two participants from two opposing sides take turns expressing their opinions. Further details can be found in [1].

Winning criterion

The winner is determined by the criterion of “Made more convincing arguments”. We exclude debates with fewer than 5 voters and tie debates. Additionally, debates in which the winner has just one more vote than the loser are also classified as ties.

Preprocessing

To study the interaction among debates, we only keep debates that have at least 3 rounds (equivalent to 6 turns). Short arguments are also eliminated, i.e., we remove debates that have fewer than 5 sentences in each round (each graph thereby has at least 30 vertices). The first 3 rounds of longer debates are used for analysis. The dataset exhibits an imbalance, with the Cons side accounting for 65% of the winners whereas the Pros side wins only 35%. To create a balanced dataset, we also use the final 3 rounds of the debates where the Pros side wins and the debate comprises more than three rounds. This data augmentation step also increases the size of the dataset.

Statistics

After the experimental dataset selection step, there are a total of 2,445 debates available for model training and testing. Among these debates, the Pros side wins in 1,130 debates, while the Cons side secures victory in 1,325 debates. Additional statistical information is shown in table I. Observing the table, it becomes evident that the winning side tends to produce more sentences and more counterarguments compared to the losing side. Conversely, the losing side appears to prioritize reinforcing their own ideas rather than generating a higher number of counterarguments.

TABLE I: The number of sentences, number of Counterargument Edges, and number of Supporting Edges made by winner and loser in an argument turn. Cross-argument edges are constructed using a similarity threshold of 0.850.850.850.85.
#Sentences #Countering #Supporting
Winner 38.6 6.96 5.93
Loser 36.1 6.78 6.64

IV-B Experimental setup

Data Preprocessing

We randomly split the dataset with 60% for training, 20% for validation and 20% for testing. For text normalization, we employ the following steps: (1) replacing URLs with “website”, (2) replacing all the numbers with “number”, and (3) lowercasing text. Next, we employed spaCy [23] for sentence tokenization. Sentences are then encoded by SBERT’s “all-MiniLM-L6-v2” model that transforms a sentence into a 384-dimensional vector.

Parameter setting

We use a similarity threshold of 0.85 for cross-argument edge construction, other approaches regarding edge construction will be further discussed in the ablation study. The intra-argument distance threshold is d=3𝑑3d=3italic_d = 3. Each node within a turn links to nodes that share a relative positive correlation within a 3-node proximity. Node features updated by each GAT layer have D=32superscript𝐷32D^{\prime}=32italic_D start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT = 32 dimension. For the readout layer, we choose r=3𝑟3r=3italic_r = 3. We use a stack of three MLPs to transform the readout layer’s output into a score for each debater. The first layer reduces the vector from 3×r×D3𝑟𝐷3\times r\times D3 × italic_r × italic_D to half its size. The second layer further reduces the output of the first layer by half, and the final layer maps the second output vector to a real value. We apply the tanh function to ensure the value falls within the range [-1; 1]. For hyper-parameters, we apply the dropout rate of 0.2 for all GAT layers and the classifier. Optimization is performed using Adam [16]. The batch size is 32. We run the model for 50 epochs with early stopping. The learning rate is 0.0001.

Other settings

Deep learning frameworks are Pytorch [21] and Pytorch Lightning [22]. We use DGL package [20] as the graph deep learning framework. The networks are trained and tested on an NVIDIA Quadro RTX 8000 GPU with 50GB of memory.

IV-C Comparison baselines

Given that the Cons side accounts for 52.5% of wins in the test set, it serves as the majority baseline, representing the best prediction one can make regardless of the input features. We compare our model’s performance to SOTAs in debate winning prediction which adopt sequence approach in their work.

Sequence approach

In the study by [11], they aggregate the entire discussion into a single sequence and model it using LSTM with an attention mechanism applied to the sentences, referred to as the all-LSTM approach. They also incorporate implicit discourse relations using the Penn Discourse Tree Bank [25] discourse structure. While their research primarily centers on the Reddit dataset [35], we apply the same methodology to our debate dataset. Additionally, we find relevance in the work of [26], denoted as ASODP, which shares our focus on Oxford-style debates and employs a sequential approach for debate analysis. Furthermore, [27] introduces the DTDMN method, designed to process pairs of conversations and predict their persuasiveness. Similarly, we present the Pros and Cons sides as inputs to facilitate comparative analysis.

Graph approach

To highlight the significance of processing the debate on a turn-by-turn basis, we introduce two baseline models for graph analysis. The first baseline employs a 2-layer GAT network, while the second baseline utilizes a GGNN. These GNNs serve as information aggregators and feature extractors for the debate graph, simultaneously processing all nodes in the graph (and repeating this process 6 times, corresponding to 6 turns in the case of GGNN). In the case of GAT, the initial layer transforms the input into 64-dimensional vectors, and the subsequent layer maps the output from the first layer to 32-dimensional features. In GGNN, we also utilize a 32-dimensional output feature size to align with the output feature size of our SGA layer. To summarize the node features for each debater, we introduce a mean readout operation.

Temporal graph approach

Since no other sequential graph approach exists for debate winning prediction, we adopt the information flow method proposed in [33] (Graphflow), initially designed for machine comprehension. We utilize the output of the RGNN layer from the final turn, feeding it into the MLP layer for the prediction task.

IV-D Experimental results

TABLE II: Debate-winning prediction results. The best results are in bold. (**: Using the top 3 highest similarity scores to construct cross-argument edges, *: Using a threshold value of 0.85 to construct cross-argument edges).
Models Acc. F1
Majority Baseline 0.525
Sequence Baseline
all-LSTM 0.635 0.563
ASODP 0.656 0.623
DTDMN 0.660 0.625
Graph Baseline
GAT 0.541 0.472
GGNN 0.565 0.522
Sequence Graph Baseline
Graphflow 0.645 0.620
SGA
w/o GATI 0.621 0.523
w/o GATC 0.562 0.495
w/o GATS 0.629 0.534
FULL MODEL
S = 0.85 0.654 0.667
∗∗k = 3 0.675 0.625
Refer to caption
Figure 4: Impact of cross-argument construction values on network performance. Left: Edge construction using a threshold value. Right: Edge construction using top-k highest values.

The evaluation results are presented in Table II. The sequence baselines (all-LSTM, ASODP, DTDMN) all perform similarly, with DTDMN producing the best accuracy of this group at 66.0%. The Graph baselines perform more poorly, with the highest accuracy, 56.5% produced by SSGN. Graphflow, boasting an accuracy of 64.5%, outperforms traditional graph approaches. However, it still trails behind the robust benchmarks set by sequential approaches such as ASODP and DTDMN. Our full model, SGA with k=3, outperforms all baselines with an accuracy of 67.5%, a 1.5% absolute (2.3% relative) improvement over DTDMN. The F1-score, achieved by constructing cross-argument edges with a threshold of 0.85, significantly outperforms the baselines. It reaches 66.7%, representing a 4.2% absolute (or 6.7% relative) improvement over DTDMN. We thus demonstrate that we outperform state of the art models for this dataset.

The performances of all-LSTM and DTDMN are diminished when applied to the debate.org dataset. This can be attributed to a fundamental distinction between the two domains. In the context of debate.org, the ultimate determination of the winner is not based on subjective criteria but rather relies on the judgments of a panel of judges or the voters. The voters place substantial emphasis on the debaters’ ability to rigorously address and counter their opponents’ reasoning. Furthermore, they favor debaters who engage in high-quality and dynamic interactions throughout the debates.

Sequence matters

The results show a significant superiority of sequence-based baselines over graph-based ones when applied to the debate dataset. This highlights the critical significance of adopting a sequential approach, where the debate is processed turn-by-turn, rather than relying solely on graph-based methodologies.

Counter-argument is crucial

We extended our analysis by performing an ablation study to assess the individual impact of each GAT layer on our proposed SGA model. We observe that when we omit the counter-argument edges, the reduction in network performance was more significant compared to scenarios where we exclude either GATI or GATS layers. Specifically, accuracy drops by 11.3%, in contrast to 5.4% and 4.6%, respectively. This outcome can be elucidated by considering that if a debater disregards the opponent’s remarks from the preceding turn, their persuasive ability may diminish in the eyes of the voters or judges. In essence, acknowledging and responding to counter-arguments plays a pivotal role in constructing compelling arguments in a debate context.

IV-E Impact of graph parameters

We conduct a detailed analysis of the impact of graph construction parameters, such as Sthsubscript𝑆𝑡S_{th}italic_S start_POSTSUBSCRIPT italic_t italic_h end_POSTSUBSCRIPT and k𝑘kitalic_k, on the network’s performance (Figure 4). In the context of employing a similarity threshold, it is noteworthy that a threshold value of 0.850.850.850.85 yields the highest performance in terms of accuracy and F1-score.

Regarding the top-k approach, it is worth highlighting that while k=3𝑘3k=3italic_k = 3 achieves the highest accuracy, as well as highest F1-score. These insights into parameter effects contribute to a deeper understanding of how to optimize network performance for specific objectives and trade-offs.

V Related work

Graph Neural Networks (GNNs) have proven to be powerful tools for harnessing insights into, and making predictions on, data structured as graphs, particularly in the realm of Natural Language Processing (NLP). Within NLP, GNNs have been applied to a wide spectrum of tasks including, but not limited to, dependency parsing [29], sentiment analysis [30][31], and semantic understanding [15]. In recent developments, researchers in NLP have extended GNNs by integrating them with RNNs to enable sequential processing of graph-structured data. Notably, [32] introduced a graph-to-sequence methodology for the AMR-to-text generation task, wherein they construct an Abstract Meaning Representation (AMR) graph and progressively update the entire graph during sequential generation. Furthermore, [33] made significant strides in the domain of machine comprehension by incorporating conversation history into their model. They adopt a graph-based approach, constructing a graph that evolves with each conversational turn. While our work shares a commonality in the sequential update of subgraphs, it is important to emphasize that the implementation details diverge significantly. Researchers have explored temporal graph approaches for tasks like traffic flow forecasting  [36][37] and skeleton-based action recognition [38]. However, the utilization of sequence graph approaches in conversation analysis, particularly within online debate and argumentative analysis contexts, remains relatively unexplored.

VI Conclusion and future work

In conclusion, the task of modeling online debates, characterized by the dynamic exchange of ideas, is a challenging endeavor. To tackle this complexity, we introduced a novel approach using sequence-graph modeling. By representing conversations as graphs, we effectively captured the interactions among participants through directed edges, while the sequential propagation of information along these edges enriched our understanding of context. Our incorporation of the SGA layer demonstrated the efficacy of our information update scheme. Our experimental results demonstrate the success of sequence graph networks in outperforming existing methods when applied to Oxford-style online debate dataset.

The proposed method not only advances the ability to model dynamic discussions but also highlights the potential of sequence-graph approaches for a wide range of tasks involving sequential interactions and context-rich data. As online debates continue to evolve, the techniques presented in this paper offer valuable insights into improving our understanding of complex conversational dynamics.

While the proposed method has demonstrated promising results in predicting debate outcomes, it does exhibit certain limitations. Firstly, the construction of cross-argument edges relies solely on similarity scores. While this approach may suffice for reinforcing connections, it may not consistently identify valid counterarguments. High similarity scores between two sentences do not guarantee a counterrelation. Secondly, the method overlooks the utilization of argument structures. The intra-argument links primarily capture temporal relationships by connecting adjacent sentences. However, this approach fails to account for potential relationships between sentences that are distant within an argument turn. There is room for improvement by incorporating pre-trained models that account for argumentative structures. For instance, [26] enhanced predictability on debate datasets by integrating argument structure introduced by [34].

Acknowledgment

This material is based upon work supported by the National Science Foundation under Award # OIA-1946391, “Data Analytics that are Robust and Trusted (DART)”.

References

  • [1] E. Durmus and C. Cardie, “A corpus for modeling user and language effects in argumentation on online debating,” Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, Italy, pp. 602–607, 2019.
  • [2] N. Reimers and I. Gurevych, “Sentence-bert: Sentence embeddings using siamese bert-networks,” Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing, Hong Kong, China, pp. 3982–3992, 2019.
  • [3] P. Veličković et al., “Graph attention networks,” Proceedings of the 6th International Conference on Learning Representations, Canada, 2018.
  • [4] J. Zhang, R. Kumar, S. Ravi, and C. Danescu-Niculescu-Mizil, “Conversational flow in Oxford-style debates,” Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, California, pp. 136–141, 2016.
  • [5] J. Bao et al., “Argument pair extraction with mutual guidance and inter-sentence relation graph,” Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, pp. 3923–3934, 2021.
  • [6] J. Pennington, R. Socher, and C. D. Manning, “Glove: Global vectors for word representation,” Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing, Qatar, pp. 1532–1543, 2014.
  • [7] Y. Jo et al., “Attentive interaction model: Modeling changes in view in argumentation,” in Proc. of the 2018 Conf. of the North American Chapter of the Assoc. for Computational Linguistics: Human Language Technologies, Vol. 1, pp. 103-116, 2018.
  • [8] Y. Li, D. Tarlow, M. Brockschmidt, and R. Zemel, “Gated graph sequence neural networks,” Proceedings of the 4th International Conference on Learning Representations, Puerto Rico, 2016.
  • [9] S. Hochreiter and J. Schmidhuber, “Long short-term memory,” Neural Computation, vol. 9, no. 8, pp. 1735-1780, 1997.
  • [10] L. Ji et al., “Incorporating Argument-Level Interactions for Persuasion Comments Evaluation using Co-attention Model,” Proceedings of the 27th International Conference on Computational Linguistics, USA, pp. 3703–3714, 2018.
  • [11] C. Hidey and K. McKeown, “Persuasive influence detection: The role of argument sequencing,” Proceedings of the AAAI Conference on Artificial Intelligence, pp. 5173–5180, 2018.
  • [12] H. Chen, P. Hong, W. Han, N. Majumder, and S. Poria, “Dialogue relation extraction with document-level heterogeneous graph attention networks,” Cognitive Computation, vol. 15, no. 2, pp. 793-802, 2023.
  • [13] K. Cho et al., “Learning phrase representations using RNN encoder-decoder for statistical machine translation,” Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing, Qatar, pp. 1724–1734, 2014.
  • [14] J. L. Ba, J. R. Kiros, and G. E. Hinton, “Layer normalization,” arXiv preprint arXiv:1607.06450, 2016.
  • [15] T. N. Kipf and M. Welling, “Semi-supervised classification with graph convolutional networks,” Proceedings of the 5th International Conference on Learning Representations, France, 2017.
  • [16] D. P. Kingma and J. Ba, “Adam: A Method for Stochastic Optimization,” in 3rd International Conference on Learning Representations, USA, 2015.
  • [17] S. Wu, F. Sun, W. Zhang, X. Xie, and B. Cui, “Graph neural networks in recommender systems: a survey,” ACM Computing Surveys, vol. 55, pp. 1-37, 2020.
  • [18] W. Fan et al., “Graph neural networks for social recommendation.” In The world wide web conference, pp. 417-426. 2019.
  • [19] Q. Cao, H. Shen, J. Gao, B. Wei, and X. Cheng, “Popularity prediction on social platforms with coupled graph neural networks,” Proceedings of the 13th international conference on web search and data mining, pp. 70-78. 2020.
  • [20] M. Wang et al., “Deep graph library: A graph-centric, highly-performant package for graph neural networks,” arXiv preprint arXiv:1909.01315, 2019.
  • [21] A. Paszke et al., “Pytorch: An imperative style, high-performance deep learning library,” In Advances in neural information processing systems, pp. 8024-8035. 2019.
  • [22] W. Falcon et al., “Pytorch lightning,” GitHub 3, 2019.
  • [23] M. Honnibal and I. Montani, “spaCy: Industrial-Strength Natural Language Processing in Python,” available at https://spacy.io, 2021.
  • [24] N. Srivastava, G. E. Hinton, A. Krizhevsky, I. Sutskever, and R. Salakhutdinov, “Dropout: a simple way to prevent neural networks from overfitting,” The journal of machine learning research 15, no. 1, pp. 1929-1958, 2014.
  • [25] R. Prasad et al., “The penn discourse treebank 2.0 annotation manual,” December 17, 2007.
  • [26] J. Li, E. Durmus, and C. Cardie, “Exploring the role of argument structure in online debate persuasion,” Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing, pp. 8905–8912, 2020.
  • [27] J. Zeng et al., “What changed your mind: The roles of dynamic topics and discourse in argumentation process,” Proceedings of The Web Conference, pp. 1502-1513, 2020.
  • [28] L. Wu et al., “Graph neural networks for natural language processing: A survey,” Foundations and Trends® in Machine Learning 16.2, pp. 119-328, 2023.
  • [29] T. Ji, W. Yuanbin, and L. Man, “Graph-based dependency parsing with graph neural networks,” Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, Italy, pp. 2475-2485, 2019.
  • [30] B. Liang, S. Hang, G. Lin, C. Erik, and X. Ruifeng, “Aspect-based sentiment analysis via affective knowledge enhanced graph convolutional networks,” Knowledge-Based Systems, vol. 235, p. 10764, 2021.
  • [31] R. Li et al., “Dual graph convolutional networks for aspect-based sentiment analysis,” Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing, pp. 6319–6329, 2021.
  • [32] L. Song, Y. Zhang, Z. Wang, and D. Gildea, “A graph-to-sequence model for AMR-to-text generation,” Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics, Long Papers, Australia, pp. 1616-1626, 2018.
  • [33] Y. Chen, L. Wu, and M. J. Zaki, “Graphflow: Exploiting conversation flow with graph neural networks for conversational machine comprehension,” Proceedings of the 29th International Joint Conference on Artificial Intelligence, Japan, pp. 1230-1236, 2020.
  • [34] V. Niculae, J. Park, and C. Cardie, “Argument mining with structured SVMs and RNNs,” Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics, Vancouver, Canada, pp. 985-995, 2017.
  • [35] C. Tan, V. Niculae, C. Danescu-Niculescu-Mizil, and L. Lee, “Winning arguments: Interaction dynamics and persuasion strategies in good-faith online discussions,” Proceedings of the 25th International Conference on World Wide Web, pp. 613-624, 2016.
  • [36] B. Yu, H. Yin, and Z. Zhu, “Spatio-temporal graph convolutional networks: A deep learning framework for traffic forecasting,” Proceedings of the 27th International Joint Conference on Artificial Intelligence, pp. 3634-3640, 2018.
  • [37] S. Guo, Y. Lin, N. Feng, C. Song, and H. Wan, “Attention based spatial-temporal graph convolutional networks for traffic flow forecasting,” Proceedings of the AAAI conference on artificial intelligence, vol. 33, no. 01, pp. 922-929, 2019.
  • [38] L. Shi, Y. Zhang, J. Cheng, and H. Lu, “Two-stream adaptive graph convolutional networks for skeleton-based action recognition,” Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp. 12026-12035, 2019.