Benchmarks for Physical Reasoning AI

Authors: Andrew Melnik, Robin Schiewer, Moritz Lange, Andrei Muresanu, Mozhgan Saeidi, Animesh Garg, Helge Ritter

Abstract: Physical reasoning is a crucial aspect in the development of general AI systems, given that human learning starts with interacting with the physical world before progressing to more complex concepts. Although researchers have studied and assessed the physical reasoning of AI approaches through various specific benchmarks, there is no comprehensive approach to evaluating and measuring progress. Therefore, we aim to offer an overview of existing benchmarks and their solution approaches and propose a unified perspective for measuring the physical reasoning capacity of AI systems. We select benchmarks that are designed to test algorithmic performance in physical reasoning tasks. While each of the selected benchmarks poses a unique challenge, their ensemble provides a comprehensive proving ground for an AI generalist agent with a measurable skill level for various physical reasoning concepts. This gives an advantage to such an ensemble of benchmarks over other holistic benchmarks that aim to simulate the real world by intertwining its complexity and many concepts. We group the presented set of physical reasoning benchmarks into subcategories so that more narrow generalist AI agents can be tested first on these groups.

Submitted 17 December, 2023; originally announced December 2023.

arXiv:2311.08195 [pdf, other]

Automated Fact-Checking in Dialogue: Are Specialized Models Needed?

Authors: Eric Chamoun, Marzieh Saeidi, Andreas Vlachos

Abstract: Prior research has shown that typical fact-checking models for stand-alone claims struggle with claims made in dialogues. As a solution, fine-tuning these models on labelled dialogue data has been proposed. However, creating separate models for each use case is impractical, and we show that fine-tuning models for dialogue results in poor performance on typical fact-checking. To overcome this challenge, we present techniques that allow us to use the same models for both dialogue and typical fact-checking. These mainly focus on retrieval adaptation and transforming conversational inputs so that they can be accurately predicted by models trained on stand-alone claims. We demonstrate that a typical fact-checking model incorporating these techniques is competitive with state-of-the-art models fine-tuned for dialogue, while maintaining its accuracy on stand-alone claims.

Submitted 14 November, 2023; originally announced November 2023.

Comments: Accepted to EMNLP 2023

arXiv:2308.03676 [pdf, other]

A Tractable Handoff-aware Rate Outage Approximation with Applications to THz-enabled Vehicular Network Optimization

Authors: Mohammad Amin Saeidi, Haider Shoaib, Hina Tabassum

Abstract: In this paper, we first develop a tractable mathematical model of the handoff (HO)-aware rate outage experienced by a typical connected and autonomous vehicle (CAV) in a given THz vehicular network. The derived model captures the impact of line-of-sight (LOS) Nakagami-m fading channels, interference, and molecular absorption effects. We first derive the statistics of the interference-plus-molecular absorption noise ratio and demonstrate that it can be approximated by Gamma distribution using Welch-Satterthwaite approximation. Then, we show that the distribution of signal-to-interference-plus-molecular absorption noise ratio (SINR) follows a generalized Beta prime distribution. Based on this, a closed-form HO-aware rate outage expression is derived. Finally, we formulate and solve a CAVs' traffic flow maximization problem to optimize the base-stations (BSs) density and speed of CAVs with collision avoidance, rate outage, and CAVs' minimum traffic flow constraint. The CAVs' traffic flow is modeled using Log-Normal distribution. Our numerical results validate the accuracy of the derived expressions using Monte-Carlo simulations and discuss useful insights related to optimal BS density and CAVs' speed as a function of crash intensity level, THz molecular absorption effects, minimum road-traffic flow and rate requirements, and maximum speed and rate outage limits.

Submitted 25 August, 2023; v1 submitted 7 August, 2023; originally announced August 2023.

Comments: This paper has been accepted in the IEEE Global Communications (GLOBECOM) 2023 conference

arXiv:2306.11167 [pdf, other]

Large Language Models are Fixated by Red Herrings: Exploring Creative Problem Solving and Einstellung Effect using the Only Connect Wall Dataset

Authors: Saeid Naeini, Raeid Saqur, Mozhgan Saeidi, John Giorgi, Babak Taati

Abstract: The quest for human imitative AI has been an enduring topic in AI research since its inception. The technical evolution and emerging capabilities of the latest cohort of large language models (LLMs) have reinvigorated the subject beyond academia to the cultural zeitgeist. While recent NLP evaluation benchmark tasks test some aspects of human-imitative behaviour (e.g., BIG-bench's 'human-like behavior' tasks), few, if not none, examine creative problem solving abilities. Creative problem solving in humans is a well-studied topic in cognitive neuroscience with standardized tests that predominantly use the ability to associate (heterogeneous) connections among clue words as a metric for creativity. Exposure to misleading stimuli - distractors dubbed red herrings - impedes human performance in such tasks via the fixation effect and Einstellung paradigm. In cognitive neuroscience studies, such fixations are experimentally induced by pre-exposing participants to orthographically similar incorrect words to subsequent word-fragments or clues. The popular British quiz show Only Connect's Connecting Wall segment essentially mimics Mednick's Remote Associates Test (RAT) formulation with built-in, deliberate red herrings, which makes it an ideal proxy dataset to explore and study fixation effect and Einstellung paradigm from cognitive neuroscience in LLMs.
In this paper we present the novel Only Connect Wall (OCW) dataset and report results from our evaluation of selected pre-trained language models and LLMs on creative problem solving tasks like grouping clue words by heterogeneous connections, and identifying correct open knowledge domain connections in respective groups. We synthetically generate two additional datasets: OCW-Randomized, OCW-WordNet to further analyze our red-herrings hypothesis in language models. The code and link to the dataset are available at https://github.com/TaatiTeam/OCW.

Submitted 8 November, 2023; v1 submitted 19 June, 2023; originally announced June 2023.

Comments: v4, v3: Minor cosmetic adjustments, typo fixes etc. from V2. Fixed Fig. 2 caption overlapping with text in S2.2. V2: with added OCW-Randomized and OCW-WordNet results in Section 4.3 (added). 22 pages with Appendix

ACM Class: I.2.7

arXiv:2306.08781 [pdf, ps, other]

Resource Allocation and Performance Analysis of Hybrid RSMA-NOMA in the Downlink

Authors: Mohammad Amin Saeidi, Hina Tabassum

Abstract: Rate splitting multiple access (RSMA) and non-orthogonal multiple access (NOMA) are the key enabling multiple access techniques to enable massive connectivity. However, it is unclear whether RSMA would consistently outperform NOMA from a system sum-rate perspective, users' fairness, as well as convergence and feasibility of the resource allocation solutions. This paper investigates the weighted sum-rate maximization problem to optimize power and rate allocations in a hybrid RSMA-NOMA network. In the hybrid RSMA-NOMA, by optimally allocating the maximum power budget to each scheme, the BS operates on NOMA and RSMA in two orthogonal channels, allowing users to simultaneously receive signals on both RSMA and NOMA. Based on the successive convex approximation (SCA) approach, we jointly optimize the power allocation of users in NOMA and RSMA, the rate allocation of users in RSMA, and the power budget allocation for NOMA and RSMA considering successive interference cancellation (SIC) constraints. Numerical results demonstrate the trade-offs that hybrid RSMA-NOMA access offers in terms of system sum rate, fairness, convergence, and feasibility of the solutions.

Submitted 14 June, 2023; originally announced June 2023.

Comments: This paper has been accepted in the 2023 IEEE 34th Annual International Symposium on Personal, Indoor and Mobile Radio Communications (PIMRC)

arXiv:2306.01069 [pdf, other]

TimelineQA: A Benchmark for Question Answering over Timelines

Authors: Wang-Chiew Tan, Jane Dwivedi-Yu, Yuliang Li, Lambert Mathias, Marzieh Saeidi, Jing Nathan Yan, Alon Y. Halevy

Abstract: Lifelogs are descriptions of experiences that a person had during their life. Lifelogs are created by fusing data from the multitude of digital services, such as online photos, maps, shopping and content streaming services. Question answering over lifelogs can offer personal assistants a critical resource when they try to provide advice in context. However, obtaining answers to questions over lifelogs is beyond the current state of the art of question answering techniques for a variety of reasons, the most pronounced of which is that lifelogs combine free text with some degree of structure such as temporal and geographical information. We create and publicly release TimelineQA, a benchmark for accelerating progress on querying lifelogs. TimelineQA generates lifelogs of imaginary people. The episodes in the lifelog range from major life episodes such as high school graduation to those that occur on a daily basis such as going for a run. We describe a set of experiments on TimelineQA with several state-of-the-art QA models. Our experiments reveal that for atomic queries, an extractive QA system significantly out-performs a state-of-the-art retrieval-augmented QA system. For multi-hop queries involving aggregates, we show that the best result is obtained with a state-of-the-art table QA technique, assuming the ground truth set of episodes for deriving the answer is available.

Submitted 1 June, 2023; originally announced June 2023.

arXiv:2212.07606 [pdf, other]

Multi-band Wireless Networks: Architectures, Challenges, and Comparative Analysis

Authors: Mohammad Amin Saeidi, Hina Tabassum, Mohamed-Slim Alouini

Abstract: This paper presents the vision of multi-band communication networks (MBN) in 6G, where optical and TeraHertz (THz) transmissions will coexist with the conventional radio frequency (RF) spectrum. This paper will first pin-point the fundamental challenges in MBN architectures at the PHYsical (PHY) and Medium Access (MAC) layer, such as unique channel propagation and estimation issues, user offloading and resource allocation, multi-band transceiver design and antenna systems, mobility and handoff management, backhauling, etc. We then perform a quantitative performance assessment of the two fundamental MBN architectures, i.e., stand-alone MBN and integrated MBN, considering critical factors like achievable rate, and capital/operational deployment cost. Our results show that stand-alone deployment is prone to higher capital and operational expenses for a predefined data rate requirement. Stand-alone deployment, however, offers flexibility and enables controlling the number of access points in different transmission bands. In addition, we propose a molecular absorption-aware user offloading metric for MBNs and demonstrate its performance gains over conventional user offloading schemes. Finally, open research directions are presented.

Submitted 20 June, 2023; v1 submitted 14 December, 2022; originally announced December 2022.

Comments: This work has been accepted to be published in IEEE Communications Magazine

arXiv:2211.01482 [pdf, other]

RQUGE: Reference-Free Metric for Evaluating Question Generation by Answering the Question

Authors: Alireza Mohammadshahi, Thomas Scialom, Majid Yazdani, Pouya Yanki, Angela Fan, James Henderson, Marzieh Saeidi

Abstract: Existing metrics for evaluating the quality of automatically generated questions such as BLEU, ROUGE, BERTScore, and BLEURT compare the reference and predicted questions, providing a high score when there is a considerable lexical overlap or semantic similarity between the candidate and the reference questions. This approach has two major shortcomings. First, we need expensive human-provided reference questions. Second, it penalises valid questions that may not have high lexical or semantic similarity to the reference questions. In this paper, we propose a new metric, RQUGE, based on the answerability of the candidate question given the context. The metric consists of a question-answering and a span scorer modules, using pre-trained models from existing literature, thus it can be used without any further training. We demonstrate that RQUGE has a higher correlation with human judgment without relying on the reference question. Additionally, RQUGE is shown to be more robust to several adversarial corruptions. Furthermore, we illustrate that we can significantly improve the performance of QA models on out-of-domain datasets by fine-tuning on synthetic data generated by a question generation model and re-ranked by RQUGE.

Submitted 26 May, 2023; v1 submitted 2 November, 2022; originally announced November 2022.

Comments: Accepted to Findings of ACL 2023

arXiv:2205.12259 [pdf, other]

Policy Compliance Detection via Expression Tree Inference

Authors: Neema Kotonya, Andreas Vlachos, Majid Yazdani, Lambert Mathias, Marzieh Saeidi

Abstract: Policy Compliance Detection (PCD) is a task we encounter when reasoning over texts, e.g. legal frameworks. Previous work to address PCD relies heavily on modeling the task as a special case of Recognizing Textual Entailment. Entailment is applicable to the problem of PCD, however viewing the policy as a single proposition, as opposed to multiple interlinked propositions, yields poor performance and lacks explainability. To address this challenge, more recent proposals for PCD have argued for decomposing policies into expression trees consisting of questions connected with logic operators. Question answering is used to obtain answers to these questions with respect to a scenario. Finally, the expression tree is evaluated in order to arrive at an overall solution. However, this work assumes expression trees are provided by experts, thus limiting its applicability to new policies. In this work, we learn how to infer expression trees automatically from policy texts. We ensure the validity of the inferred trees by introducing constrained decoding using a finite state automaton to ensure the generation of valid trees. We determine through automatic evaluation that 63% of the expression trees generated by our constrained generation model are logically equivalent to gold trees. Human evaluation shows that 88% of trees generated by our model are correct.

Submitted 24 May, 2022; originally announced May 2022.

arXiv:2204.01172 [pdf, other]

PERFECT: Prompt-free and Efficient Few-shot Learning with Language Models

Authors: Rabeeh Karimi Mahabadi, Luke Zettlemoyer, James Henderson, Marzieh Saeidi, Lambert Mathias, Veselin Stoyanov, Majid Yazdani

Abstract: Current methods for few-shot fine-tuning of pretrained masked language models (PLMs) require carefully engineered prompts and verbalizers for each new task to convert examples into a cloze-format that the PLM can score. In this work, we propose PERFECT, a simple and efficient method for few-shot fine-tuning of PLMs without relying on any such handcrafting, which is highly effective given as few as 32 data points. PERFECT makes two key design choices: First, we show that manually engineered task prompts can be replaced with task-specific adapters that enable sample-efficient fine-tuning and reduce memory and storage costs by roughly factors of 5 and 100, respectively. Second, instead of using handcrafted verbalizers, we learn new multi-token label embeddings during fine-tuning, which are not tied to the model vocabulary and which allow us to avoid complex auto-regressive decoding. These embeddings are not only learnable from limited data but also enable nearly 100x faster training and inference. Experiments on a wide range of few-shot NLP tasks demonstrate that PERFECT, while being simple and efficient, also outperforms existing state-of-the-art few-shot learning methods. Our code is publicly available at https://github.com/facebookresearch/perfect.git.

Submitted 25 April, 2022; v1 submitted 3 April, 2022; originally announced April 2022.

Comments: ACL, 2022

arXiv:2109.14497 [pdf, other]

Ruler Wrapping

Authors: Travis Gagie, Mozhgan Saeidi, Allan Sapucaia

Abstract: In 1985 Hopcroft, Joseph and Whitesides showed it is NP-complete to decide whether a carpenter's ruler with segments of given positive lengths can be folded into a line of at most a given length, such that the folded hinges alternate between 180 degrees clockwise and 180 degrees counter-clockwise. At the open-problem session of the 33rd Canadian Conference on Computational Geometry (CCCG '21), O'Rourke proposed a natural variation of this problem called ruler wrapping, in which all folded hinges must be folded the same way. In this paper we show O'Rourke's variation has a linear-time solution. We also show how, given a sequence of positive numbers, in linear time we can partition it into the maximum number of substrings whose totals are non-decreasing.

Submitted 9 January, 2022; v1 submitted 29 September, 2021; originally announced September 2021.

arXiv:2109.03731 [pdf, other]

Cross-Policy Compliance Detection via Question Answering

Authors: Marzieh Saeidi, Majid Yazdani, Andreas Vlachos

Abstract: Policy compliance detection is the task of ensuring that a scenario conforms to a policy (e.g. a claim is valid according to government rules or a post in an online platform conforms to community guidelines). This task has been previously instantiated as a form of textual entailment, which results in poor accuracy due to the complexity of the policies. In this paper we propose to address policy compliance detection via decomposing it into question answering, where questions check whether the conditions stated in the policy apply to the scenario, and an expression tree combines the answers to obtain the label. Despite the initial upfront annotation cost, we demonstrate that this approach results in better accuracy, especially in the cross-policy setup where the policies during testing are unseen in training. In addition, it allows us to use existing question answering models pre-trained on existing large datasets. Finally, it explicitly identifies the information missing from a scenario in case policy compliance cannot be determined. We conduct our experiments using a recent dataset consisting of government policies, which we augment with expert annotations and find that the cost of annotating question answering decomposition is largely offset by improved inter-annotator agreement and speed.

Submitted 8 September, 2021; originally announced September 2021.

Journal ref: EMNLP 2021
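
Illustrative sketch (not from the paper): the abstract of "Cross-Policy Compliance Detection via Question Answering" describes questions whose answers are combined by an expression tree, with missing information reported when compliance cannot be determined. One way this could look is shown below. The class names, the use of three-valued (Kleene) logic, the helper `unanswered`, and the example policy are all assumptions made for illustration; the authors' actual representation and label set may differ.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Union


@dataclass(frozen=True)
class Question:
    text: str  # a yes/no question about the scenario


@dataclass(frozen=True)
class And:
    left: "Node"
    right: "Node"


@dataclass(frozen=True)
class Or:
    left: "Node"
    right: "Node"


@dataclass(frozen=True)
class Not:
    child: "Node"


Node = Union[Question, And, Or, Not]


def evaluate(node: Node, answers: Dict[str, bool]) -> Optional[bool]:
    """Return True/False, or None when the answers given cannot decide the tree."""
    if isinstance(node, Question):
        return answers.get(node.text)  # None when the question is unanswered
    if isinstance(node, Not):
        value = evaluate(node.child, answers)
        return None if value is None else not value
    left = evaluate(node.left, answers)
    right = evaluate(node.right, answers)
    if isinstance(node, And):
        if left is False or right is False:
            return False  # one false branch decides a conjunction
        return None if left is None or right is None else True
    # Remaining case: Or
    if left is True or right is True:
        return True  # one true branch decides a disjunction
    return None if left is None or right is None else False


def unanswered(node: Node, answers: Dict[str, bool]) -> List[str]:
    """List every question in the tree that has no answer (hypothetical helper)."""
    if isinstance(node, Question):
        return [] if node.text in answers else [node.text]
    if isinstance(node, Not):
        return unanswered(node.child, answers)
    return unanswered(node.left, answers) + unanswered(node.right, answers)


# Made-up policy: over 18 AND (employed OR student)
policy = And(
    Question("Is the applicant over 18?"),
    Or(Question("Is the applicant employed?"),
       Question("Is the applicant a student?")),
)
partial = {"Is the applicant over 18?": True,
           "Is the applicant employed?": False}
complete = {**partial, "Is the applicant a student?": True}
```

With `partial`, `evaluate` returns `None` and `unanswered` names the student question as the missing information; with `complete`, the tree evaluates to `True`.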

arXiv:2106.01074 [pdf, other]

Database Reasoning Over Text

Authors: James Thorne, Majid Yazdani, Marzieh Saeidi, Fabrizio Silvestri, Sebastian Riedel, Alon Halevy

Abstract: Neural models have shown impressive performance gains in answering queries from natural language text. However, existing works are unable to support database queries, such as "List/Count all female athletes who were born in 20th century", which require reasoning over sets of relevant facts with operations such as join, filtering and aggregation. We show that while state-of-the-art transformer models perform very well for small databases, they exhibit limitations in processing noisy data, numerical operations, and queries that aggregate facts. We propose a modular architecture to answer these database-style queries over multiple spans from text and aggregating these at scale. We evaluate the architecture using WikiNLDB, a novel dataset for exploring such queries. Our architecture scales to databases containing thousands of facts whereas contemporary models are limited by how many facts can be encoded. In direct comparison on small databases, our approach increases overall answer accuracy from 85% to 90%. On larger databases, our approach retains its accuracy whereas transformer baselines could not encode the context.

Submitted 2 June, 2021; originally announced June 2021.

Comments: To appear at ACL2021

arXiv:2012.12518 [pdf, ps, other]

If This Context Then That Concern: Exploring users' concerns with IFTTT applets

Authors: Mahsa Saeidi, McKenzie Calvert, Audrey W. Au, Anita Sarma, Rakesh B. Bobba

Abstract: End users are increasingly using trigger-action platforms like If-This-Then-That (IFTTT) to create applets to connect smart home devices and services. However, there are inherent risks in using such applets -- even non-malicious ones -- as sensitive information may leak through their use in certain contexts (e.g., where the device is located, who can observe the resultant action). This work aims to understand how well end users can assess this risk. We do so by exploring users' concerns with using IFTTT applets and more importantly if and how those concerns change based on different contextual factors. Through a Mechanical Turk survey of 386 participants on 49 smart-home IFTTT applets, we found that nudging the participants to think about different usage contexts led them to think deeper about the associated risks and raise their concerns. Qualitative analysis reveals that participants had a nuanced understanding of contextual factors and how these factors could lead to leakage of sensitive data and allow unauthorized access to applets and data.

Submitted 23 December, 2020; originally announced December 2020.

arXiv:2011.05448 [pdf, other]

Generating Fact Checking Briefs

Authors: Angela Fan, Aleksandra Piktus, Fabio Petroni, Guillaume Wenzek, Marzieh Saeidi, Andreas Vlachos, Antoine Bordes, Sebastian Riedel

Abstract: Fact checking at scale is difficult -- while the number of active fact checking websites is growing, it remains too small for the needs of the contemporary media ecosystem. However, despite good intentions, contributions from volunteers are often error-prone, and thus in practice restricted to claim detection. We investigate how to increase the accuracy and efficiency of fact checking by providing information about the claim before performing the check, in the form of natural language briefs. We investigate passage-based briefs, containing a relevant passage from Wikipedia, entity-centric ones consisting of Wikipedia pages of mentioned entities, and Question-Answering Briefs, with questions decomposing the claim, and their answers. To produce QABriefs, we develop QABriefer, a model that generates a set of questions conditioned on the claim, searches the web for evidence, and generates answers. To train its components, we introduce QABriefDataset which we collected via crowdsourcing. We show that fact checking with briefs -- in particular QABriefs -- increases the accuracy of crowdworkers by 10% while slightly decreasing the time taken. For volunteer (unpaid) fact checkers, QABriefs slightly increase accuracy and reduce the time required by around 20%.

Submitted 10 November, 2020; originally announced November 2020.

arXiv:2010.06973 [pdf, other]

Neural Databases

Authors: James Thorne, Majid Yazdani, Marzieh Saeidi, Fabrizio Silvestri, Sebastian Riedel, Alon Halevy

Abstract: In recent years, neural networks have shown impressive performance gains on long-standing AI problems, and in particular, answering queries from natural language text. These advances raise the question of whether they can be extended to a point where we can relax the fundamental assumption of database management, namely, that our data is represented as fields of a pre-defined schema. This paper presents a first step in answering that question. We describe NeuralDB, a database system with no pre-defined schema, in which updates and queries are given in natural language. We develop query processing techniques that build on the primitives offered by the state of the art Natural Language Processing methods. We begin by demonstrating that at the core, recent NLP transformers, powered by pre-trained language models, can answer select-project-join queries if they are given the exact set of relevant facts. However, they cannot scale to non-trivial databases and cannot perform aggregation queries. Based on these findings, we describe a NeuralDB architecture that runs multiple Neural SPJ operators in parallel, each with a set of database sentences that can produce one of the answers to the query. The result of these operators is fed to an aggregation operator if needed. We describe an algorithm that learns how to create the appropriate sets of facts to be fed into each of the Neural SPJ operators. Importantly, this algorithm can be trained by the Neural SPJ operator itself.
We experimentally validate the accuracy of NeuralDB and its components, showing that we can answer queries over thousands of sentences with very high accuracy.

Submitted 14 October, 2020; originally announced October 2020.

Comments: Submitted to PVLDB vol 14

arXiv:2010.01339 [pdf, ps, other]

Weighted Sum-Rate Maximization for Multi-IRS-assisted Full-Duplex Systems with Hardware Impairments

Authors: Mohammad Amin Saeidi, Mohammad Javad Emadi, Hamed Masoumi, Mohammad Robat Mili, Derrick Wing Kwan Ng, Ioannis Krikidis

Abstract: Smart and reconfigurable wireless communication environments can be established by exploiting well-designed intelligent reflecting surfaces (IRSs) to shape the communication channels. In this paper, we investigate how multiple IRSs affect the performance of multi-user full-duplex communication systems under hardware impairment at each node, wherein the base station (BS) and the uplink users are subject to maximum transmission power constraints. Firstly, the uplink-downlink system weighted sum-rate (SWSR) is derived which serves as a system performance metric. Then, we formulate the resource allocation design for the maximization of SWSR as an optimization problem which jointly optimizes the beamforming and the combining vectors at the BS, the transmit powers of the uplink users, and the phase shifts of multiple IRSs. Since the SWSR optimization problem is non-convex, an efficient iterative alternating approach is proposed to obtain a suboptimal solution for the design problem considered and its complexity is also discussed. In particular, we firstly reformulate the main problem into an equivalent weighted minimum mean-square-error form and then transform it into several convex sub-problems which can be analytically solved for given phase shifts. Then, the IRSs phases are optimized via a gradient ascent-based algorithm. Finally, numerical results are presented to clarify how multiple IRSs enhance the performance metric under hardware impairment.

Submitted 3 October, 2020; originally announced October 2020.

Comments: 30 pages, This work has been submitted for possible publication

arXiv:2009.10311 [pdf, other]

Preserving Integrity in Online Social Networks

Authors: Alon Halevy, Cristian Canton Ferrer, Hao Ma, Umut Ozertem, Patrick Pantel, Marzieh Saeidi, Fabrizio Silvestri, Ves Stoyanov

Abstract: Online social networks provide a platform for sharing information and free expression. However, these networks are also used for malicious purposes, such as distributing misinformation and hate speech, selling illegal drugs, and coordinating sex trafficking or child exploitation. This paper surveys the state of the art in keeping online platforms and their users safe from such harm, also known as the problem of preserving integrity. This survey comes from the perspective of having to combat a broad spectrum of integrity violations at Facebook. We highlight the techniques that have been proven useful in practice and that deserve additional attention from the academic community. Instead of discussing the many individual violation types, we identify key aspects of the social-media eco-system, each of which is common to a wide variety of violation types. Furthermore, each of these components represents an area for research and development, and the innovations that are found can be applied widely.

Submitted 25 September, 2020; v1 submitted 22 September, 2020; originally announced September 2020.

arXiv:2008.06274 [pdf, other]

Graph-based Modeling of Online Communities for Fake News Detection

Authors: Shantanu Chandra, Pushkar Mishra, Helen Yannakoudakis, Madhav Nimishakavi, Marzieh Saeidi, Ekaterina Shutova

Abstract: Over the past few years, there has been a substantial effort towards automated detection of fake news on social media platforms. Existing research has modeled the structure, style, content, and patterns in dissemination of online posts, as well as the demographic traits of users who interact with them. However, no attention has been directed towards modeling the properties of online communities that interact with the posts. In this work, we propose a novel social context-aware fake news detection framework, SAFER, based on graph neural networks (GNNs). The proposed framework aggregates information with respect to: 1) the nature of the content disseminated, 2) content-sharing behavior of users, and 3) the social network of those users. We furthermore perform a systematic comparison of several GNN models for this task and introduce novel methods based on relational and hyperbolic GNNs, which have not been previously used for user or community modeling within NLP. We empirically demonstrate that our framework yields significant improvements over existing text-based techniques and achieves state-of-the-art results on fake news datasets from two different domains.

Submitted 23 November, 2020; v1 submitted 14 August, 2020; originally announced August 2020.

arXiv:1809.01494 [pdf, other]

Interpretation of Natural Language Rules in Conversational Machine Reading

Authors: Marzieh Saeidi, Max Bartolo, Patrick Lewis, Sameer Singh, Tim Rocktäschel, Mike Sheldon, Guillaume Bouchard, Sebastian Riedel

Abstract: Most work in machine reading focuses on question answering problems where the answer is directly expressed in the text to read. However, many real-world question answering problems require the reading of text not because it contains the literal answer, but because it contains a recipe to derive an answer together with the reader's background knowledge. One example is the task of interpreting regulations to answer "Can I...?" or "Do I have to...?" questions such as "I am working in Canada. Do I have to carry on paying UK National Insurance?" after reading a UK government website about this topic. This task requires both the interpretation of rules and the application of background knowledge. It is further complicated due to the fact that, in practice, most questions are underspecified, and a human assistant will regularly have to ask clarification questions such as "How long have you been working abroad?" when the answer cannot be directly derived from the question and text. In this paper, we formalise this task and develop a crowd-sourcing strategy to collect 32k task instances based on real-world rules and crowd-generated questions and scenarios. We analyse the challenges of this task and assess its difficulty by evaluating the performance of rule-based and machine-learning baselines. We observe promising results when no background knowledge is necessary, and substantial room for improvement whenever background knowledge is needed.

Submitted 28 August, 2018; originally announced September 2018.

Comments: EMNLP 2018

arXiv:1708.01680 [pdf, other]

doi 10.22152/programming-journal.org/2018/2/2

On the Effect of Semantically Enriched Context Models on Software Modularization

Authors: Amir Saeidi, Jurriaan Hage, Ravi Khadka, Slinger Jansen

Abstract: Many of the existing approaches for program comprehension rely on the linguistic information found in source code, such as identifier names and comments. Semantic clustering is one such technique for modularization of the system that relies on the informal semantics of the program, encoded in the vocabulary used in the source code. Treating the source code as a collection of tokens loses the semantic information embedded within the identifiers. We try to overcome this problem by introducing context models for source code identifiers to obtain a semantic kernel, which can be used for both deriving the topics that run through the system as well as their clustering. In the first model, we abstract an identifier to its type representation and build on this notion of context to construct contextual vector representation of the source code. The second notion of context is defined based on the flow of data between identifiers to represent a module as a dependency graph where the nodes correspond to identifiers and the edges represent the data dependencies between pairs of identifiers. We have applied our approach to 10 medium-sized open source Java projects, and show that by introducing contexts for identifiers, the quality of the modularization of the software systems is improved. Both of the context models give results that are superior to the plain vector representation of documents. In some cases, the authoritativeness of decompositions is improved by 67%. Furthermore, a more detailed evaluation of our approach on JEdit, an open source editor, demonstrates that inferred topics through performing topic analysis on the contextual representations are more meaningful compared to the plain representation of the documents. The proposed approach in introducing a context model for source code identifiers paves the way for building tools that support developers in program comprehension tasks such as application and domain concept location, software modularization and topic analysis.

Submitted 4 August, 2017; originally announced August 2017.

Journal ref: The Art, Science, and Engineering of Programming, 2018, Vol. 2, Issue 1, Article 2

arXiv:1701.04653 [pdf, other]

Community Question Answering Platforms vs. Twitter for Predicting Characteristics of Urban Neighbourhoods

Authors: Marzieh Saeidi, Alessandro Venerandi, Licia Capra, Sebastian Riedel

Abstract: In this paper, we investigate whether text from a Community Question Answering (QA) platform can be used to predict and describe real-world attributes. We experiment with predicting a wide range of 62 demographic attributes for neighbourhoods of London. We use the text from QA platform of Yahoo! Answers and compare our results to the ones obtained from Twitter microblogs. Outcomes show that the correlation between the predicted demographic attributes using text from Yahoo! Answers discussions and the observed demographic attributes can reach an average Pearson correlation coefficient of ρ = 0.54, slightly higher than the predictions obtained using Twitter data. Our qualitative analysis indicates that there is semantic relatedness between the highest correlated terms extracted from both datasets and their relative demographic attributes. Furthermore, the correlations highlight the different natures of the information contained in Yahoo! Answers and Twitter. While the former seems to offer a more encyclopedic content, the latter provides information related to the current sociocultural aspects or phenomena.

Submitted 17 January, 2017; originally announced January 2017.

Comments: Submitted to ICWSM2017

arXiv:1610.03771 [pdf, other]

SentiHood: Targeted Aspect Based Sentiment Analysis Dataset for Urban Neighbourhoods

Authors: Marzieh Saeidi, Guillaume Bouchard, Maria Liakata, Sebastian Riedel

Abstract: In this paper, we introduce the task of targeted aspect-based sentiment analysis. The goal is to extract fine-grained information with respect to entities mentioned in user comments. This work extends both aspect-based sentiment analysis that assumes a single entity per document and targeted sentiment analysis that assumes a single sentiment towards a target entity. In particular, we identify the sentiment towards each aspect of one or more entities. As a testbed for this task, we introduce the SentiHood dataset, extracted from a question answering (QA) platform where urban neighbourhoods are discussed by users. In this context, units of text often mention several aspects of one or more neighbourhoods. This is the first time that a generic social media platform, in this case a QA platform, is used for fine-grained opinion mining. Text coming from QA platforms is far less constrained compared to text from review-specific platforms, which current datasets are based on. We develop several strong baselines, relying on logistic regression and state-of-the-art recurrent neural networks.

Submitted 12 October, 2016; originally announced October 2016.

Comments: Accepted at COLING 2016

Showing 1–23 of 23 results for author: Saeidi, M