
Towards Building a Robust Toxicity Predictor

Authors: Dmitriy Bespalov, Sourav Bhabesh, Yi Xiang, Liutong Zhou, Yanjun Qi

Abstract: Recent NLP literature pays little attention to the robustness of toxicity language predictors, while these systems are most likely to be used in adversarial contexts. This paper presents a novel adversarial attack, ToxicTrap, introducing small word-level perturbations to fool SOTA text classifiers into predicting toxic text samples as benign. ToxicTrap exploits greedy-based search strategies to enable fast and effective generation of toxic adversarial examples. Two novel goal function designs allow ToxicTrap to identify weaknesses in both multiclass and multilabel toxic language detectors. Our empirical results show that SOTA toxicity text classifiers are indeed vulnerable to the proposed attacks, attaining over 98% attack success rates in multilabel cases. We also show how vanilla adversarial training and its improved version can help increase the robustness of a toxicity detector even against unseen attacks.

Submitted 9 April, 2024; originally announced April 2024.

Comments: ACL 2023 /

arXiv:2312.04684 [pdf, other]

Latent Skill Discovery for Chain-of-Thought Reasoning

Authors: Zifan Xu, Haozhu Wang, Dmitriy Bespalov, Peter Stone, Yanjun Qi

Abstract: Recent advances in Large Language Models (LLMs) have led to an emergent ability of chain-of-thought (CoT) prompting, a prompt reasoning strategy that adds intermediate rationale steps between questions and answers to construct prompts. Conditioned on these prompts, LLMs can effectively learn in context to generate rationales that lead to more accurate answers than when answering the same question directly. To design LLM prompts, one important setting, called demonstration selection, considers selecting demonstrations from an example bank. Existing methods use various heuristics for this selection, but for CoT prompting, which involves unique rationales, it is essential to base the selection upon the intrinsic skills that CoT rationales need, for instance, the skills of addition or subtraction for math word problems. To address this requirement, we introduce a novel approach named Reasoning Skill Discovery (RSD) that uses unsupervised learning to create a latent space representation of rationales, called a reasoning skill. Simultaneously, RSD learns a reasoning policy to determine the required reasoning skill for a given question. This can then guide the selection of examples that demonstrate the required reasoning skills. Our approach offers several desirable properties: it is (1) theoretically grounded, (2) sample-efficient, requiring no LLM inference or manual prompt design, and (3) LLM-agnostic. Empirically, RSD outperforms existing methods by up to 6% in terms of answer accuracy across multiple reasoning tasks.

Submitted 7 December, 2023; originally announced December 2023.

arXiv:1102.3563 [pdf, ps, other]

Parallel algorithms for SAT in application to inversion problems of some discrete functions

Authors: Alexander Semenov, Oleg Zaikin, Dmitry Bespalov, Mikhail Posypkin

Abstract: In this article we consider the inversion problem for polynomially computable discrete functions. These functions describe the behavior of many discrete systems and are used in model checking, hardware verification, cryptanalysis, computer biology and other domains. Quite often it is necessary to invert these functions, i.e. to find an unknown preimage when an image and the algorithm for computing the function are given. In the general case this problem is computationally intractable. However, many of its special cases are very important in practical applications. Thus, the development of algorithms that are applicable to these special cases is of importance. The practical applicability of such algorithms can be validated by their ability to solve problems that are considered to be computationally hard (for example, cryptanalysis problems). In this article we propose a technology for solving the inversion problem for polynomially computable discrete functions. This technology was implemented in distributed computing environments (parallel clusters and Grid systems). It is based on reducing the inversion problem for the considered function to a SAT problem. We describe a general approach to coarse-grained parallelization of the resulting SAT problems. The efficiency of each parallelization scheme is determined by means of a special predictive function. The proposed technology was validated by successfully solving cryptanalysis problems for some keystream generators. The main practical result of this work is a complete cryptanalysis of the keystream generator A5/1, which was performed in a Grid system specially built for this task.

Submitted 17 February, 2011; originally announced February 2011.

Comments: 16 pages, 8 figures

Showing 1–3 of 3 results for author: Bespalov, D