AdvPrompter: Fast Adaptive Adversarial Prompting for LLMs

Authors: Anselm Paulus, Arman Zharmagambetov, Chuan Guo, Brandon Amos, Yuandong Tian

Abstract: While recently Large Language Models (LLMs) have achieved remarkable successes, they are vulnerable to certain jailbreaking attacks that lead to generation of inappropriate or harmful content. Manual red-teaming requires finding adversarial prompts that cause such jailbreaking, e.g. by appending a suffix to a given instruction, which is inefficient and time-consuming. On the other hand, automatic adversarial prompt generation often leads to semantically meaningless attacks that can easily be detected by perplexity-based filters, may require gradient information from the TargetLLM, or do not scale well due to time-consuming discrete optimization processes over the token space. In this paper, we present a novel method that uses another LLM, called the AdvPrompter, to generate human-readable adversarial prompts in seconds, $\sim800\times$ faster than existing optimization-based approaches. We train the AdvPrompter using a novel algorithm that does not require access to the gradients of the TargetLLM. This process alternates between two steps: (1) generating high-quality target adversarial suffixes by optimizing the AdvPrompter predictions, and (2) low-rank fine-tuning of the AdvPrompter with the generated adversarial suffixes. The trained AdvPrompter generates suffixes that veil the input instruction without changing its meaning, such that the TargetLLM is lured to give a harmful response. Experimental results on popular open source TargetLLMs show state-of-the-art results on the AdvBench dataset, that also transfer to closed-source black-box LLM APIs. Further, we demonstrate that by fine-tuning on a synthetic dataset generated by AdvPrompter, LLMs can be made more robust against jailbreaking attacks while maintaining performance, i.e. high MMLU scores.

Submitted 21 April, 2024; originally announced April 2024.

Comments: 32 pages, 9 figures, 7 tables

arXiv:2310.02442 [pdf, other]

GenCO: Generating Diverse Designs with Combinatorial Constraints

Authors: Aaron Ferber, Arman Zharmagambetov, Taoan Huang, Bistra Dilkina, Yuandong Tian

Abstract: Deep generative models like GAN and VAE have shown impressive results in generating unconstrained objects like images. However, many design settings arising in industrial design, material science, computer graphics and more require that the generated objects satisfy hard combinatorial constraints or meet objectives in addition to modeling a data distribution. To address this, we propose GenCO, a generative framework that guarantees constraint satisfaction throughout training by leveraging differentiable combinatorial solvers to enforce feasibility. GenCO imposes the generative loss on provably feasible solutions rather than intermediate soft solutions, meaning that the deep generative network can focus on ensuring the generated objects match the data distribution without having to also capture feasibility. This shift enables practitioners to enforce hard constraints on the generated outputs during end-to-end training, enabling assessments of their feasibility and introducing additional combinatorial loss components to deep generative training. We demonstrate the effectiveness of our approach on a variety of generative combinatorial tasks, including game level generation, map creation for path planning, and photonic device design, consistently demonstrating its capability to yield diverse, high-quality solutions that verifiably adhere to user-specified combinatorial properties.

Submitted 6 June, 2024; v1 submitted 3 October, 2023; originally announced October 2023.

Comments: Accepted to ICML 2024

arXiv:2307.08964 [pdf, other]

Landscape Surrogate: Learning Decision Losses for Mathematical Optimization Under Partial Information

Authors: Arman Zharmagambetov, Brandon Amos, Aaron Ferber, Taoan Huang, Bistra Dilkina, Yuandong Tian

Abstract: Recent works in learning-integrated optimization have shown promise in settings where the optimization problem is only partially observed or where general-purpose optimizers perform poorly without expert tuning. By learning an optimizer $\mathbf{g}$ to tackle these challenging problems with $f$ as the objective, the optimization process can be substantially accelerated by leveraging past experience. The optimizer can be trained with supervision from known optimal solutions or implicitly by optimizing the compound function $f\circ \mathbf{g}$. The implicit approach may not require optimal solutions as labels and is capable of handling problem uncertainty; however, it is slow to train and deploy due to frequent calls to optimizer $\mathbf{g}$ during both training and testing. The training is further challenged by sparse gradients of $\mathbf{g}$, especially for combinatorial solvers. To address these challenges, we propose using a smooth and learnable Landscape Surrogate $M$ as a replacement for $f\circ \mathbf{g}$. This surrogate, learnable by neural networks, can be computed faster than the solver $\mathbf{g}$, provides dense and smooth gradients during training, can generalize to unseen optimization problems, and is efficiently learned via alternating optimization. We test our approach on both synthetic problems, including shortest path and multidimensional knapsack, and real-world problems such as portfolio optimization, achieving comparable or superior objective values compared to state-of-the-art baselines while reducing the number of calls to $\mathbf{g}$. Notably, our approach outperforms existing methods for computationally expensive high-dimensional problems.

Submitted 2 November, 2023; v1 submitted 18 July, 2023; originally announced July 2023.

Comments: Accepted to NeurIPS 2023

arXiv:2104.02922 [pdf, ps, other]

doi 10.1007/s10618-022-00892-7

Sparse Oblique Decision Trees: A Tool to Understand and Manipulate Neural Net Features

Authors: Suryabhan Singh Hada, Miguel Á. Carreira-Perpiñán, Arman Zharmagambetov

Abstract: The widespread deployment of deep nets in practical applications has led to a growing desire to understand how and why such black-box methods perform prediction. Much work has focused on understanding what part of the input pattern (an image, say) is responsible for a particular class being predicted, and how the input may be manipulated to predict a different class. We focus instead on understanding which of the internal features computed by the neural net are responsible for a particular class. We achieve this by mimicking part of the neural net with an oblique decision tree having sparse weight vectors at the decision nodes. Using the recently proposed Tree Alternating Optimization (TAO) algorithm, we are able to learn trees that are both highly accurate and interpretable. Such trees can faithfully mimic the part of the neural net they replaced, and hence they can provide insights into the deep net black box. Further, we show we can easily manipulate the neural net features in order to make the net predict, or not predict, a given class, thus showing that it is possible to carry out adversarial attacks at the level of the features. These insights and manipulations apply globally to the entire training and test set, not just at a local (single-instance) level. We demonstrate this robustly in the MNIST and ImageNet datasets with LeNet5 and VGG networks.

Submitted 30 January, 2023; v1 submitted 7 April, 2021; originally announced April 2021.

Comments: Appears in Data Mining and Knowledge Discovery (2023), Special Issue on Explainable and Interpretable Machine Learning and Data Mining

arXiv:1911.03054 [pdf, ps, other]

An Experimental Comparison of Old and New Decision Tree Algorithms

Authors: Arman Zharmagambetov, Suryabhan Singh Hada, Miguel Á. Carreira-Perpiñán, Magzhan Gabidolla

Abstract: This paper presents a detailed comparison of a recently proposed algorithm for optimizing decision trees, tree alternating optimization (TAO), with other popular, established algorithms. We compare their performance on a number of classification and regression datasets of various complexity, different size and dimensionality, across different performance factors: accuracy and tree size (in terms of the number of leaves or the depth of the tree). We find that TAO achieves higher accuracy in nearly all datasets, often by a large margin.

Submitted 19 March, 2020; v1 submitted 8 November, 2019; originally announced November 2019.

Comments: 12 pages, 0 figures

Showing 1–5 of 5 results for author: Zharmagambetov, A