Showing 1–50 of 111 results for author: Gong, N

  1. arXiv:2407.09050  [pdf, other]

    cs.CR cs.AI cs.CV cs.LG

    Refusing Safe Prompts for Multi-modal Large Language Models

    Authors: Zedian Shao, Hongbin Liu, Yuepeng Hu, Neil Zhenqiang Gong

    Abstract: Multimodal large language models (MLLMs) have become the cornerstone of today's generative AI ecosystem, sparking intense competition among tech giants and startups. In particular, an MLLM generates a text response given a prompt consisting of an image and a question. While state-of-the-art MLLMs use safety filters and alignment techniques to refuse unsafe prompts, in this work, we introduce MLLM-…

    Submitted 12 July, 2024; originally announced July 2024.

  2. arXiv:2407.07221  [pdf, other]

    cs.CV cs.CR

    Tracing Back the Malicious Clients in Poisoning Attacks to Federated Learning

    Authors: Yuqi Jia, Minghong Fang, Hongbin Liu, Jinghuai Zhang, Neil Zhenqiang Gong

    Abstract: Poisoning attacks compromise the training phase of federated learning (FL) such that the learned global model misclassifies attacker-chosen inputs called target inputs. Existing defenses mainly focus on protecting the training phase of FL such that the learnt global model is poison free. However, these defenses often achieve limited effectiveness when the clients' local training data is highly non…

    Submitted 9 July, 2024; originally announced July 2024.

  3. arXiv:2407.04086  [pdf, other]

    cs.CR cs.CV cs.LG

    Certifiably Robust Image Watermark

    Authors: Zhengyuan Jiang, Moyang Guo, Yuepeng Hu, Jinyuan Jia, Neil Zhenqiang Gong

    Abstract: Generative AI raises many societal concerns such as boosting disinformation and propaganda campaigns. Watermarking AI-generated content is a key technology to address these concerns and has been widely deployed in industry. However, watermarking is vulnerable to removal attacks and forgery attacks. In this work, we propose the first image watermarks with certified robustness guarantees against rem…

    Submitted 4 July, 2024; originally announced July 2024.

    Comments: Accepted by ECCV 2024

  4. arXiv:2407.01505  [pdf, other]

    cs.CL cs.AI

    Self-Cognition in Large Language Models: An Exploratory Study

    Authors: Dongping Chen, Jiawen Shi, Yao Wan, Pan Zhou, Neil Zhenqiang Gong, Lichao Sun

    Abstract: While Large Language Models (LLMs) have achieved remarkable success across various applications, they also raise concerns regarding self-cognition. In this paper, we perform a pioneering study to explore self-cognition in LLMs. Specifically, we first construct a pool of self-cognition instruction prompts to evaluate where an LLM exhibits self-cognition and four well-designed principles to quantify…

    Submitted 1 July, 2024; originally announced July 2024.

    Comments: Accepted at ICML 2024 Large Language Models and Cognition Workshop

  5. arXiv:2406.15968  [pdf, other]

    cs.CL cs.LG

    ReCaLL: Membership Inference via Relative Conditional Log-Likelihoods

    Authors: Roy Xie, Junlin Wang, Ruomin Huang, Minxing Zhang, Rong Ge, Jian Pei, Neil Zhenqiang Gong, Bhuwan Dhingra

    Abstract: The rapid scaling of large language models (LLMs) has raised concerns about the transparency and fair use of the pretraining data used for training them. Detecting such content is challenging due to the scale of the data and limited exposure of each instance during training. We propose ReCaLL (Relative Conditional Log-Likelihood), a novel membership inference attack (MIA) to detect LLMs' pretraini…

    Submitted 22 June, 2024; originally announced June 2024.

  6. arXiv:2406.10416  [pdf, other]

    cs.CR cs.DC cs.LG

    Byzantine-Robust Decentralized Federated Learning

    Authors: Minghong Fang, Zifan Zhang, Hairi, Prashant Khanduri, Jia Liu, Songtao Lu, Yuchen Liu, Neil Gong

    Abstract: Federated learning (FL) enables multiple clients to collaboratively train machine learning models without revealing their private training data. In conventional FL, the system follows the server-assisted architecture (server-assisted FL), where the training process is coordinated by a central server. However, the server-assisted FL framework suffers from poor scalability due to a communication bot…

    Submitted 13 July, 2024; v1 submitted 14 June, 2024; originally announced June 2024.

    Comments: To appear in ACM Conference on Computer and Communications Security 2024 (CCS '24)

  7. arXiv:2406.06979  [pdf, other]

    cs.LG cs.CR cs.SD eess.AS

    AudioMarkBench: Benchmarking Robustness of Audio Watermarking

    Authors: Hongbin Liu, Moyang Guo, Zhengyuan Jiang, Lun Wang, Neil Zhenqiang Gong

    Abstract: The increasing realism of synthetic speech, driven by advancements in text-to-speech models, raises ethical concerns regarding impersonation and disinformation. Audio watermarking offers a promising solution via embedding human-imperceptible watermarks into AI-generated audios. However, the robustness of audio watermarking against common/adversarial perturbations remains understudied. We present A…

    Submitted 11 June, 2024; originally announced June 2024.

  8. arXiv:2405.16203  [pdf, other]

    cs.LG

    Evolutionary Large Language Model for Automated Feature Transformation

    Authors: Nanxu Gong, Chandan K. Reddy, Wangyang Ying, Yanjie Fu

    Abstract: Feature transformation aims to reconstruct the feature space of raw features to enhance the performance of downstream models. However, the exponential growth in the combinations of features and operations poses a challenge, making it difficult for existing methods to efficiently explore a wide space. Additionally, their optimization is solely driven by the accuracy of downstream models in specific…

    Submitted 25 May, 2024; originally announced May 2024.

  9. arXiv:2405.07145  [pdf, other]

    cs.CR cs.CV

    Stable Signature is Unstable: Removing Image Watermark from Diffusion Models

    Authors: Yuepeng Hu, Zhengyuan Jiang, Moyang Guo, Neil Gong

    Abstract: Watermark has been widely deployed by industry to detect AI-generated images. A recent watermarking framework called Stable Signature (proposed by Meta) roots watermark into the parameters of a diffusion model's decoder such that its generated images are inherently watermarked. Stable Signature makes it possible to watermark images generated by open-source diffusion models and was cl…

    Submitted 11 May, 2024; originally announced May 2024.

  10. arXiv:2405.06823  [pdf, other]

    cs.CR cs.AI cs.LG

    PLeak: Prompt Leaking Attacks against Large Language Model Applications

    Authors: Bo Hui, Haolin Yuan, Neil Gong, Philippe Burlina, Yinzhi Cao

    Abstract: Large Language Models (LLMs) enable a new ecosystem with many downstream applications, called LLM applications, with different natural language processing tasks. The functionality and performance of an LLM application highly depend on its system prompt, which instructs the backend LLM on what task to perform. Therefore, an LLM application developer often keeps a system prompt confidential to prote…

    Submitted 14 May, 2024; v1 submitted 10 May, 2024; originally announced May 2024.

    Comments: To appear in the Proceedings of The ACM Conference on Computer and Communications Security (CCS), 2024

  11. arXiv:2405.06206  [pdf, other]

    cs.CR cs.AI cs.LG

    Concealing Backdoor Model Updates in Federated Learning by Trigger-Optimized Data Poisoning

    Authors: Yujie Zhang, Neil Gong, Michael K. Reiter

    Abstract: Federated Learning (FL) is a decentralized machine learning method that enables participants to collaboratively train a model without sharing their private data. Despite its privacy and scalability benefits, FL is susceptible to backdoor attacks, where adversaries poison the local training data of a subset of clients using a backdoor trigger, aiming to make the aggregated model produce malicious r…

    Submitted 9 May, 2024; originally announced May 2024.

  12. arXiv:2405.05784  [pdf, other]

    cs.CR cs.LG

    Link Stealing Attacks Against Inductive Graph Neural Networks

    Authors: Yixin Wu, Xinlei He, Pascal Berrang, Mathias Humbert, Michael Backes, Neil Zhenqiang Gong, Yang Zhang

    Abstract: A graph neural network (GNN) is a type of neural network that is specifically designed to process graph-structured data. Typically, GNNs can be implemented in two settings, including the transductive setting and the inductive setting. In the transductive setting, the trained model can only predict the labels of nodes that were observed at the training time. In the inductive setting, the trained mo…

    Submitted 9 May, 2024; originally announced May 2024.

    Comments: To appear in the 24th Privacy Enhancing Technologies Symposium (PETS 2024), July 15-20, 2024

  13. arXiv:2404.17157  [pdf, other]

    cs.LG

    Neuro-Symbolic Embedding for Short and Effective Feature Selection via Autoregressive Generation

    Authors: Nanxu Gong, Wangyang Ying, Dongjie Wang, Yanjie Fu

    Abstract: Feature selection aims to identify the optimal feature subset for enhancing downstream models. Effective feature selection can remove redundant features, save computational resources, accelerate the model learning process, and improve the model overall performance. However, existing works are often time-intensive to identify the effective feature subset within high-dimensional feature spaces. Mean…

    Submitted 26 April, 2024; originally announced April 2024.

  14. arXiv:2404.15909  [pdf, other]

    cs.CV

    Learning Long-form Video Prior via Generative Pre-Training

    Authors: Jinheng Xie, Jiajun Feng, Zhaoxu Tian, Kevin Qinghong Lin, Yawen Huang, Xi Xia, Nanxu Gong, Xu Zuo, Jiaqi Yang, Yefeng Zheng, Mike Zheng Shou

    Abstract: Concepts involved in long-form videos such as people, objects, and their interactions, can be viewed as following an implicit prior. They are notably complex and continue to pose challenges to be comprehensively learned. In recent years, generative pre-training (GPT) has exhibited versatile capacities in modeling any kind of text content even visual locations. Can this manner work for learning lon…

    Submitted 24 April, 2024; originally announced April 2024.

  15. arXiv:2404.15611  [pdf, other]

    cs.CR

    Model Poisoning Attacks to Federated Learning via Multi-Round Consistency

    Authors: Yueqi Xie, Minghong Fang, Neil Zhenqiang Gong

    Abstract: Model poisoning attacks are critical security threats to Federated Learning (FL). Existing model poisoning attacks suffer from two key limitations: 1) they achieve suboptimal effectiveness when defenses are deployed, and/or 2) they require knowledge of the model updates or local training data on genuine clients. In this work, we make a key observation that their suboptimal effectiveness arises fro…

    Submitted 6 June, 2024; v1 submitted 23 April, 2024; originally announced April 2024.

  16. arXiv:2404.05403  [pdf, other]

    cs.CR cs.AI

    SoK: Gradient Leakage in Federated Learning

    Authors: Jiacheng Du, Jiahui Hu, Zhibo Wang, Peng Sun, Neil Zhenqiang Gong, Kui Ren

    Abstract: Federated learning (FL) enables collaborative model training among multiple clients without raw data exposure. However, recent studies have shown that clients' private training data can be reconstructed from the gradients they share in FL, known as gradient inversion attacks (GIAs). While GIAs have demonstrated effectiveness under ideal settings and auxiliary assumptions, their actual effic…

    Submitted 8 April, 2024; originally announced April 2024.

  17. arXiv:2404.04254  [pdf, other]

    cs.CR cs.AI cs.CL cs.CV cs.LG

    Watermark-based Detection and Attribution of AI-Generated Content

    Authors: Zhengyuan Jiang, Moyang Guo, Yuepeng Hu, Neil Zhenqiang Gong

    Abstract: Several companies--such as Google, Microsoft, and OpenAI--have deployed techniques to watermark AI-generated content to enable proactive detection. However, existing literature mainly focuses on user-agnostic detection. Attribution aims to further trace back the user of a generative-AI service who generated a given content detected as AI-generated. Despite its growing importance, attribution is la…

    Submitted 5 April, 2024; originally announced April 2024.

  18. arXiv:2403.17710  [pdf, other]

    cs.CR cs.AI

    Optimization-based Prompt Injection Attack to LLM-as-a-Judge

    Authors: Jiawen Shi, Zenghui Yuan, Yinuo Liu, Yue Huang, Pan Zhou, Lichao Sun, Neil Zhenqiang Gong

    Abstract: LLM-as-a-Judge is a novel solution that can assess textual information with large language models (LLMs). Based on existing research studies, LLMs demonstrate remarkable performance in providing a compelling alternative to traditional human assessment. However, the robustness of these systems against prompt injection attacks remains an open question. In this work, we introduce JudgeDeceiver, a nov…

    Submitted 26 March, 2024; originally announced March 2024.

  19. arXiv:2403.17303  [pdf, other]

    cs.CR

    Two Birds with One Stone: Differential Privacy by Low-power SRAM Memory

    Authors: Jianqing Liu, Na Gong, Hritom Das

    Abstract: The software-based implementation of differential privacy mechanisms has been shown to be neither friendly for lightweight devices nor secure against side-channel attacks. In this work, we aim to develop a hardware-based technique to achieve differential privacy by design. In contrast to the conventional software-based noise generation and injection process, our design realizes local differential…

    Submitted 25 March, 2024; originally announced March 2024.

    Comments: 15 pages, with 2 pages of Appendix

    Journal ref: IEEE Transactions on Dependable and Secure Computing, 2024

  20. arXiv:2403.15365  [pdf, other]

    cs.CR cs.CL cs.LG

    A Transfer Attack to Image Watermarks

    Authors: Yuepeng Hu, Zhengyuan Jiang, Moyang Guo, Neil Gong

    Abstract: Watermark has been widely deployed by industry to detect AI-generated images. The robustness of such watermark-based detector against evasion attacks in the white-box and black-box settings is well understood in the literature. However, the robustness in the no-box setting is much less understood. In particular, multiple studies claimed that image watermark is robust in such setting. In this work,…

    Submitted 24 March, 2024; v1 submitted 22 March, 2024; originally announced March 2024.

  21. arXiv:2403.03149  [pdf, other]

    cs.CR cs.DC cs.LG

    Robust Federated Learning Mitigates Client-side Training Data Distribution Inference Attacks

    Authors: Yichang Xu, Ming Yin, Minghong Fang, Neil Zhenqiang Gong

    Abstract: Recent studies have revealed that federated learning (FL), once considered secure due to clients not sharing their private data with the server, is vulnerable to attacks such as client-side training data distribution inference, where a malicious client can recreate the victim's data. While various countermeasures exist, they are not practical, often assuming server access to some training data or…

    Submitted 4 April, 2024; v1 submitted 5 March, 2024; originally announced March 2024.

    Comments: To appear in The Web Conference 2024 (WWW '24)

  22. arXiv:2402.14977  [pdf, other]

    cs.CR cs.CV cs.LG

    Mudjacking: Patching Backdoor Vulnerabilities in Foundation Models

    Authors: Hongbin Liu, Michael K. Reiter, Neil Zhenqiang Gong

    Abstract: Foundation model has become the backbone of the AI ecosystem. In particular, a foundation model can be used as a general-purpose feature extractor to build various downstream classifiers. However, foundation models are vulnerable to backdoor attacks and a backdoored foundation model is a single-point-of-failure of the AI ecosystem, e.g., multiple downstream classifiers inherit the backdoor vulnera…

    Submitted 22 February, 2024; originally announced February 2024.

    Comments: To appear in USENIX Security Symposium, 2024

  23. arXiv:2402.14683  [pdf, other]

    cs.CV cs.AI cs.LG

    Visual Hallucinations of Multi-modal Large Language Models

    Authors: Wen Huang, Hongbin Liu, Minxin Guo, Neil Zhenqiang Gong

    Abstract: Visual hallucination (VH) means that a multi-modal LLM (MLLM) imagines incorrect details about an image in visual question answering. Existing studies find VH instances only in existing image datasets, which results in biased understanding of MLLMs' performance under VH due to limited diversity of such VH instances. In this work, we propose a tool called VHTest to generate a diverse set of VH inst…

    Submitted 16 June, 2024; v1 submitted 22 February, 2024; originally announced February 2024.

    Comments: To appear in ACL Findings, 2024

  24. arXiv:2402.13494  [pdf, other]

    cs.CL cs.CR

    GradSafe: Detecting Jailbreak Prompts for LLMs via Safety-Critical Gradient Analysis

    Authors: Yueqi Xie, Minghong Fang, Renjie Pi, Neil Gong

    Abstract: Large Language Models (LLMs) face threats from jailbreak prompts. Existing methods for detecting jailbreak prompts are primarily online moderation APIs or finetuned LLMs. These strategies, however, often require extensive and resource-intensive data collection and training processes. In this study, we propose GradSafe, which effectively detects jailbreak prompts by scrutinizing the gradients of sa…

    Submitted 29 May, 2024; v1 submitted 20 February, 2024; originally announced February 2024.

    Comments: Accepted to ACL 2024 Main

  25. arXiv:2402.11637  [pdf, other]

    cs.CR cs.IR cs.LG

    Poisoning Federated Recommender Systems with Fake Users

    Authors: Ming Yin, Yichang Xu, Minghong Fang, Neil Zhenqiang Gong

    Abstract: Federated recommendation is a prominent use case within federated learning, yet it remains susceptible to various attacks, from user to server-side vulnerabilities. Poisoning attacks are particularly notable among user-side attacks, as participants upload malicious model updates to deceive the global model, often intending to promote or demote specific targeted items. This study investigates strat…

    Submitted 18 February, 2024; originally announced February 2024.

    Comments: To appear in The Web Conference 2024 (WWW '24)

  26. arXiv:2401.05561  [pdf, other]

    cs.CL

    TrustLLM: Trustworthiness in Large Language Models

    Authors: Lichao Sun, Yue Huang, Haoran Wang, Siyuan Wu, Qihui Zhang, Yuan Li, Chujie Gao, Yixin Huang, Wenhan Lyu, Yixuan Zhang, Xiner Li, Zhengliang Liu, Yixin Liu, Yijue Wang, Zhikun Zhang, Bertie Vidgen, Bhavya Kailkhura, Caiming Xiong, Chaowei Xiao, Chunyuan Li, Eric Xing, Furong Huang, Hao Liu, Heng Ji, Hongyi Wang, et al. (45 additional authors not shown)

    Abstract: Large language models (LLMs), exemplified by ChatGPT, have gained considerable attention for their excellent natural language processing capabilities. Nonetheless, these LLMs present many challenges, particularly in the realm of trustworthiness. Therefore, ensuring the trustworthiness of LLMs emerges as an important topic. This paper introduces TrustLLM, a comprehensive study of trustworthiness in…

    Submitted 17 March, 2024; v1 submitted 10 January, 2024; originally announced January 2024.

    Comments: This work is still under work and we welcome your contribution

  27. arXiv:2312.01537  [pdf, ps, other]

    cs.LG cs.AI

    Unlocking the Potential of Federated Learning: The Symphony of Dataset Distillation via Deep Generative Latents

    Authors: Yuqi Jia, Saeed Vahidian, Jingwei Sun, Jianyi Zhang, Vyacheslav Kungurtsev, Neil Zhenqiang Gong, Yiran Chen

    Abstract: Data heterogeneity presents significant challenges for federated learning (FL). Recently, dataset distillation techniques have been introduced, and performed at the client level, to attempt to mitigate some of these challenges. In this paper, we propose a highly efficient FL dataset distillation framework on the server side, significantly reducing both the computational and communication demands o…

    Submitted 3 December, 2023; originally announced December 2023.

  28. arXiv:2312.01281  [pdf, other]

    cs.CR cs.LG

    Mendata: A Framework to Purify Manipulated Training Data

    Authors: Zonghao Huang, Neil Gong, Michael K. Reiter

    Abstract: Untrusted data used to train a model might have been manipulated to endow the learned model with hidden properties that the data contributor might later exploit. Data purification aims to remove such manipulations prior to training the model. We propose Mendata, a novel framework to purify manipulated training data. Starting from a small reference dataset in which a large majority of the inputs ar…

    Submitted 2 December, 2023; originally announced December 2023.

  29. arXiv:2310.13862  [pdf, other]

    cs.LG cs.CR

    Competitive Advantage Attacks to Decentralized Federated Learning

    Authors: Yuqi Jia, Minghong Fang, Neil Zhenqiang Gong

    Abstract: Decentralized federated learning (DFL) enables clients (e.g., hospitals and banks) to jointly train machine learning models without a central orchestration server. In each global training round, each client trains a local model on its own training data and then they exchange local models for aggregation. In this work, we propose SelfishAttack, a new family of attacks to DFL. In SelfishAttack, a se…

    Submitted 20 October, 2023; originally announced October 2023.

  30. arXiv:2310.12815  [pdf, other]

    cs.CR cs.AI cs.CL cs.LG

    Formalizing and Benchmarking Prompt Injection Attacks and Defenses

    Authors: Yupei Liu, Yuqi Jia, Runpeng Geng, Jinyuan Jia, Neil Zhenqiang Gong

    Abstract: A prompt injection attack aims to inject malicious instruction/data into the input of an LLM-Integrated Application such that it produces results as an attacker desires. Existing works are limited to case studies. As a result, the literature lacks a systematic understanding of prompt injection attacks and their defenses. We aim to bridge the gap in this work. In particular, we propose a framework…

    Submitted 1 June, 2024; v1 submitted 19 October, 2023; originally announced October 2023.

    Comments: To appear in USENIX Security Symposium 2024

  31. arXiv:2310.03128  [pdf, other]

    cs.SE cs.CL

    MetaTool Benchmark for Large Language Models: Deciding Whether to Use Tools and Which to Use

    Authors: Yue Huang, Jiawen Shi, Yuan Li, Chenrui Fan, Siyuan Wu, Qihui Zhang, Yixin Liu, Pan Zhou, Yao Wan, Neil Zhenqiang Gong, Lichao Sun

    Abstract: Large language models (LLMs) have garnered significant attention due to their impressive natural language processing (NLP) capabilities. Recently, many studies have focused on the tool utilization ability of LLMs. They primarily investigated how LLMs effectively collaborate with given specific tools. However, in scenarios where LLMs serve as intelligent agents, as seen in applications like AutoGPT…

    Submitted 23 February, 2024; v1 submitted 4 October, 2023; originally announced October 2023.

  32. arXiv:2309.17167  [pdf, other]

    cs.AI cs.CL cs.LG

    DyVal: Dynamic Evaluation of Large Language Models for Reasoning Tasks

    Authors: Kaijie Zhu, Jiaao Chen, Jindong Wang, Neil Zhenqiang Gong, Diyi Yang, Xing Xie

    Abstract: Large language models (LLMs) have achieved remarkable performance in various evaluation benchmarks. However, concerns are raised about potential data contamination in their considerable volume of training corpus. Moreover, the static nature and fixed complexity of current benchmarks may inadequately gauge the advancing capabilities of LLMs. In this paper, we introduce DyVal, a general and flexible…

    Submitted 14 March, 2024; v1 submitted 29 September, 2023; originally announced September 2023.

    Comments: ICLR 2024 spotlight; 38 pages; code is at aka.ms/dyval

  33. arXiv:2306.07992  [pdf, other]

    cs.CV cs.AI cs.CR cs.LG

    Securing Visually-Aware Recommender Systems: An Adversarial Image Reconstruction and Detection Framework

    Authors: Minglei Yin, Bin Liu, Neil Zhenqiang Gong, Xin Li

    Abstract: With rich visual data, such as images, becoming readily associated with items, visually-aware recommendation systems (VARS) have been widely used in different applications. Recent studies have shown that VARS are vulnerable to item-image adversarial attacks, which add human-imperceptible perturbations to the clean images associated with those items. Attacks on VARS pose new security challenges to…

    Submitted 11 June, 2023; originally announced June 2023.

  34. arXiv:2306.04528  [pdf, other]

    cs.CL cs.CR cs.LG

    PromptRobust: Towards Evaluating the Robustness of Large Language Models on Adversarial Prompts

    Authors: Kaijie Zhu, Jindong Wang, Jiaheng Zhou, Zichen Wang, Hao Chen, Yidong Wang, Linyi Yang, Wei Ye, Yue Zhang, Neil Zhenqiang Gong, Xing Xie

    Abstract: The increasing reliance on Large Language Models (LLMs) across academia and industry necessitates a comprehensive understanding of their robustness to prompts. In response to this vital need, we introduce PromptRobust, a robustness benchmark designed to measure LLMs' resilience to adversarial prompts. This study uses a plethora of adversarial textual attacks targeting prompts across multiple level…

    Submitted 16 July, 2024; v1 submitted 7 June, 2023; originally announced June 2023.

    Comments: Technical report; code is at: https://github.com/microsoft/promptbench

  35. arXiv:2305.12082  [pdf, other]

    cs.LG

    SneakyPrompt: Jailbreaking Text-to-image Generative Models

    Authors: Yuchen Yang, Bo Hui, Haolin Yuan, Neil Gong, Yinzhi Cao

    Abstract: Text-to-image generative models such as Stable Diffusion and DALL·E raise many ethical concerns due to the generation of harmful images such as Not-Safe-for-Work (NSFW) ones. To address these ethical concerns, safety filters are often adopted to prevent the generation of NSFW images. In this work, we propose SneakyPrompt, the first automated attack framework, to jailbreak text-to-image gener…

    Submitted 10 November, 2023; v1 submitted 19 May, 2023; originally announced May 2023.

    Comments: To appear in the Proceedings of the IEEE Symposium on Security and Privacy (Oakland), 2024

  36. arXiv:2305.03807  [pdf, other]

    cs.LG cs.CR cs.CV

    Evading Watermark based Detection of AI-Generated Content

    Authors: Zhengyuan Jiang, Jinghuai Zhang, Neil Zhenqiang Gong

    Abstract: A generative AI model can generate extremely realistic-looking content, posing growing challenges to the authenticity of information. To address the challenges, watermark has been leveraged to detect AI-generated content. Specifically, a watermark is embedded into an AI-generated content before it is released. A content is detected as AI-generated if a similar watermark can be decoded from it. In…

    Submitted 8 November, 2023; v1 submitted 5 May, 2023; originally announced May 2023.

    Comments: To appear in ACM Conference on Computer and Communications Security (CCS), 2023

  37. arXiv:2303.14601  [pdf, other]

    cs.CR cs.IR cs.LG

    PORE: Provably Robust Recommender Systems against Data Poisoning Attacks

    Authors: Jinyuan Jia, Yupei Liu, Yuepeng Hu, Neil Zhenqiang Gong

    Abstract: Data poisoning attacks spoof a recommender system to make arbitrary, attacker-desired recommendations via injecting fake users with carefully crafted rating scores into the recommender system. We envision a cat-and-mouse game for such data poisoning attacks and their defenses, i.e., new defenses are designed to defend against existing attacks and new attacks are designed to break them. To prevent…

    Submitted 25 March, 2023; originally announced March 2023.

    Comments: To appear in USENIX Security Symposium, 2023

  38. arXiv:2303.01959  [pdf, other]

    cs.CR cs.AI cs.CV

    PointCert: Point Cloud Classification with Deterministic Certified Robustness Guarantees

    Authors: Jinghuai Zhang, Jinyuan Jia, Hongbin Liu, Neil Zhenqiang Gong

    Abstract: Point cloud classification is an essential component in many security-critical applications such as autonomous driving and augmented reality. However, point cloud classifiers are vulnerable to adversarially perturbed point clouds. Existing certified defenses against adversarial point clouds suffer from a key limitation: their certified robustness guarantees are probabilistic, i.e., they produce an…

    Submitted 3 March, 2023; originally announced March 2023.

    Comments: CVPR 2023

  39. arXiv:2301.02905  [pdf, other]

    cs.CR cs.LG

    REaaS: Enabling Adversarially Robust Downstream Classifiers via Robust Encoder as a Service

    Authors: Wenjie Qu, Jinyuan Jia, Neil Zhenqiang Gong

    Abstract: Encoder as a service is an emerging cloud service. Specifically, a service provider first pre-trains an encoder (i.e., a general-purpose feature extractor) via either supervised learning or self-supervised learning and then deploys it as a cloud service API. A client queries the cloud service API to obtain feature vectors for its training/testing inputs when training/testing its classifier (called…

    Submitted 7 January, 2023; originally announced January 2023.

    Comments: To appear in Network and Distributed System Security (NDSS) Symposium, 2023

  40. arXiv:2212.06325  [pdf, other]

    cs.CR cs.DC cs.LG

    AFLGuard: Byzantine-robust Asynchronous Federated Learning

    Authors: Minghong Fang, Jia Liu, Neil Zhenqiang Gong, Elizabeth S. Bentley

    Abstract: Federated learning (FL) is an emerging machine learning paradigm, in which clients jointly learn a model with the help of a cloud server. A fundamental challenge of FL is that the clients are often heterogeneous, e.g., they have different computing powers, and thus the clients may send model updates to the server with substantially different delays. Asynchronous FL aims to address this challenge b…

    Submitted 12 December, 2022; originally announced December 2022.

    Comments: Accepted by ACSAC 2022

  41. arXiv:2212.03334  [pdf, other]

    cs.CR cs.CV cs.LG

    Pre-trained Encoders in Self-Supervised Learning Improve Secure and Privacy-preserving Supervised Learning

    Authors: Hongbin Liu, Wenjie Qu, Jinyuan Jia, Neil Zhenqiang Gong

    Abstract: Classifiers in supervised learning have various security and privacy issues, e.g., 1) data poisoning attacks, backdoor attacks, and adversarial examples on the security side as well as 2) inference attacks and the right to be forgotten for the training data on the privacy side. Various secure and privacy-preserving supervised learning algorithms with formal guarantees have been proposed to address…

    Submitted 6 December, 2022; originally announced December 2022.

  42. arXiv:2211.12087  [pdf, other]

    cs.CR cs.NI

    SoK: Secure Human-centered Wireless Sensing

    Authors: Wei Sun, Tingjun Chen, Neil Gong

    Abstract: Human-centered wireless sensing (HCWS) aims to understand the fine-grained environment and activities of a human using the diverse wireless signals around him/her. While the sensed information about a human can be used for many good purposes such as enhancing life quality, an adversary can also abuse it to steal private information about the human (e.g., location and person's identity). However, t…

    Submitted 8 March, 2024; v1 submitted 22 November, 2022; originally announced November 2022.

    Journal ref: 24th Privacy Enhancing Technologies Symposium (PETS 2024)

  43. arXiv:2211.08229  [pdf, other]

    cs.CR cs.CV cs.LG

    CorruptEncoder: Data Poisoning based Backdoor Attacks to Contrastive Learning

    Authors: Jinghuai Zhang, Hongbin Liu, Jinyuan Jia, Neil Zhenqiang Gong

    Abstract: Contrastive learning (CL) pre-trains general-purpose encoders using an unlabeled pre-training dataset, which consists of images or image-text pairs. CL is vulnerable to data poisoning based backdoor attacks (DPBAs), in which an attacker injects poisoned inputs into the pre-training dataset so the encoder is backdoored. However, existing DPBAs achieve limited effectiveness. In this work, we take th…

    Submitted 29 February, 2024; v1 submitted 15 November, 2022; originally announced November 2022.

    Comments: CVPR 2024

  44. arXiv:2210.15025  [pdf, other]

    cs.CV cs.LG

    Addressing Heterogeneity in Federated Learning via Distributional Transformation

    Authors: Haolin Yuan, Bo Hui, Yuchen Yang, Philippe Burlina, Neil Zhenqiang Gong, Yinzhi Cao

    Abstract: Federated learning (FL) allows multiple clients to collaboratively train a deep learning model. One major challenge of FL is when data distribution is heterogeneous, i.e., differs from one client to another. Existing personalized FL algorithms are only applicable to narrow cases, e.g., one or two data classes per client, and therefore they do not satisfactorily address FL under varying levels of d…

    Submitted 26 October, 2022; originally announced October 2022.

    Comments: In the Proceedings of European Conference on Computer Vision (ECCV), 2022

  45. arXiv:2210.10936  [pdf, other]

    cs.CR cs.AI cs.LG

    FedRecover: Recovering from Poisoning Attacks in Federated Learning using Historical Information

    Authors: Xiaoyu Cao, Jinyuan Jia, Zaixi Zhang, Neil Zhenqiang Gong

    Abstract: Federated learning is vulnerable to poisoning attacks in which malicious clients poison the global model via sending malicious model updates to the server. Existing defenses focus on preventing a small number of malicious clients from poisoning the global model via robust federated learning methods and detecting malicious clients when there are a large number of them. However, it is still an open…

    Submitted 19 October, 2022; originally announced October 2022.

    Comments: To appear in IEEE S&P 2023

  46. arXiv:2210.01111  [pdf, other]

    cs.CR cs.LG

    MultiGuard: Provably Robust Multi-label Classification against Adversarial Examples

    Authors: Jinyuan Jia, Wenjie Qu, Neil Zhenqiang Gong

    Abstract: Multi-label classification, which predicts a set of labels for an input, has many applications. However, multiple recent studies showed that multi-label classification is vulnerable to adversarial examples. In particular, an attacker can manipulate the labels predicted by a multi-label classifier for an input via adding carefully crafted, human-imperceptible perturbation to it. Existing provable d…

    Submitted 3 October, 2022; originally announced October 2022.

    Comments: Accepted by NeurIPS 2022

  47. arXiv:2210.00584  [pdf, other]

    cs.CR cs.AI cs.LG

    FLCert: Provably Secure Federated Learning against Poisoning Attacks

    Authors: Xiaoyu Cao, Zaixi Zhang, Jinyuan Jia, Neil Zhenqiang Gong

    Abstract: Due to its distributed nature, federated learning is vulnerable to poisoning attacks, in which malicious clients poison the training process via manipulating their local training data and/or local model updates sent to the cloud server, such that the poisoned global model misclassifies many indiscriminate test inputs or attacker-chosen ones. Existing defenses mainly leverage Byzantine-robust feder…

    Submitted 3 October, 2022; v1 submitted 2 October, 2022; originally announced October 2022.

    Comments: To appear in Transactions on Information Forensics and Security. arXiv admin note: text overlap with arXiv:2102.01854

  48. arXiv:2207.12535  [pdf, other]

    cs.CR cs.CV cs.LG

    Semi-Leak: Membership Inference Attacks Against Semi-supervised Learning

    Authors: Xinlei He, Hongbin Liu, Neil Zhenqiang Gong, Yang Zhang

    Abstract: Semi-supervised learning (SSL) leverages both labeled and unlabeled data to train machine learning (ML) models. State-of-the-art SSL methods can achieve comparable performance to supervised learning by leveraging much fewer labeled data. However, most existing works focus on improving the performance of SSL. In this work, we take a different angle by studying the training data privacy of SSL. Spec…

    Submitted 25 July, 2022; originally announced July 2022.

    Comments: Accepted to ECCV 2022

  49. arXiv:2207.09209  [pdf, other]

    cs.CR cs.AI

    FLDetector: Defending Federated Learning Against Model Poisoning Attacks via Detecting Malicious Clients

    Authors: Zaixi Zhang, Xiaoyu Cao, Jinyuan Jia, Neil Zhenqiang Gong

    Abstract: Federated learning (FL) is vulnerable to model poisoning attacks, in which malicious clients corrupt the global model via sending manipulated model updates to the server. Existing defenses mainly rely on Byzantine-robust FL methods, which aim to learn an accurate global model even if some clients are malicious. However, they can only resist a small number of malicious clients in practice. It is st…

    Submitted 23 October, 2022; v1 submitted 19 July, 2022; originally announced July 2022.

    Comments: Accepted by KDD 2022 (Research Track)

  50. arXiv:2207.04447  [pdf, other]

    cs.CL

    Human-Centric Research for NLP: Towards a Definition and Guiding Questions

    Authors: Bhushan Kotnis, Kiril Gashteovski, Julia Gastinger, Giuseppe Serra, Francesco Alesiani, Timo Sztyler, Ammar Shaker, Na Gong, Carolin Lawrence, Zhao Xu

    Abstract: With Human-Centric Research (HCR) we can steer research activities so that the research outcome is beneficial for human stakeholders, such as end users. But what exactly makes research human-centric? We address this question by providing a working definition and define how a research pipeline can be split into different stages in which human-centric components can be added. Additionally, we discus…

    Submitted 10 July, 2022; originally announced July 2022.