P2PFormer: A Primitive-to-polygon Method for Regular Building Contour Extraction from Remote Sensing Images

Authors: Tao Zhang, Shiqing Wei, Yikang Zhou, Muying Luo, Wenling You, Shunping Ji

Abstract: Extracting building contours from remote sensing imagery is a significant challenge due to buildings' complex and diverse shapes, occlusions, and noise. Existing methods often struggle with irregular contours, rounded corners, and redundancy points, necessitating extensive post-processing to produce regular polygonal building contours. To address these challenges, we introduce a novel, streamlined pipeline that generates regular building contours without post-processing. Our approach begins with the segmentation of generic geometric primitives (which can include vertices, lines, and corners), followed by the prediction of their sequence. This allows for the direct construction of regular building contours by sequentially connecting the segmented primitives. Building on this pipeline, we developed P2PFormer, which utilizes a transformer-based architecture to segment geometric primitives and predict their order. To enhance the segmentation of primitives, we introduce a unique representation called group queries. This representation comprises a set of queries and a singular query position, which improve the focus on multiple midpoints of primitives and their efficient linkage. Furthermore, we propose an innovative implicit update strategy for the query position embedding aimed at sharpening the focus of queries on the correct positions and, consequently, enhancing the quality of primitive segmentation.
Our experiments demonstrate that P2PFormer achieves new state-of-the-art performance on the WHU, CrowdAI, and WHU-Mix datasets, surpassing the previous SOTA PolyWorld by a margin of 2.7 AP and 6.5 AP75 on the largest CrowdAI dataset. We intend to make the code and trained weights publicly available to promote their use and facilitate further research.

Submitted 5 June, 2024; originally announced June 2024.

arXiv:2405.19598 [pdf, other]

Evaluating the Effectiveness and Robustness of Visual Similarity-based Phishing Detection Models

Authors: Fujiao Ji, Kiho Lee, Hyungjoon Koo, Wenhao You, Euijin Choo, Hyoungshick Kim, Doowon Kim

Abstract: Phishing attacks pose a significant threat to Internet users, with cybercriminals elaborately replicating the visual appearance of legitimate websites to deceive victims. Visual similarity-based detection systems have emerged as an effective countermeasure, but their effectiveness and robustness in real-world scenarios have been unexplored. In this paper, we comprehensively scrutinize and evaluate state-of-the-art visual similarity-based anti-phishing models using a large-scale dataset of 450K real-world phishing websites. Our analysis reveals that while certain models maintain high accuracy, others exhibit notably lower performance than results on curated datasets, highlighting the importance of real-world evaluation. In addition, we observe the real-world tactic of manipulating visual components that phishing attackers employ to circumvent the detection systems. To assess the resilience of existing models against adversarial attacks and robustness, we apply visible and perturbation-based manipulations to website logos, which adversaries typically target. We then evaluate the models' robustness in handling these adversarial samples. Our findings reveal vulnerabilities in several models, emphasizing the need for more robust visual similarity techniques capable of withstanding sophisticated evasion attempts. We provide actionable insights for enhancing the security of phishing defense systems, encouraging proactive actions.
To the best of our knowledge, this work represents the first large-scale, systematic evaluation of visual similarity-based models for phishing detection in real-world settings, necessitating the development of more effective and robust defenses.

Submitted 29 May, 2024; originally announced May 2024.

Comments: 12 pages

arXiv:2405.07551 [pdf, other]

MuMath-Code: Combining Tool-Use Large Language Models with Multi-perspective Data Augmentation for Mathematical Reasoning

Authors: Shuo Yin, Weihao You, Zhilong Ji, Guoqiang Zhong, Jinfeng Bai

Abstract: The tool-use Large Language Models (LLMs) that integrate with external Python interpreters have significantly enhanced mathematical reasoning capabilities for open-source LLMs, while tool-free methods chose another track: augmenting math reasoning data. However, a great method to integrate the above two research paths and combine their advantages remains to be explored. In this work, we firstly include new math questions via multi-perspective data augmenting methods and then synthesize code-nested solutions to them. The open LLMs (i.e., Llama-2) are finetuned on the augmented dataset to get the resulting models, MuMath-Code ($μ$-Math-Code). During the inference phase, our MuMath-Code generates code and interacts with the external python interpreter to get the execution results. Therefore, MuMath-Code leverages the advantages of both the external tool and data augmentation. To fully leverage the advantages of our augmented data, we propose a two-stage training strategy: In Stage-1, we finetune Llama-2 on pure CoT data to get an intermediate model, which then is trained on the code-nested data in Stage-2 to get the resulting MuMath-Code. Our MuMath-Code-7B achieves 83.8 on GSM8K and 52.4 on MATH, while MuMath-Code-70B model achieves new state-of-the-art performance among open methods -- achieving 90.7% on GSM8K and 55.1% on MATH. Extensive experiments validate the combination of tool use and data augmentation, as well as our two-stage training strategy. We release the proposed dataset along with the associated code for public use.

Submitted 13 May, 2024; originally announced May 2024.

Comments: The state-of-the-art open-source tool-use LLMs for mathematical reasoning

arXiv:2404.06779 [pdf, other]

Efficient and Scalable Chinese Vector Font Generation via Component Composition

Authors: Jinyu Song, Weitao You, Shuhui Shi, Shuxuan Guo, Lingyun Sun, Wei Wang

Abstract: Chinese vector font generation is challenging due to the complex structure and huge amount of Chinese characters. Recent advances remain limited to generating a small set of characters with simple structure. In this work, we first observe that most Chinese characters can be disassembled into frequently-reused components. Therefore, we introduce the first efficient and scalable Chinese vector font generation approach via component composition, allowing generating numerous vector characters from a small set of components. To achieve this, we collect a large-scale dataset that contains over 90K Chinese characters with their components and layout information. Upon the dataset, we propose a simple yet effective framework based on spatial transformer networks (STN) and multiple losses tailored to font characteristics to learn the affine transformation of the components, which can be directly applied to the Bézier curves, resulting in Chinese characters in vector format. Our qualitative and quantitative experiments have demonstrated that our method significantly surpasses the state-of-the-art vector font generation methods in generating large-scale complex Chinese characters in both font generation and zero-shot font extension.

Submitted 10 April, 2024; originally announced April 2024.

Comments: 15 pages, 23 figures

arXiv:2402.00002 [pdf, other]

Raptor Encoding for Low-Latency Concurrent Multi-PDU Session Transmission with Security Consideration in B5G Edge Network

Authors: Zhongfu Guo, Xinsheng Ji, Wei You, Mingyan Xu, Yu Zhao, Zhimo Cheng, Deqiang Zhou

Abstract: In B5G edge networks, end-to-end low-latency and high-reliability transmissions between edge computing nodes and terminal devices are essential. This paper investigates the queue-aware coding scheduling transmission of randomly arriving data packets, taking into account potential eavesdroppers in edge networks. To address these concerns, we introduce SCLER, a Protocol Data Units (PDU) Raptor-encoded multi-path transmission method that overcomes the challenges of a larger attack surface in Concurrent Multipath Transfer (CMT), excessive delay due to asymmetric delay and bandwidth, and lack of interaction among PDU session bearers. We propose a secure and reliable transmission scheme based on Raptor encoding and distribution that incorporates a queue length-aware encoding strategy. This strategy is modeled using Constrained Markov Decision Process (CMDP), and we solve the constraint optimization problem of optimal decision-making based on a threshold strategy. Numerical results indicate that SCLER effectively reduces data leakage risks while achieving the optimal balance between delay and reliability, thereby ensuring data security. Importantly, the proposed system is compatible with current mobile networks and demonstrates practical applicability.

Submitted 4 October, 2023; originally announced February 2024.

arXiv:2311.11549 [pdf, other]

Unearthing Common Inconsistency for Generalisable Deepfake Detection

Authors: Beilin Chu, Xuan Xu, Weike You, Linna Zhou

Abstract: Deepfakes emerged several years ago, yet efficient detection techniques that generalize across different manipulation methods require further research. While current image-level detection methods fail to generalize to unseen domains, owing to the domain-shift phenomenon brought by CNN's strong inductive bias towards Deepfake texture, video-level detection shows the potential to offer both generalization across multiple domains and robustness to compression. We argue that although distinct face manipulation tools have different inherent biases, they all disrupt the consistency between frames, which is a natural characteristic shared by authentic videos. Inspired by this, we propose a detection approach that captures the frame inconsistency broadly present across different forgery techniques, termed unearthing-common-inconsistency (UCI). Concretely, the UCI network based on self-supervised contrastive learning can better distinguish temporal consistency between real and fake videos from multiple domains. We introduce a temporally-preserved module that applies spatial noise perturbations, directing the model's attention towards temporal information. Subsequently, leveraging a multi-view cross-correlation learning module, we extensively learn the disparities in temporal representations between genuine and fake samples. Extensive experiments demonstrate the generalization ability of our method on unseen Deepfake domains.

Submitted 20 November, 2023; originally announced November 2023.

Comments: 9 pages, 2 figures and 5 tables

arXiv:2311.06122 [pdf, other]

Fight Fire with Fire: Combating Adversarial Patch Attacks using Pattern-randomized Defensive Patches

Authors: Jianan Feng, Jiachun Li, Changqing Miao, Jianjun Huang, Wei You, Wenchang Shi, Bin Liang

Abstract: Object detection has found extensive applications in various tasks, but it is also susceptible to adversarial patch attacks. Existing defense methods often necessitate modifications to the target model or result in unacceptable time overhead. In this paper, we adopt a counterattack approach, following the principle of "fight fire with fire," and propose a novel and general methodology for defending adversarial attacks. We utilize an active defense strategy by injecting two types of defensive patches, canary and woodpecker, into the input to proactively probe or weaken potential adversarial patches without altering the target model. Moreover, inspired by randomization techniques employed in software security, we employ randomized canary and woodpecker injection patterns to defend against defense-aware attacks. The effectiveness and practicality of the proposed method are demonstrated through comprehensive experiments. The results illustrate that canary and woodpecker achieve high performance, even when confronted with unknown attack methods, while incurring limited time overhead. Furthermore, our method also exhibits sufficient robustness against defense-aware attacks, as evidenced by adaptive attack experiments.

Submitted 10 November, 2023; originally announced November 2023.

arXiv:2310.19319 [pdf, other]

Dual-Directed Algorithm Design for Efficient Pure Exploration

Authors: Chao Qin, Wei You

Abstract: We consider pure-exploration problems in the context of stochastic sequential adaptive experiments with a finite set of alternative options. The goal of the decision-maker is to accurately answer a query question regarding the alternatives with high confidence with minimal measurement efforts. A typical query question is to identify the alternative with the best performance, leading to ranking and selection problems, or best-arm identification in the machine learning literature. We focus on the fixed-precision setting and derive a sufficient condition for optimality in terms of a notion of strong convergence to the optimal allocation of samples. Using dual variables, we characterize the necessary and sufficient conditions for an allocation to be optimal. The use of dual variables allows us to bypass the combinatorial structure of the optimality conditions that relies solely on primal variables. Remarkably, these optimality conditions enable an extension of the top-two algorithm design principle, initially proposed for best-arm identification. Furthermore, our optimality conditions give rise to a straightforward yet efficient selection rule, termed information-directed selection, which adaptively picks from a candidate set based on information gain of the candidates. We outline the broad contexts where our algorithmic approach can be implemented.
We establish that, paired with information-directed selection, top-two Thompson sampling is (asymptotically) optimal for Gaussian best-arm identification, solving a glaring open problem in the pure exploration literature. Our algorithm is optimal for $ε$-best-arm identification and thresholding bandit problems. Our analysis also leads to a general principle to guide adaptations of Thompson sampling for pure-exploration problems. Numerical experiments highlight the exceptional efficiency of our proposed algorithms relative to existing ones.

Submitted 30 October, 2023; originally announced October 2023.

Comments: An earlier version of this paper appeared as an extended abstract in the Proceedings of the 36th Annual Conference on Learning Theory, COLT'23, with the title "Information-Directed Selection for Top-Two Algorithms."

arXiv:2310.18603 [pdf, other]

Large Language Models Are Better Adversaries: Exploring Generative Clean-Label Backdoor Attacks Against Text Classifiers

Authors: Wencong You, Zayd Hammoudeh, Daniel Lowd

Abstract: Backdoor attacks manipulate model predictions by inserting innocuous triggers into training and test data. We focus on more realistic and more challenging clean-label attacks where the adversarial training examples are correctly labeled. Our attack, LLMBkd, leverages language models to automatically insert diverse style-based triggers into texts. We also propose a poison selection technique to improve the effectiveness of both LLMBkd as well as existing textual backdoor attacks. Lastly, we describe REACT, a baseline defense to mitigate backdoor attacks via antidote training examples. Our evaluations demonstrate LLMBkd's effectiveness and efficiency, where we consistently achieve high attack success rates across a wide range of styles with little effort and no model training.

Submitted 28 October, 2023; originally announced October 2023.

Comments: Accepted at EMNLP 2023 Findings

arXiv:2310.16316 [pdf, other]

Sum-of-Parts Models: Faithful Attributions for Groups of Features

Authors: Weiqiu You, Helen Qu, Marco Gatti, Bhuvnesh Jain, Eric Wong

Abstract: An explanation of a machine learning model is considered "faithful" if it accurately reflects the model's decision-making process. However, explanations such as feature attributions for deep learning are not guaranteed to be faithful, and can produce potentially misleading interpretations. In this work, we develop Sum-of-Parts (SOP), a class of models whose predictions come with grouped feature attributions that are faithful-by-construction. This model decomposes a prediction into an interpretable sum of scores, each of which is directly attributable to a sparse group of features. We evaluate SOP on benchmarks with standard interpretability metrics, and in a case study, we use the faithful explanations from SOP to help astrophysicists discover new knowledge about galaxy formation.

Submitted 24 October, 2023; originally announced October 2023.

arXiv:2310.12419 [pdf, other]

Toward Unbiased Multiple-Target Fuzzing with Path Diversity

Authors: Huanyao Rong, Wei You, Xiaofeng Wang, Tianhao Mao

Abstract: In this paper, we propose a novel directed fuzzing solution named AFLRun, which features a target path-diversity metric and unbiased energy assignment. Firstly, we develop a new coverage metric by maintaining an extra virgin map for each covered target to track the coverage status of seeds that hit the target. This approach enables the storage of waypoints into the corpus that hit a target through an interesting path, thus enriching the path diversity for each target. Additionally, we propose a corpus-level energy assignment strategy that guarantees fairness for each target. AFLRun starts with uniform target weight and propagates this weight to seeds to get a desired seed weight distribution. By assigning energy to each seed in the corpus according to such desired distribution, a precise and unbiased energy assignment can be achieved. We built a prototype system and assessed its performance using a standard benchmark and several extensively fuzzed real-world applications. The evaluation results demonstrate that AFLRun outperforms state-of-the-art fuzzers in terms of vulnerability detection, both in quantity and speed. Moreover, AFLRun uncovers 29 previously unidentified vulnerabilities, including 8 CVEs, across four distinct programs.

Submitted 6 June, 2024; v1 submitted 18 October, 2023; originally announced October 2023.

arXiv:2310.08185 [pdf, other]

EIPE-text: Evaluation-Guided Iterative Plan Extraction for Long-Form Narrative Text Generation

Authors: Wang You, Wenshan Wu, Yaobo Liang, Shaoguang Mao, Chenfei Wu, Maosong Cao, Yuzhe Cai, Yiduo Guo, Yan Xia, Furu Wei, Nan Duan

Abstract: Plan-and-Write is a common hierarchical approach in long-form narrative text generation, which first creates a plan to guide the narrative writing. Following this approach, several studies rely on simply prompting large language models for planning, which often yields suboptimal results. In this paper, we propose a new framework called Evaluation-guided Iterative Plan Extraction for long-form narrative text generation (EIPE-text), which extracts plans from the corpus of narratives and utilizes the extracted plans to construct a better planner. EIPE-text has three stages: plan extraction, learning, and inference. In the plan extraction stage, it iteratively extracts and improves plans from the narrative corpus and constructs a plan corpus. We propose a question answer (QA) based evaluation mechanism to automatically evaluate the plans and generate detailed plan refinement instructions to guide the iterative improvement. In the learning stage, we build a better planner by fine-tuning with the plan corpus or in-context learning with examples in the plan corpus. Finally, we leverage a hierarchical approach to generate long-form narratives. We evaluate the effectiveness of EIPE-text in the domains of novels and storytelling. Both GPT-4-based evaluations and human evaluations demonstrate that our method can generate more coherent and relevant long-form narratives. Our code will be released in the future.

Submitted 12 October, 2023; originally announced October 2023.

arXiv:2309.10706 [pdf, other]

OpenBA: An Open-sourced 15B Bilingual Asymmetric seq2seq Model Pre-trained from Scratch

Authors: Juntao Li, Zecheng Tang, Yuyang Ding, Pinzheng Wang, Pei Guo, Wangjie You, Dan Qiao, Wenliang Chen, Guohong Fu, Qiaoming Zhu, Guodong Zhou, Min Zhang

Abstract: Large language models (LLMs) with billions of parameters have demonstrated outstanding performance on various natural language processing tasks. This report presents OpenBA, an open-sourced 15B bilingual asymmetric seq2seq model, to contribute an LLM variant to the Chinese-oriented open-source model community. We enhance OpenBA with effective and efficient techniques as well as adopt a three-stage training strategy to train the model from scratch. Our solution can also achieve very competitive performance with only 380B tokens, which is better than LLaMA-70B on the BELEBELE benchmark, BLOOM-176B on the MMLU benchmark, GLM-130B on the C-Eval (hard) benchmark. This report provides the main details to pre-train an analogous model, including pre-training data processing, Bilingual Flan data collection, the empirical observations that inspire our model architecture design, training objectives of different stages, and other enhancement techniques. Additionally, we also provide the fine-tuning details of OpenBA on four downstream tasks. We have refactored our code to follow the design principles of the Huggingface Transformers Library, making it more convenient for developers to use, and released checkpoints of different training stages at https://huggingface.co/openBA. More details of our project are available at https://github.com/OpenNLG/openBA.git.

Submitted 1 October, 2023; v1 submitted 19 September, 2023; originally announced September 2023.

arXiv:2308.16836 [pdf, other]

Towards Improving the Expressiveness of Singing Voice Synthesis with BERT Derived Semantic Information

Authors: Shaohuan Zhou, Shun Lei, Weiya You, Deyi Tuo, Yuren You, Zhiyong Wu, Shiyin Kang, Helen Meng

Abstract: This paper presents an end-to-end high-quality singing voice synthesis (SVS) system that uses bidirectional encoder representation from Transformers (BERT) derived semantic embeddings to improve the expressiveness of the synthesized singing voice. Based on the main architecture of the recently proposed VISinger, we put forward several specific designs for expressive singing voice synthesis. First, different from the previous SVS models, we use text representation of lyrics extracted from pre-trained BERT as additional input to the model. The representation contains information about the semantics of the lyrics, which could help the SVS system produce a more expressive and natural voice. Second, we further introduce an energy predictor to stabilize the synthesized voice and model the wider range of energy variations that also contribute to the expressiveness of singing voice. Last but not least, to attenuate the off-key issues, the pitch predictor is re-designed to predict the real to note pitch ratio. Both objective and subjective experimental results indicate that the proposed SVS system can produce singing voice of higher quality, outperforming VISinger.

Submitted 31 August, 2023; originally announced August 2023.

arXiv:2307.01676 [pdf, other]

RaidEnv: Exploring New Challenges in Automated Content Balancing for Boss Raid Games

Authors: Hyeon-Chang Jeon, In-Chang Baek, Cheong-mok Bae, Taehwa Park, Wonsang You, Taegwan Ha, Hoyun Jung, Jinha Noh, Seungwon Oh, Kyung-Joong Kim

Abstract: The balance of game content significantly impacts the gaming experience. Unbalanced game content diminishes engagement or increases frustration because of repetitive failure. Although game designers intend to adjust the difficulty of game content, this is a repetitive, labor-intensive, and challenging process, especially for commercial-level games with extensive content. To address this issue, the game research community has explored automated game balancing using artificial intelligence (AI) techniques. However, previous studies have focused on limited game content and did not consider the importance of the generalization ability of playtesting agents when encountering content changes. In this study, we propose RaidEnv, a new game simulator that includes diverse and customizable content for the boss raid scenario in MMORPG games. Additionally, we design two benchmarks for the boss raid scenario that can aid in the practical application of game AI. These benchmarks address two open problems in automatic content balancing, and we introduce two evaluation metrics to provide guidance for AI in automatic content balancing. This novel game research platform expands the frontiers of automatic game balancing problems and offers a framework within a realistic game production pipeline.

Submitted 4 July, 2023; originally announced July 2023.

Comments: 14 pages, 6 figures, 6 tables, 2 algorithms

arXiv:2306.14208

PaRUS: A Virtual Reality Shopping Method Focusing on Context between Products and Real Usage Scenes

Authors: Weitao You, Yinyu Lu, Ziqing Zheng, Yizhan Shao, Changyuan Yang, Zhibin Zhou, Lingyun Sun

Abstract: The development of AR and VR technologies is enhancing users' online shopping experiences in various ways. However, in existing VR shopping applications, shopping contexts merely refer to the products and virtual malls or metaphorical scenes where users select products. This leads to the defect that users can only imagine rather than intuitively feel whether the selected products are suitable for their real usage scenes, resulting in a significant discrepancy between their expectations before and after the purchase. To address this issue, we propose PaRUS, a VR shopping approach that focuses on the context between products and their real usage scenes. PaRUS begins by rebuilding the virtual scenario of the products' real usage scene through a new semantic scene reconstruction pipeline, which preserves both the structured scene and textured object models in the scene. Afterwards, intuitive visualization of how the selected products fit the reconstructed virtual scene is provided. We conducted two user studies to evaluate how PaRUS impacts user experience, behavior, and satisfaction with their purchase. The results indicated that PaRUS significantly reduced the perceived performance risk and improved users' trust and satisfaction with their purchase results.

Submitted 9 October, 2023; v1 submitted 25 June, 2023; originally announced June 2023.

Comments: a mistake: the participant number of the first user study should be 24 instead of 16

arXiv:2304.08103 [pdf, other]

Low-code LLM: Graphical User Interface over Large Language Models

Authors: Yuzhe Cai, Shaoguang Mao, Wenshan Wu, Zehua Wang, Yaobo Liang, Tao Ge, Chenfei Wu, Wang You, Ting Song, Yan Xia, Jonathan Tien, Nan Duan, Furu Wei

Abstract: Utilizing Large Language Models (LLMs) for complex tasks is challenging, often involving a time-consuming and uncontrollable prompt engineering process. This paper introduces a novel human-LLM interaction framework, Low-code LLM. It incorporates six types of simple low-code visual programming interactions to achieve more controllable and stable responses. Through visual interaction with a graphical user interface, users can incorporate their ideas into the process without writing trivial prompts. The proposed Low-code LLM framework consists of a Planning LLM that designs a structured planning workflow for complex tasks, which can be correspondingly edited and confirmed by users through low-code visual programming operations, and an Executing LLM that generates responses following the user-confirmed workflow. We highlight three advantages of the low-code LLM: user-friendly interaction, controllable generation, and wide applicability. We demonstrate its benefits using four typical applications. By introducing this framework, we aim to bridge the gap between humans and LLMs, enabling more effective and efficient utilization of LLMs for complex tasks. The code, prompts, and experimental details are available at https://github.com/moymix/TaskMatrix/tree/main/LowCodeLLM. A system demonstration video can be found at https://www.youtube.com/watch?v=jb2C1vaeO3E.

Submitted 1 April, 2024; v1 submitted 17 April, 2023; originally announced April 2023.

Comments: Accepted as a Demo Track paper at NAACL 2024

arXiv:2301.10896 [pdf, other]

Causal Reasoning of Entities and Events in Procedural Texts

Authors: Li Zhang, Hainiu Xu, Yue Yang, Shuyan Zhou, Weiqiu You, Manni Arora, Chris Callison-Burch

Abstract: Entities and events are crucial to natural language reasoning and common in procedural texts. Existing work has focused either exclusively on entity state tracking (e.g., whether a pan is hot) or on event reasoning (e.g., whether one would burn themselves by touching the pan), while these two tasks are often causally related. We propose CREPE, the first benchmark on causal reasoning of event plausibility and entity states. We show that most language models, including GPT-3, perform close to chance at .35 F1, lagging far behind human at .87 F1. We boost model performance to .59 F1 by creatively representing events as programming languages while prompting language models pretrained on code. By injecting the causal relations between entities and events as intermediate reasoning steps in our representation, we further boost the performance to .67 F1. Our findings indicate not only the challenge that CREPE brings for language models, but also the efficacy of code-like prompting combined with chain-of-thought prompting for multihop event reasoning.

Submitted 16 February, 2023; v1 submitted 25 January, 2023; originally announced January 2023.

Comments: In Findings of EACL 2023

arXiv:2210.12233 [pdf, other]

TCAB: A Large-Scale Text Classification Attack Benchmark

Authors: Kalyani Asthana, Zhouhang Xie, Wencong You, Adam Noack, Jonathan Brophy, Sameer Singh, Daniel Lowd

Abstract: We introduce the Text Classification Attack Benchmark (TCAB), a dataset for analyzing, understanding, detecting, and labeling adversarial attacks against text classifiers. TCAB includes 1.5 million attack instances, generated by twelve adversarial attacks targeting three classifiers trained on six source datasets for sentiment analysis and abuse detection in English. Unlike standard text classification, text attacks must be understood in the context of the target classifier that is being attacked, and thus features of the target classifier are important as well. TCAB includes all attack instances that are successful in flipping the predicted label; a subset of the attacks are also labeled by human annotators to determine how frequently the primary semantics are preserved. The process of generating attacks is automated, so that TCAB can easily be extended to incorporate new text attacks and better classifiers as they are developed. In addition to the primary tasks of detecting and labeling attacks, TCAB can also be used for attack localization, attack target labeling, and attack characterization. TCAB code and dataset are available at https://react-nlp.github.io/tcab/.

Submitted 21 October, 2022; originally announced October 2022.

Comments: 32 pages, 7 figures, and 14 tables

arXiv:2205.12086 [pdf, other]

Information-Directed Selection for Top-Two Algorithms

Authors: Wei You, Chao Qin, Zihao Wang, Shuoguang Yang

Abstract: We consider the best-k-arm identification problem for multi-armed bandits, where the objective is to select the exact set of k arms with the highest mean rewards by sequentially allocating measurement effort. We characterize the necessary and sufficient conditions for the optimal allocation using dual variables. Remarkably these optimality conditions lead to the extension of top-two algorithm design principle (Russo, 2020), initially proposed for best-arm identification. Furthermore, our optimality conditions induce a simple and effective selection rule dubbed information-directed selection (IDS) that selects one of the top-two candidates based on a measure of information gain. As a theoretical guarantee, we prove that integrated with IDS, top-two Thompson sampling is (asymptotically) optimal for Gaussian best-arm identification, solving a glaring open problem in the pure exploration literature (Russo, 2020). As a by-product, we show that for k > 1, top-two algorithms cannot achieve optimality even when the algorithm has access to the unknown "optimal" tuning parameter. Numerical experiments show the superior performance of the proposed top-two algorithms with IDS and considerable improvement compared with algorithms without adaptive selection.

Submitted 17 July, 2023; v1 submitted 24 May, 2022; originally announced May 2022.

Comments: Accepted for presentation at the Conference on Learning Theory (COLT) 2023

arXiv:2201.08555 [pdf, other]

Identifying Adversarial Attacks on Text Classifiers

Authors: Zhouhang Xie, Jonathan Brophy, Adam Noack, Wencong You, Kalyani Asthana, Carter Perkins, Sabrina Reis, Sameer Singh, Daniel Lowd

Abstract: The landscape of adversarial attacks against text classifiers continues to grow, with new attacks developed every year and many of them available in standard toolkits, such as TextAttack and OpenAttack. In response, there is a growing body of work on robust learning, which reduces vulnerability to these attacks, though sometimes at a high cost in compute time or accuracy. In this paper, we take an alternate approach -- we attempt to understand the attacker by analyzing adversarial text to determine which methods were used to create it. Our first contribution is an extensive dataset for attack detection and labeling: 1.5 million attack instances, generated by twelve adversarial attacks targeting three classifiers trained on six source datasets for sentiment analysis and abuse detection in English. As our second contribution, we use this dataset to develop and benchmark a number of classifiers for attack identification -- determining if a given text has been adversarially manipulated and by which attack. As a third contribution, we demonstrate the effectiveness of three classes of features for these tasks: text properties, capturing content and presentation of text; language model properties, determining which tokens are more or less probable throughout the input; and target model properties, representing how the text classifier is influenced by the attack, including internal node activations. Overall, this represents a first step towards forensics for adversarial attacks against text classifiers.

Submitted 21 January, 2022; originally announced January 2022.

arXiv:2104.05700 [pdf, other]

doi 10.18653/v1/2021.naacl-main.90

Macro-Average: Rare Types Are Important Too

Authors: Thamme Gowda, Weiqiu You, Constantine Lignos, Jonathan May

Abstract: While traditional corpus-level evaluation metrics for machine translation (MT) correlate well with fluency, they struggle to reflect adequacy. Model-based MT metrics trained on segment-level human judgments have emerged as an attractive replacement due to strong correlation results. These models, however, require potentially expensive re-training for new domains and languages. Furthermore, their decisions are inherently non-transparent and appear to reflect unwelcome biases. We explore the simple type-based classifier metric, MacroF1, and study its applicability to MT evaluation. We find that MacroF1 is competitive on direct assessment, and outperforms others in indicating downstream cross-lingual information retrieval task performance. Further, we show that MacroF1 can be used to effectively compare supervised and unsupervised neural machine translation, and reveal significant qualitative differences in the methods' outputs.

Submitted 12 April, 2021; originally announced April 2021.

Journal ref: https://aclanthology.org/2021.naacl-main.90

arXiv:2005.00742 [pdf, other]

Hard-Coded Gaussian Attention for Neural Machine Translation

Authors: Weiqiu You, Simeng Sun, Mohit Iyyer

Abstract: Recent work has questioned the importance of the Transformer's multi-headed attention for achieving high translation quality. We push further in this direction by developing a "hard-coded" attention variant without any learned parameters. Surprisingly, replacing all learned self-attention heads in the encoder and decoder with fixed, input-agnostic Gaussian distributions minimally impacts BLEU scores across four different language pairs. However, additionally hard-coding cross attention (which connects the decoder to the encoder) significantly lowers BLEU, suggesting that it is more important than self-attention. Much of this BLEU drop can be recovered by adding just a single learned cross attention head to an otherwise hard-coded Transformer. Taken as a whole, our results offer insight into which components of the Transformer are actually important, which we hope will guide future work into the development of simpler and more efficient attention-based models.

Submitted 2 May, 2020; originally announced May 2020.

Comments: ACL 2020 Camera Ready (12 pages)

arXiv:2003.11174 [pdf, ps, other]

A Robust Queueing Network Analyzer Based on Indices of Dispersion

Authors: Ward Whitt, Wei You

Abstract: We develop a robust queueing network analyzer algorithm to approximate the steady-state performance of a single-class open queueing network of single-server queues with Markovian routing. The algorithm allows non-renewal external arrival processes, general service-time distributions and customer feedback. We focus on the customer flows, defined as the continuous-time processes counting customers flowing into or out of the network, or flowing from one queue to another. Each flow is partially characterized by its rate and a continuous function that measures the stochastic variability over time. This function is a scaled version of the variance-time curve, called the index of dispersion for counts (IDC). The required IDC functions for the flows can be calculated from the model primitives, estimated from data or approximated by solving a set of linear equations. A robust queueing technique is used to generate approximations of the mean steady-state performance at each queue from the IDC of the total arrival flow and the service specification at that queue. The algorithm effectiveness is supported by extensive simulation studies and heavy-traffic limits.

Submitted 24 March, 2020; originally announced March 2020.

Comments: Appendix available at https://cnyouwei.github.io/papers/Whitt_You_RQNA_app.pdf

arXiv:1911.09259 [pdf]

doi 10.1109/TSMC.2020.3016821

Who Are the Phishers? Phishing Scam Detection on Ethereum via Network Embedding

Authors: Jiajing Wu, Qi Yuan, Dan Lin, Wei You, Weili Chen, Chuan Chen, Zibin Zheng

Abstract: Recently, blockchain technology has become a topic in the spotlight but also a hotbed of various cybercrimes. Among them, phishing scams on blockchain have been found making a notable amount of money, thus emerging as a serious threat to the trading security of the blockchain ecosystem. In order to create a favorable environment for investment, an effective method for detecting phishing scams is urgently needed in the blockchain ecosystem. To this end, this paper proposes an approach to detect phishing scams on Ethereum by mining its transaction records. Specifically, we first crawl the labeled phishing addresses from two authorized websites and reconstruct the transaction network according to the collected transaction records. Then, by taking the transaction amount and timestamp into consideration, we propose a novel network embedding algorithm called trans2vec to extract the features of the addresses for subsequent phishing identification. Finally, we adopt the one-class support vector machine (SVM) to classify the nodes into normal and phishing ones. Experimental results demonstrate that the phishing detection method works effectively on Ethereum, and indicate the efficacy of trans2vec over existing state-of-the-art algorithms on feature extraction for transaction networks. This work is the first investigation on phishing detection on Ethereum via network embedding and provides insights into how features of large-scale transaction networks can be embedded.

Submitted 17 November, 2020; v1 submitted 20 November, 2019; originally announced November 2019.

Journal ref: TSMC.2020.3016821

arXiv:1910.04351 [pdf]

Research on a Hybrid System With Perfect Forward Secrecy

Authors: Weiqing You, Guozhen Shi, Xiaoming Chen, Jian Qi, Chuang Qing

Abstract: The rapid development of computer technology has connected the world as a whole, and the widespread application of instant messaging technology has brought great convenience to people's lives, while privacy protection has become a more significant problem. For ordinary users, it is hard to equip themselves with a cryptograph machine. In this paper, through an in-depth study of the elliptic curve cryptosystem (ECC) and the Advanced Encryption Standard (AES) encryption algorithm, and according to the characteristics of public key cryptography, we establish an elliptic curve version of the Diffie-Hellman key exchange protocol and, combined with AES, design a hybrid cryptosystem with perfect forward secrecy. The system can guarantee the security of communication, is easy to implement, operates quickly, and has a low cost. Finally, the security of the system is analyzed under the environment of common network attacks.

Submitted 9 October, 2019; originally announced October 2019.

arXiv:1910.04346 [pdf]

A New Cryptosystem Based on Positive Braids

Authors: Xiaoming Chen, Weiqing You, Meng Jiao, Kejun Zhang, Shuang Qing, Zhiqiang Wang

Abstract: The braid group is an important non-commutative group; at the same time, it is an important tool in quantum field theory with better topological structure, and is often used as a research carrier for anti-quantum cryptographic algorithms. This paper proposes a difficult problem on a positive braid semi-group and proves that its difficulty is not lower than that of the conjugate search problem. Based on this new difficult problem, we propose a new cryptosystem, which includes a key exchange protocol and a public key encryption algorithm. Since our cryptosystem is implemented on a semi-group, it effectively avoids the analysis of attack algorithms on the cluster and makes our algorithm more secure.

Submitted 9 October, 2019; originally announced October 2019.

arXiv:1806.03078 [pdf, ps, other]

The Twin Conjugacy Search Problem and Applications

Authors: Xiaoming Chen, Weiqing You, Wenxi Li

Abstract: We propose a new computational problem over the noncommutative group, called the twin conjugacy search problem. This problem is related to the conjugacy search problem and can be used for almost all of the same cryptographic constructions that are based on the conjugacy search problem. However, our new problem is at least as hard as the conjugacy search problem. Moreover, the twin conjugacy search problem has many applications. As one of the most important applications, we propose a trapdoor test which can replace the function of the decision oracle. We also show other applications of the problem, including a non-interactive key exchange protocol, a key exchange protocol, and a new encryption scheme which is secure against chosen ciphertext attack, with a very simple and tight security proof and short ciphertexts, under a weak assumption, in the random oracle model.

Submitted 8 June, 2018; originally announced June 2018.

arXiv:1806.03075 [pdf, ps, other]

Provably Secure Integration Cryptosystem on Non-Commutative Group

Authors: Xiaoming Chen, Weiqing You

Abstract: The braid group is a very important non-commutative group. It is also an important tool of quantum field theory, and has good topological properties. This paper focuses on the provable security of cryptosystems over the braid group, and consists of two parts. First, we prove that Ko's cryptosystem based on the braid group, proposed at CRYPTO 2000, is secure against chosen-plaintext attack (CPA), while it does not resist active attacks. Second, we propose a new public key cryptosystem over the braid group which is secure against adaptive chosen-ciphertext attack (CCA2). Our proofs are based on random oracle models, under the computational conjugacy search assumption (the CCS assumption). Results of this kind have never been seen before.

Submitted 6 July, 2018; v1 submitted 8 June, 2018; originally announced June 2018.

Comments: 15 pages

arXiv:1202.4743 [pdf]

doi 10.1117/12.805596

Real-time detection and tracking of multiple objects with partial decoding in H.264/AVC bitstream domain

Authors: Wonsang You, M. S. Houari Sabirin, Munchurl Kim

Abstract: In this paper, we show that we can apply probabilistic spatiotemporal macroblock filtering (PSMF) and partial decoding processes to effectively detect and track multiple objects in real time in H.264|AVC bitstreams with stationary background. Our contribution is that our method can not only show fast processing time but also handle multiple moving objects that are articulated, changing in size or internally have monotonous color, even though they contain a chaotic set of non-homogeneous motion vectors inside. In addition, our partial decoding process for H.264|AVC bitstreams makes it possible to improve the accuracy of object trajectories and overcome long occlusion by using extracted color information.

Submitted 21 February, 2012; originally announced February 2012.

Comments: SPIE Real-Time Image and Video Processing Conference 2009

Journal ref: Proceedings of SPIE 2009, Volume: 7244, Publisher: SPIE, Pages: 72440D-72440D-12

arXiv:0806.1284 [pdf, ps, other]

The Separation of Duty with Privilege Calculus

Authors: Chenggong Lv, Jun Wang, Lu Liu, Weijia You

Abstract: This paper presents Privilege Calculus (PC) as a new approach of knowledge representation for Separation of Duty (SD) from the view of process, and intends to improve the reconfigurability and traceability of SD. PC presumes that the structure of SD should be reduced to the structure of privilege and then the regulation of system should be analyzed with the help of forms of privilege.

Submitted 7 June, 2008; originally announced June 2008.

Comments: RSKT2008 conference, LNAI 5009, pp.410-417, 2008

Showing 1–31 of 31 results for author: You, W