
Roleplay-doh: Enabling Domain-Experts to Create LLM-simulated Patients via Eliciting and Adhering to Principles

Authors: Ryan Louie, Ananjan Nandi, William Fang, Cheng Chang, Emma Brunskill, Diyi Yang

Abstract: Recent works leverage LLMs to roleplay realistic social scenarios, aiding novices in practicing their social skills. However, simulating sensitive interactions, such as in mental health, is challenging. Privacy concerns restrict data access, and collecting expert feedback, although vital, is laborious. To address this, we develop Roleplay-doh, a novel human-LLM collaboration pipeline that elicits qualitative feedback from a domain expert, which is transformed into a set of principles, or natural language rules, that govern an LLM-prompted roleplay. We apply this pipeline to enable senior mental health supporters to create customized AI patients as simulated practice partners for novice counselors. After uncovering issues in GPT-4 simulations not adhering to expert-defined principles, we also introduce a novel principle-adherence prompting pipeline which shows 30% improvements in response quality and principle following for the downstream task. Via a user study with 25 counseling experts, we demonstrate that the pipeline makes it easy and effective to create AI patients that more faithfully resemble real patients, as judged by creators and third-party counselors.

Submitted 30 June, 2024; originally announced July 2024.

Comments: 34 pages, 24 figures, 11 Tables

arXiv:2406.04568 [pdf, other]

StackSight: Unveiling WebAssembly through Large Language Models and Neurosymbolic Chain-of-Thought Decompilation

Authors: Weike Fang, Zhejian Zhou, Junzhou He, Weihang Wang

Abstract: WebAssembly enables near-native execution in web applications and is increasingly adopted for tasks that demand high performance and robust security. However, its assembly-like syntax, implicit stack machine, and low-level data types make it extremely difficult for human developers to understand, spurring the need for effective WebAssembly reverse engineering techniques. In this paper, we propose StackSight, a novel neurosymbolic approach that combines Large Language Models (LLMs) with advanced program analysis to decompile complex WebAssembly code into readable C++ snippets. StackSight visualizes and tracks virtual stack alterations via a static analysis algorithm and then applies chain-of-thought prompting to harness LLMs' complex reasoning capabilities. Evaluation results show that StackSight significantly improves WebAssembly decompilation. Our user study also demonstrates that code snippets generated by StackSight have significantly higher win rates and enable a better grasp of code semantics.

Submitted 6 June, 2024; originally announced June 2024.

Comments: 9 pages. In the Proceedings of the 41st International Conference on Machine Learning (ICML '24)

arXiv:2406.02971 [pdf, other]

Maximal number of subword occurrences in a word

Authors: Wenjie Fang

Abstract: We consider the number of occurrences of subwords (non-consecutive sub-sequences) in a given word. We first define the notion of subword entropy of a given word that measures the maximal number of occurrences among all possible subwords. We then give upper and lower bounds of minimal subword entropy for words of fixed length in a fixed alphabet, and also show that minimal subword entropy per letter has a limit value. A better upper bound of minimal subword entropy for a binary alphabet is then given by looking at certain families of periodic words. We also give some conjectures based on experimental observations.

Submitted 5 June, 2024; originally announced June 2024.

Comments: Extended abstract accepted by 35th International Conference on Probabilistic, Combinatorial and Asymptotic Methods for the Analysis of Algorithms (AofA 2024). Comments are welcome
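
Note: the central quantity in this abstract, the number of occurrences of a subword in a word, can be computed with a standard dynamic program. The sketch below is only an illustration under stated assumptions: the function names `count_occurrences` and `max_occurrences` are hypothetical, the brute-force maximum is practical only for very short words, and the paper's exact definition of subword entropy is not reproduced here.

```python
from itertools import combinations


def count_occurrences(word: str, sub: str) -> int:
    """Count the ways `sub` appears in `word` as a (non-consecutive) subsequence."""
    # dp[j] = number of embeddings of sub[:j] into the part of `word` read so far.
    dp = [0] * (len(sub) + 1)
    dp[0] = 1  # the empty subword embeds in exactly one way
    for ch in word:
        # Go right to left so each letter of `word` is used at most once per embedding.
        for j in range(len(sub), 0, -1):
            if sub[j - 1] == ch:
                dp[j] += dp[j - 1]
    return dp[len(sub)]


def max_occurrences(word: str) -> int:
    """Brute-force maximum of count_occurrences over all distinct subwords of `word`."""
    subwords = {
        "".join(word[i] for i in idx)
        for k in range(len(word) + 1)
        for idx in combinations(range(len(word)), k)
    }
    return max(count_occurrences(word, s) for s in subwords)


print(count_occurrences("aabb", "ab"))
print(max_occurrences("aabb"))
```

For the word "aabb", the subword "ab" can be formed from either "a" and either "b", so it occurs 4 times, and no other subword of "aabb" occurs more often.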

arXiv:2405.13515 [pdf, other]

Multi-Scale Feature Fusion Quantum Depthwise Convolutional Neural Networks for Text Classification

Authors: Yixiong Chen, Weichuan Fang

Abstract: In recent years, with the development of quantum machine learning, quantum neural networks (QNNs) have gained increasing attention in the field of natural language processing (NLP) and have achieved a series of promising results. However, most existing QNN models focus on the architectures of quantum recurrent neural network (QRNN) and self-attention mechanism (QSAM). In this work, we propose a novel QNN model based on quantum convolution. We develop the quantum depthwise convolution that significantly reduces the number of parameters and lowers computational complexity. We also introduce the multi-scale feature fusion mechanism to enhance model performance by integrating word-level and sentence-level features. Additionally, we propose the quantum word embedding and quantum sentence embedding, which provide embedding vectors more efficiently. Through experiments on two benchmark text classification datasets, we demonstrate our model outperforms a wide range of state-of-the-art QNN models. Notably, our model achieves a new state-of-the-art test accuracy of 96.77% on the RP dataset. We also show the advantages of our quantum model over its classical counterparts in its ability to improve test accuracy using fewer parameters. Finally, an ablation test confirms the effectiveness of the multi-scale feature fusion mechanism and quantum depthwise convolution in enhancing model performance.

Submitted 22 May, 2024; originally announced May 2024.

arXiv:2405.05525 [pdf, other]

Ditto: Quantization-aware Secure Inference of Transformers upon MPC

Authors: Haoqi Wu, Wenjing Fang, Yancheng Zheng, Junming Ma, Jin Tan, Yinggui Wang, Lei Wang

Abstract: Due to the rising privacy concerns on sensitive client data and trained models like Transformers, secure multi-party computation (MPC) techniques are employed to enable secure inference despite attendant overhead. Existing works attempt to reduce the overhead using more MPC-friendly non-linear function approximations. However, the integration of quantization widely used in plaintext inference into the MPC domain remains unclear. To bridge this gap, we propose the framework named Ditto to enable more efficient quantization-aware secure Transformer inference. Concretely, we first incorporate an MPC-friendly quantization into Transformer inference and employ a quantization-aware distillation procedure to maintain the model utility. Then, we propose novel MPC primitives to support the type conversions that are essential in quantization and implement the quantization-aware MPC execution of secure quantized inference. This approach significantly decreases both computation and communication overhead, leading to improvements in overall efficiency. We conduct extensive experiments on BERT and GPT2 models to evaluate the performance of Ditto. The results demonstrate that Ditto is about $3.14\sim 4.40\times$ faster than MPCFormer (ICLR 2023) and $1.44\sim 2.35\times$ faster than the state-of-the-art work PUMA with negligible utility degradation.

Submitted 8 May, 2024; originally announced May 2024.

Comments: to be published in ICML 2024

arXiv:2405.03197 [pdf, other]

StyleSeg V2: Towards Robust One-shot Segmentation of Brain Tissue via Optimization-free Registration Error Perception

Authors: Zhiwei Wang, Xiaoyu Zeng, Chongwei Wu, Jinxin lv, Xu Zhang, Wei Fang, Qiang Li

Abstract: One-shot segmentation of brain tissue requires training a registration-segmentation (reg-seg) dual-model iteratively, where the reg-model aims to provide pseudo masks of unlabeled images for the seg-model by warping a carefully-labeled atlas. However, the imperfect reg-model induces image-mask misalignment, subsequently poisoning the seg-model. The recent StyleSeg bypasses this bottleneck by replacing the unlabeled images with their warped copies of the atlas, but needs to borrow the diverse image patterns via style transformation. Here, we present StyleSeg V2, inherited from StyleSeg but granted the ability to perceive registration errors. The motivation is that good registration behaves in a mirrored fashion for mirrored images. Therefore, almost at no cost, StyleSeg V2 can have the reg-model itself "speak out" incorrectly-aligned regions by simply mirroring (symmetrically flipping the brain) its input, and the registration errors are symmetric inconsistencies between the outputs of original and mirrored inputs. Consequently, StyleSeg V2 allows the seg-model to make use of correctly-aligned regions of unlabeled images and also enhances the fidelity of the style-transformed warped atlas image by weighting the local transformation strength according to registration errors. The experimental results on three public datasets demonstrate that our proposed StyleSeg V2 outperforms other state-of-the-art methods by considerable margins, and exceeds StyleSeg by increasing the average Dice by at least 2.4%.

Submitted 18 May, 2024; v1 submitted 6 May, 2024; originally announced May 2024.

Comments: 10 pages, 11 figures, 2 tables

arXiv:2404.04878 [pdf, other]

CycleINR: Cycle Implicit Neural Representation for Arbitrary-Scale Volumetric Super-Resolution of Medical Data

Authors: Wei Fang, Yuxing Tang, Heng Guo, Mingze Yuan, Tony C. W. Mok, Ke Yan, Jiawen Yao, Xin Chen, Zaiyi Liu, Le Lu, Ling Zhang, Minfeng Xu

Abstract: In the realm of medical 3D data, such as CT and MRI images, prevalent anisotropic resolution is characterized by high intra-slice but diminished inter-slice resolution. The lowered resolution between adjacent slices poses challenges, hindering optimal viewing experiences and impeding the development of robust downstream analysis algorithms. Various volumetric super-resolution algorithms aim to surmount these challenges, enhancing inter-slice resolution and overall 3D medical imaging quality. However, existing approaches confront inherent challenges: 1) often tailored to specific upsampling factors, lacking flexibility for diverse clinical scenarios; 2) newly generated slices frequently suffer from over-smoothing, degrading fine details, and leading to inter-slice inconsistency. In response, this study presents CycleINR, a novel enhanced Implicit Neural Representation model for 3D medical data volumetric super-resolution. Leveraging the continuity of the learned implicit function, the CycleINR model can achieve results with arbitrary up-sampling rates, eliminating the need for separate training. Additionally, we enhance the grid sampling in CycleINR with a local attention mechanism and mitigate over-smoothing by integrating cycle-consistent loss. We introduce a new metric, Slice-wise Noise Level Inconsistency (SNLI), to quantitatively assess inter-slice noise level inconsistency. The effectiveness of our approach is demonstrated through image quality evaluations on an in-house dataset and a downstream task analysis on the Medical Segmentation Decathlon liver tumor dataset.

Submitted 7 April, 2024; originally announced April 2024.

Comments: CVPR accepted paper

arXiv:2403.18453 [pdf, other]

Annotating Slack Directly on Your Verilog: Fine-Grained RTL Timing Evaluation for Early Optimization

Authors: Wenji Fang, Shang Liu, Hongce Zhang, Zhiyao Xie

Abstract: In digital IC design, compared with post-synthesis netlists or layouts, the early register-transfer level (RTL) stage offers greater optimization flexibility for both designers and EDA tools. However, timing information is typically unavailable at this early stage. Some recent machine learning (ML) solutions propose to predict the total negative slack (TNS) and worst negative slack (WNS) of an entire design at the RTL stage, but the fine-grained timing information of individual registers remains unavailable. In this work, we address the unique challenges of RTL timing prediction and introduce our solution named RTL-Timer. To the best of our knowledge, this is the first fine-grained general timing estimator applicable to any given design. RTL-Timer explores multiple promising RTL representations and proposes customized loss functions to capture the maximum arrival time at register endpoints. RTL-Timer's fine-grained predictions are further applied to guide optimization in a standard synthesis flow. The average results on unknown test designs demonstrate a correlation above 0.89, contributing around 3% WNS and 10% TNS improvement after optimization.

Submitted 6 May, 2024; v1 submitted 27 March, 2024; originally announced March 2024.

Comments: Published as a conference paper at Design Automation Conference (DAC) 2024
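
Note: the abstract relates per-register arrival times to the design-level metrics WNS and TNS. The sketch below shows that relationship using the usual textbook definitions (slack = required time minus arrival time, WNS = most negative slack, TNS = sum of negative slacks). It is not RTL-Timer itself: the function name `timing_summary`, the endpoint names, the arrival values, and the simplification that the required time equals the clock period (no setup time or clock skew) are all assumptions made for illustration.

```python
def timing_summary(arrival_ns: dict[str, float], clock_period_ns: float):
    """Derive per-endpoint slack, WNS, and TNS from maximum arrival times.

    Assumes required time == clock period (setup time and skew ignored).
    """
    # Slack per register endpoint: positive means the path meets timing.
    slacks = {ep: clock_period_ns - t for ep, t in arrival_ns.items()}
    negative = [s for s in slacks.values() if s < 0]
    wns = min(negative) if negative else 0.0  # worst negative slack
    tns = float(sum(negative))                # total negative slack
    return slacks, wns, tns


# Hypothetical maximum arrival times (ns) at three register endpoints.
arrivals = {"reg_a": 0.75, "reg_b": 1.25, "reg_c": 1.5}
slacks, wns, tns = timing_summary(arrivals, clock_period_ns=1.0)
print(slacks, wns, tns)
```

With a 1.0 ns clock, `reg_b` and `reg_c` violate timing by 0.25 ns and 0.5 ns, so WNS is -0.5 ns and TNS is -0.75 ns. Fine-grained predictions of this kind identify which endpoints contribute to the design-level numbers.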

arXiv:2403.17751 [pdf, other]

Robust Analysis of Full-Duplex Two-Way Space Shift Keying With RIS Systems

Authors: Xusheng Zhu, Wen Chen, Qingqing Wu, Wen Fang, Chaoying Huang, Jun Li

Abstract: Reconfigurable intelligent surface (RIS)-assisted index modulation system schemes are considered a promising technology for sixth-generation (6G) wireless communication systems, which can enhance various system capabilities such as coverage and reliability. However, obtaining perfect channel state information (CSI) is challenging due to the lack of a radio frequency chain in RIS. In this paper, we investigate the RIS-assisted full-duplex (FD) two-way space shift keying (SSK) system under imperfect CSI, where the signal emissions are augmented by deploying RISs in the vicinity of two FD users. The maximum likelihood detector is utilized to recover the transmit antenna index. With this in mind, we derive closed-form average bit error probability (ABEP) expression based on the Gaussian-Chebyshev quadrature (GCQ) method and provide the upper bound and asymptotic ABEP expressions in the presence of channel estimation errors. To gain more insights, we also derive the outage probability and provide the throughput of the proposed scheme with imperfect CSI. The correctness of the analytical derivation results is confirmed via Monte Carlo simulations. It is demonstrated that increasing the number of elements of RIS can significantly improve the ABEP performance of the FD system over the half-duplex (HD) system. Furthermore, in the high SNR region, the ABEP performance of the FD system is better than that of the HD system.

Submitted 26 March, 2024; originally announced March 2024.

arXiv:2403.14690 [pdf]

Incorporating Graph Attention Mechanism into Geometric Problem Solving Based on Deep Reinforcement Learning

Authors: Xiuqin Zhong, Shengyuan Yan, Gongqi Lin, Hongguang Fu, Liang Xu, Siwen Jiang, Lei Huang, Wei Fang

Abstract: In the context of online education, designing an automatic solver for geometric problems has been considered a crucial step towards general math Artificial Intelligence (AI), empowered by natural language understanding and traditional logical inference. In most instances, problems are addressed by adding auxiliary components such as lines or points. However, adding auxiliary components automatically is challenging due to the complexity in selecting suitable auxiliary components especially when pivotal decisions have to be made. The state-of-the-art performance has been achieved by exhausting all possible strategies from the category library to identify the one with the maximum likelihood. However, an extensive strategy search has to be applied to trade accuracy for efficiency. To add auxiliary components automatically and efficiently, we present a deep reinforcement learning framework based on a language model, such as BERT. We first apply the graph attention mechanism to reduce the strategy searching space, called AttnStrategy, which only focuses on the conclusion-related components. Meanwhile, a novel algorithm, named Automatically Adding Auxiliary Components using Reinforcement Learning framework (A3C-RL), is proposed by forcing an agent to select top strategies, which incorporates the AttnStrategy and BERT as the memory components. Results from extensive experiments show that the proposed A3C-RL algorithm can substantially enhance the average precision by 32.7% compared to the traditional MCTS. In addition, the A3C-RL algorithm outperforms humans on the geometric questions from the annual University Entrance Mathematical Examination of China.

Submitted 14 March, 2024; originally announced March 2024.

arXiv:2403.11436 [pdf, ps, other]

Deep Holes of Twisted Reed-Solomon Codes

Authors: Weijun Fang, Jingke Xu, Ruiqi Zhu

Abstract: The deep holes of a linear code are the vectors that achieve the maximum error distance to the code. There has been extensive research on the topic of deep holes in Reed-Solomon codes. As a generalization of Reed-Solomon codes, we investigate the problem of deep holes of a class of twisted Reed-Solomon codes in this paper. The covering radius and a standard class of deep holes of twisted Reed-Solomon codes ${\rm TRS}_k(\mathcal{A}, \theta)$ are obtained for a general evaluation set $\mathcal{A} \subseteq \mathbb{F}_q$. Furthermore, we consider the problem of determining all deep holes of the full-length twisted Reed-Solomon codes ${\rm TRS}_k(\mathbb{F}_q, \theta)$. Specifically, we prove that there are no other deep holes of ${\rm TRS}_k(\mathbb{F}_q, \theta)$ for $\frac{3q-8}{4} \leq k\leq q-4$ when $q$ is even, and $\frac{3q+2\sqrt{q}-7}{4} \leq k\leq q-4$ when $q$ is odd. We also completely determine their deep holes for $q-3 \leq k \leq q-1$.

Submitted 17 March, 2024; originally announced March 2024.

Comments: This work has been submitted to the IEEE for possible publication

arXiv:2403.07257 [pdf, other]

The Dawn of AI-Native EDA: Opportunities and Challenges of Large Circuit Models

Authors: Lei Chen, Yiqi Chen, Zhufei Chu, Wenji Fang, Tsung-Yi Ho, Ru Huang, Yu Huang, Sadaf Khan, Min Li, Xingquan Li, Yu Li, Yun Liang, Jinwei Liu, Yi Liu, Yibo Lin, Guojie Luo, Zhengyuan Shi, Guangyu Sun, Dimitrios Tsaras, Runsheng Wang, Ziyi Wang, Xinming Wei, Zhiyao Xie, Qiang Xu, Chenhao Xue, et al. (14 additional authors not shown)

Abstract: Within the Electronic Design Automation (EDA) domain, AI-driven solutions have emerged as formidable tools, yet they typically augment rather than redefine existing methodologies. These solutions often repurpose deep learning models from other domains, such as vision, text, and graph analytics, applying them to circuit design without tailoring to the unique complexities of electronic circuits. Such an AI4EDA approach falls short of achieving a holistic design synthesis and understanding, overlooking the intricate interplay of electrical, logical, and physical facets of circuit data. This paper argues for a paradigm shift from AI4EDA towards AI-native EDA, integrating AI at the core of the design process. Pivotal to this vision is the development of a multimodal circuit representation learning technique, poised to provide a comprehensive understanding by harmonizing and extracting insights from varied data sources, such as functional specifications, RTL designs, circuit netlists, and physical layouts. We champion the creation of large circuit models (LCMs) that are inherently multimodal, crafted to decode and express the rich semantics and structures of circuit data, thus fostering more resilient, efficient, and inventive design methodologies. Embracing this AI-native philosophy, we foresee a trajectory that transcends the current innovation plateau in EDA, igniting a profound shift-left in electronic design methodology. The envisioned advancements herald not just an evolution of existing EDA tools but a revolution, giving rise to novel instruments of design tools that promise to radically enhance design productivity and inaugurate a new epoch where the optimization of circuit performance, power, and area (PPA) is achieved not incrementally, but through leaps that redefine the benchmarks of electronic systems' capabilities.

Submitted 1 May, 2024; v1 submitted 11 March, 2024; originally announced March 2024.

Comments: The authors are ordered alphabetically. Contact: qxu@cse[dot]cuhk[dot]edu[dot]hk, gluo@pku[dot]edu[dot]cn, yuan.mingxuan@huawei[dot]com

arXiv:2403.05832 [pdf, other]

Research progress on intelligent optimization techniques for energy-efficient design of ship hull forms

Authors: Shuwei Zhu, Siying Lv, Kaifeng Chen, Wei Fang, Leilei Cao

Abstract: The design optimization of ship hull form based on hydrodynamics theory and simulation-based design (SBD) technologies generally considers ship performance and energy efficiency performance as the design objective, which plays an important role in smart design and manufacturing of green ships. An optimal design of a sustainable energy system requires multidisciplinary tools to build ships with the least resistance and energy consumption. Through a systematic approach, this paper presents the research progress of energy-efficient design of ship hull forms based on intelligent optimization techniques. We discuss different methods involved in the optimization procedure, especially the latest developments of intelligent optimization algorithms and surrogate models. Moreover, current development trends and technical challenges of multidisciplinary design optimization and surrogate-assisted evolutionary algorithms for ship design are further analyzed. We explore the gaps and potential future directions, so as to pave the way towards the design of the next generation of more energy-efficient ship hull forms.

Submitted 9 March, 2024; originally announced March 2024.

Comments: 30 pages, 8 figures

MSC Class: 41C99; ACM Class: J.6; I.2.8

arXiv:2402.19061 [pdf, other]

Optimal ANN-SNN Conversion with Group Neurons

Authors: Liuzhenghao Lv, Wei Fang, Li Yuan, Yonghong Tian

Abstract: Spiking Neural Networks (SNNs) have emerged as a promising third generation of neural networks, offering unique characteristics such as binary outputs, high sparsity, and biological plausibility. However, the lack of effective learning algorithms remains a challenge for SNNs. For instance, while converting artificial neural networks (ANNs) to SNNs circumvents the need for direct training of SNNs, it encounters issues related to conversion errors and high inference time delays. In order to reduce or even eliminate conversion errors while decreasing inference time-steps, we have introduced a novel type of neuron called Group Neurons (GNs). One GN is composed of multiple Integrate-and-Fire (IF) neurons as members, and its neural dynamics are meticulously designed. Based on GNs, we have optimized the traditional ANN-SNN conversion framework. Specifically, we replace the IF neurons in the SNNs obtained by the traditional conversion framework with GNs. The resulting SNNs, which utilize GNs, are capable of achieving accuracy levels comparable to ANNs even within extremely short inference time-steps. The experiments on CIFAR10, CIFAR100, and ImageNet datasets demonstrate the superiority of the proposed methods in terms of both inference accuracy and latency. Code is available at https://github.com/Lyu6PosHao/ANN2SNN_GN.

Submitted 29 February, 2024; originally announced February 2024.

Comments: Accepted by International Conference on Acoustics, Speech, and Signal Processing (ICASSP) 2024
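
Note: the abstract builds on the Integrate-and-Fire (IF) neuron used in conventional ANN-SNN conversion. The sketch below simulates a single plain IF neuron to show why conversion error depends on the number of time-steps. It is not the paper's Group Neuron: the function name `simulate_if_neuron`, the reset-by-subtraction rule, the zero initial membrane potential, and the input values are assumptions chosen for illustration.

```python
def simulate_if_neuron(input_current: float, threshold: float, steps: int) -> list[int]:
    """Simulate one Integrate-and-Fire neuron driven by a constant input.

    Returns the binary spike train. Uses reset-by-subtraction (an assumption).
    """
    membrane = 0.0
    spikes = []
    for _ in range(steps):
        membrane += input_current      # integrate the input
        if membrane >= threshold:      # fire once the threshold is reached
            spikes.append(1)
            membrane -= threshold      # soft reset keeps the residual charge
        else:
            spikes.append(0)
    return spikes


spikes = simulate_if_neuron(input_current=0.5, threshold=1.0, steps=4)
rate = sum(spikes) / len(spikes)
print(spikes, rate)
```

With input 0.5 and threshold 1.0 the neuron fires every second step, so the firing rate over 4 steps is 0.5 and matches the analog activation. With fewer steps the rate is quantized more coarsely, which is the kind of conversion error at short inference time-steps that the abstract targets.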

arXiv:2402.14600 [pdf, other]

Diffusion Model-Based Multiobjective Optimization for Gasoline Blending Scheduling

Authors: Wenxuan Fang, Wei Du, Renchu He, Yang Tang, Yaochu Jin, Gary G. Yen

Abstract: Gasoline blending scheduling uses resource allocation and operation sequencing to meet a refinery's production requirements. The presence of nonlinearity, integer constraints, and a large number of decision variables adds complexity to this problem, posing challenges for traditional and evolutionary algorithms. This paper introduces a novel multiobjective optimization approach driven by a diffusion model (named DMO), which is designed specifically for gasoline blending scheduling. To address integer constraints and generate feasible schedules, the diffusion model creates multiple intermediate distributions between Gaussian noise and the feasible domain. Through iterative processes, the solutions transition from Gaussian noise to feasible schedules while optimizing the objectives using the gradient descent method. DMO achieves simultaneous objective optimization and constraint adherence. Comparative tests are conducted to evaluate DMO's performance across various scales. The experimental results demonstrate that DMO surpasses state-of-the-art multiobjective evolutionary algorithms in terms of efficiency when solving gasoline blending scheduling problems.

Submitted 4 February, 2024; originally announced February 2024.

arXiv:2402.09745 [pdf, other]

doi 10.1145/3589334.3645628

WEFix: Intelligent Automatic Generation of Explicit Waits for Efficient Web End-to-End Flaky Tests

Authors: Xinyue Liu, Zihe Song, Weike Fang, Wei Yang, Weihang Wang

Abstract: Web end-to-end (e2e) testing evaluates the workflow of a web application. It simulates real-world user scenarios to ensure the application flows behave as expected. However, web e2e tests are notorious for being flaky, i.e., the tests can produce inconsistent results despite no changes to the code. One common type of flakiness is caused by nondeterministic execution orders between the test code and the client-side code under test. In particular, UI-based flakiness emerges as a notably prevalent and challenging issue to fix because the test code has limited knowledge about the client-side code execution. In this paper, we propose WEFix, a technique that can automatically generate fix code for UI-based flakiness in web e2e testing. The core of our approach is to leverage browser UI changes to predict the client-side code execution and generate proper wait oracles. We evaluate the effectiveness and efficiency of WEFix against 122 web e2e flaky tests from seven popular real-world projects. Our results show that WEFix dramatically reduces the overhead (from 3.7$\times$ to 1.25$\times$) while achieving a high correctness (98%).

Submitted 15 February, 2024; originally announced February 2024.

Comments: 8 pages. Accepted for publication in the proceedings of the ACM Web Conference 2024 (WWW 24)

arXiv:2402.00386 [pdf, other]

AssertLLM: Generating and Evaluating Hardware Verification Assertions from Design Specifications via Multi-LLMs

Authors: Wenji Fang, Mengming Li, Min Li, Zhiyuan Yan, Shang Liu, Hongce Zhang, Zhiyao Xie

Abstract: Assertion-based verification (ABV) is a critical method for ensuring design circuits comply with their architectural specifications, which are typically described in natural language. This process often requires significant interpretation by engineers to convert these specifications into functional verification assertions. Existing methods for generating assertions from natural language specifications are limited to sentences extracted by engineers, discouraging the practical application. In this work, we present AssertLLM, an automatic assertion generation framework for complete specification files. AssertLLM breaks down the complex task into three phases, incorporating three customized Large Language Models (LLMs) for extracting structural specifications, mapping signal definitions, and generating assertions. Additionally, we provide an open-source benchmark for assessing assertion generation capabilities. Our evaluation of AssertLLM on a full design, encompassing 23 signals, demonstrates that 89% of the generated assertions are both syntactically and functionally accurate.

Submitted 1 February, 2024; originally announced February 2024.

arXiv:2401.13266 [pdf, other]

SpecLLM: Exploring Generation and Review of VLSI Design Specification with Large Language Model

Authors: Mengming Li, Wenji Fang, Qijun Zhang, Zhiyao Xie

Abstract: The development of architecture specifications is an initial and fundamental stage of the integrated circuit (IC) design process. Traditionally, architecture specifications are crafted by experienced chip architects, a process that is not only time-consuming but also error-prone. Mistakes in these specifications may significantly affect subsequent stages of chip design. Despite the presence of advanced electronic design automation (EDA) tools, effective solutions to these specification-related challenges remain scarce. Since writing architecture specifications is naturally a natural language processing (NLP) task, this paper pioneers the automation of architecture specification development with the advanced capabilities of large language models (LLMs). Leveraging our definition and dataset, we explore the application of LLMs in two key aspects of architecture specification development: (1) Generating architecture specifications, which includes both writing specifications from scratch and converting RTL code into detailed specifications. (2) Reviewing existing architecture specifications. We got promising results indicating that LLMs may revolutionize how these critical specification documents are developed in IC design nowadays. By reducing the effort required, LLMs open up new possibilities for efficiency and accuracy in this crucial aspect of chip design.

Submitted 24 January, 2024; originally announced January 2024.

arXiv:2401.02020 [pdf, other]

Spikformer V2: Join the High Accuracy Club on ImageNet with an SNN Ticket

Authors: Zhaokun Zhou, Kaiwei Che, Wei Fang, Keyu Tian, Yuesheng Zhu, Shuicheng Yan, Yonghong Tian, Li Yuan

Abstract: Spiking Neural Networks (SNNs), known for their biologically plausible architecture, face the challenge of limited performance. The self-attention mechanism, which is the cornerstone of the high-performance Transformer and also a biologically inspired structure, is absent in existing SNNs. To this end, we explore the potential of leveraging both self-attention capability and biological properties of SNNs, and propose a novel Spiking Self-Attention (SSA) and Spiking Transformer (Spikformer). The SSA mechanism eliminates the need for softmax and captures the sparse visual feature employing spike-based Query, Key, and Value. This sparse computation without multiplication makes SSA efficient and energy-saving. Further, we develop a Spiking Convolutional Stem (SCS) with supplementary convolutional layers to enhance the architecture of Spikformer. The Spikformer enhanced with the SCS is referred to as Spikformer V2. To train larger and deeper Spikformer V2, we introduce a pioneering exploration of Self-Supervised Learning (SSL) within the SNN. Specifically, we pre-train Spikformer V2 with masking and reconstruction style inspired by the mainstream self-supervised Transformer, and then finetune the Spikformer V2 on the image classification on ImageNet. Extensive experiments show that Spikformer V2 outperforms other previous surrogate training and ANN2SNN methods. An 8-layer Spikformer V2 achieves an accuracy of 80.38% using 4 time steps, and after SSL, a 172M 16-layer Spikformer V2 reaches an accuracy of 81.10% with just 1 time step. To the best of our knowledge, this is the first time that the SNN achieves 80+% accuracy on ImageNet. The code will be available at Spikformer V2.

Submitted 3 January, 2024; originally announced January 2024.

arXiv:2401.01070 [pdf, other]

A Novel Dual-Stage Evolutionary Algorithm for Finding Robust Solutions

Authors: Wei Du, Wenxuan Fang, Chen Liang, Yang Tang, Yaochu Jin

Abstract: In robust optimization problems, the magnitude of perturbations is relatively small. Consequently, solutions within certain regions are less likely to represent the robust optima when perturbations are introduced. Hence, a more efficient search process would benefit from increased opportunities to explore promising regions where global optima or good local optima are situated. In this paper, we introduce a novel robust evolutionary algorithm named the dual-stage robust evolutionary algorithm (DREA) aimed at discovering robust solutions. DREA operates in two stages: the peak-detection stage and the robust solution-searching stage. The primary objective of the peak-detection stage is to identify peaks in the fitness landscape of the original optimization problem. Conversely, the robust solution-searching stage focuses on swiftly identifying the robust optimal solution using information obtained from the peaks discovered in the initial stage. These two stages collectively enable the proposed DREA to efficiently obtain the robust optimal solution for the optimization problem. This approach achieves a balance between solution optimality and robustness by separating the search processes for optimal and robust optimal solutions. Experimental results demonstrate that DREA significantly outperforms five state-of-the-art algorithms across 18 test problems characterized by diverse complexities. Moreover, when evaluated on higher-dimensional robust optimization problems (100-$D$ and 200-$D$), DREA also demonstrates superior performance compared to all five counterpart algorithms.

Submitted 2 January, 2024; originally announced January 2024.

arXiv:2312.08617 [pdf, other]

RTLCoder: Outperforming GPT-3.5 in Design RTL Generation with Our Open-Source Dataset and Lightweight Solution

Authors: Shang Liu, Wenji Fang, Yao Lu, Qijun Zhang, Hongce Zhang, Zhiyao Xie

Abstract: The automatic generation of RTL code (e.g., Verilog) using natural language instructions and large language models (LLMs) has attracted significant research interest recently. However, most existing approaches heavily rely on commercial LLMs such as ChatGPT, while open-source LLMs tailored for this specific design generation task exhibit notably inferior performance. The absence of high-quality open-source solutions restricts the flexibility and data privacy of this emerging technique. In this study, we present a new customized LLM solution with a modest parameter count of only 7B, achieving better performance than GPT-3.5 on two representative benchmarks for RTL code generation. This remarkable balance between accuracy and efficiency is made possible by leveraging our new RTL code dataset and a customized LLM algorithm, both of which will be made fully open-source. Furthermore, we have successfully quantized our LLM to 4-bit with a total size of 4GB, enabling it to function on a single laptop with only slight performance degradation. This efficiency allows the RTL generator to serve as a local assistant for engineers, ensuring all design privacy concerns are addressed.

Submitted 20 February, 2024; v1 submitted 13 December, 2023; originally announced December 2023.

arXiv:2311.11313 [pdf, other]

doi 10.1145/3656419

Symbolic Execution for Quantum Error Correction Programs

Authors: Wang Fang, Mingsheng Ying

Abstract: We define QSE, a symbolic execution framework for quantum programs by integrating symbolic variables into quantum states and the outcomes of quantum measurements. The soundness of QSE is established through a theorem that ensures the correctness of symbolic execution within operational semantics. We further introduce symbolic stabilizer states, which symbolize the phases of stabilizer generators, for the efficient analysis of quantum error correction (QEC) programs. Within the QSE framework, we can use symbolic expressions to characterize the possible discrete Pauli errors in QEC, providing a significant improvement over existing methods that rely on sampling with simulators. We implement QSE with the support of symbolic stabilizer states in a prototype tool named QuantumSE.jl. Our experiments on representative QEC codes, including quantum repetition codes, Kitaev's toric codes, and quantum Tanner codes, demonstrate the efficiency of QuantumSE.jl for debugging QEC programs with over 1000 qubits. In addition, by substituting concrete values in symbolic expressions of measurement results, QuantumSE.jl is also equipped with a sampling feature for stabilizer circuits. Despite a longer initialization time than the state-of-the-art stabilizer simulator, Google's Stim, QuantumSE.jl offers a quicker sampling rate in the experiments.

Submitted 28 April, 2024; v1 submitted 19 November, 2023; originally announced November 2023.

Comments: 41 pages, 11 figures. v2: fix inappropriate use of Stim. v3: Extended version of PLDI 2024 publication

arXiv:2311.08441 [pdf, other]

MasterRTL: A Pre-Synthesis PPA Estimation Framework for Any RTL Design

Authors: Wenji Fang, Yao Lu, Shang Liu, Qijun Zhang, Ceyu Xu, Lisa Wu Wills, Hongce Zhang, Zhiyao Xie

Abstract: In modern VLSI design flow, the register-transfer level (RTL) stage is a critical point, where designers define precise design behavior with hardware description languages (HDLs) like Verilog. Since the RTL design is in the format of HDL code, the standard way to evaluate its quality requires time-consuming subsequent synthesis steps with EDA tools. This time-consuming process significantly impedes design optimization at the early RTL stage. Despite the emergence of some recent ML-based solutions, they fail to maintain high accuracy for any given RTL design. In this work, we propose an innovative pre-synthesis PPA estimation framework named MasterRTL. It first converts the HDL code to a new bit-level design representation named the simple operator graph (SOG). By only adopting single-bit simple operators, this SOG proves to be a general representation that unifies different design types and styles. The SOG is also more similar to the target gate-level netlist, reducing the gap between RTL representation and netlist. In addition to the new SOG representation, MasterRTL proposes new ML methods for the RTL-stage modeling of timing, power, and area separately. Compared with state-of-the-art solutions, the experiment on a comprehensive dataset with 90 different designs shows accuracy improvement by 0.33, 0.22, and 0.15 in correlation for total negative slack (TNS), worst negative slack (WNS), and power, respectively.

Submitted 14 November, 2023; originally announced November 2023.

Comments: To be published in the Proceedings of 42nd IEEE/ACM International Conference on Computer-Aided Design (ICCAD), 2023

arXiv:2311.03906 [pdf, other]

SymPhase: Phase Symbolization for Fast Simulation of Stabilizer Circuits

Authors: Wang Fang, Mingsheng Ying

Abstract: This paper proposes an efficient stabilizer circuit simulation algorithm that only traverses the circuit forward once. We introduce phase symbolization into stabilizer generators, which allows possible Pauli faults in the circuit to be accumulated explicitly as symbolic expressions in the phases of stabilizer generators. This way, the measurement outcomes are also symbolic expressions, and we can sample them by substituting the symbolic variables with concrete values, without traversing the circuit repeatedly. We show how to integrate symbolic phases into the stabilizer tableau and maintain them efficiently using bit-vector encoding. A new data layout of the stabilizer tableau in memory is proposed, which improves the performance of our algorithm (and other stabilizer simulation algorithms based on the stabilizer tableau). We implement our algorithm and data layout in a Julia package named SymPhase.jl, and compare it with Stim, the state-of-the-art simulator, on several benchmarks. We show that SymPhase.jl has superior performance in terms of sampling time, which is crucial for generating a large number of samples for further analysis.

Submitted 21 November, 2023; v1 submitted 7 November, 2023; originally announced November 2023.

Comments: 7 pages, 3 figures; v2: fix inappropriate use of Stim

arXiv:2310.17890 [pdf, other]

Submodel Partitioning in Hierarchical Federated Learning: Algorithm Design and Convergence Analysis

Authors: Wenzhi Fang, Dong-Jun Han, Christopher G. Brinton

Abstract: Hierarchical federated learning (HFL) has demonstrated promising scalability advantages over the traditional "star-topology" architecture-based federated learning (FL). However, HFL still imposes significant computation, communication, and storage burdens on the edge, especially when training a large-scale model over resource-constrained Internet of Things (IoT) devices. In this paper, we propose hierarchical independent submodel training (HIST), a new FL methodology that aims to address these issues in hierarchical settings. The key idea behind HIST is a hierarchical version of model partitioning, where we partition the global model into disjoint submodels in each round, and distribute them across different cells, so that each cell is responsible for training only one partition of the full model. This enables each client to save computation/storage costs while alleviating the communication loads throughout the hierarchy. We characterize the convergence behavior of HIST for non-convex loss functions under mild assumptions, showing the impact of several attributes (e.g., number of cells, local and global aggregation frequency) on the performance-efficiency tradeoff. Finally, through numerical experiments, we verify that HIST is able to save communication costs by a wide margin while achieving the same target testing accuracy.

Submitted 27 October, 2023; originally announced October 2023.

Comments: 14 pages, 4 figures

arXiv:2310.16620 [pdf, other]

SpikingJelly: An open-source machine learning infrastructure platform for spike-based intelligence

Authors: Wei Fang, Yanqi Chen, Jianhao Ding, Zhaofei Yu, Timothée Masquelier, Ding Chen, Liwei Huang, Huihui Zhou, Guoqi Li, Yonghong Tian

Abstract: Spiking neural networks (SNNs) aim to realize brain-inspired intelligence on neuromorphic chips with high energy efficiency by introducing neural dynamics and spike properties. As the emerging spiking deep learning paradigm attracts increasing interest, traditional programming frameworks cannot meet the demands of the automatic differentiation, parallel computation acceleration, and high integration of processing neuromorphic datasets and deployment. In this work, we present the SpikingJelly framework to address the aforementioned dilemma. We contribute a full-stack toolkit for pre-processing neuromorphic datasets, building deep SNNs, optimizing their parameters, and deploying SNNs on neuromorphic chips. Compared to existing methods, the training of deep SNNs can be accelerated $11\times$, and the superior extensibility and flexibility of SpikingJelly enable users to accelerate custom models at low costs through multilevel inheritance and semiautomatic code generation. SpikingJelly paves the way for synthesizing truly energy-efficient SNN-based machine intelligence systems, which will enrich the ecology of neuromorphic computing.

Submitted 25 October, 2023; originally announced October 2023.

Comments: Accepted in Science Advances (https://www.science.org/doi/10.1126/sciadv.adi1480)

arXiv:2310.07524 [pdf, ps, other]

New Lower Bounds for the Minimum Distance of Cyclic Codes and Applications to Locally Repairable Codes

Authors: Jing Qiu, Weijun Fang, Fang-Wei Fu

Abstract: Cyclic codes are an important class of linear codes. Bounding the minimum distance of cyclic codes is a long-standing research topic in coding theory, and several well-known and basic results have been developed on this topic. Recently, locally repairable codes (LRCs) have attracted much attention due to their repair efficiency in large-scale distributed storage systems. In this paper, by employing the singleton procedure technique, we first provide a sufficient condition for bounding the minimum distance of cyclic codes with typical defining sets. Secondly, by considering a specific case, we establish a connection between bounds for the minimum distance of cyclic codes and solutions to a system of inequalities. This connection leads to the derivation of new bounds, including some with general patterns. In particular, we provide three new bounds with general patterns, one of which serves as a generalization of the Betti-Sala bound. Finally, we present a generalized lower bound for a special case and construct several families of $(2, δ)$-LRCs with unbounded length and minimum distance $2δ$. It turns out that these LRCs are distance-optimal, and their parameters are new. To the best of our knowledge, this work represents the first construction of distance-optimal $(r, δ)$-LRCs with unbounded length and minimum distance exceeding $r+δ-1$.

Submitted 11 October, 2023; originally announced October 2023.

Comments: 35 pages

arXiv:2309.04819 [pdf, other]

doi 10.1145/3576915.3623108

Detecting Violations of Differential Privacy for Quantum Algorithms

Authors: Ji Guan, Wang Fang, Mingyu Huang, Mingsheng Ying

Abstract: Quantum algorithms for solving a wide range of practical problems have been proposed in the last ten years, such as data search and analysis, product recommendation, and credit scoring. The concern about privacy and other ethical issues in quantum computing naturally rises up. In this paper, we define a formal framework for detecting violations of differential privacy for quantum algorithms. A detection algorithm is developed to verify whether a (noisy) quantum algorithm is differentially private and automatically generate bugging information when the violation of differential privacy is reported. The information consists of a pair of quantum states that violate the privacy, to illustrate the cause of the violation. Our algorithm is equipped with Tensor Networks, a highly efficient data structure, and executed both on TensorFlow Quantum and TorchQuantum which are the quantum extensions of famous machine learning platforms -- TensorFlow and PyTorch, respectively. The effectiveness and efficiency of our algorithm are confirmed by the experimental results of almost all types of quantum algorithms already implemented on realistic quantum computers, including quantum supremacy algorithms (beyond the capability of classical algorithms), quantum machine learning models, quantum approximate optimization algorithms, and variational quantum eigensolvers with up to 21 quantum bits.

Submitted 9 September, 2023; originally announced September 2023.

Journal ref: In Proceedings of the 2023 ACM SIGSAC Conference on Computer and Communications Security (CCS 2023)

arXiv:2308.13904 [pdf, other]

doi 10.14722/ndss.2024.23238

LMSanitator: Defending Prompt-Tuning Against Task-Agnostic Backdoors

Authors: Chengkun Wei, Wenlong Meng, Zhikun Zhang, Min Chen, Minghu Zhao, Wenjing Fang, Lei Wang, Zihui Zhang, Wenzhi Chen

Abstract: Prompt-tuning has emerged as an attractive paradigm for deploying large-scale language models due to its strong downstream task performance and efficient multitask serving ability. Despite its wide adoption, we empirically show that prompt-tuning is vulnerable to downstream task-agnostic backdoors, which reside in the pretrained models and can affect arbitrary downstream tasks. The state-of-the-art backdoor detection approaches cannot defend against task-agnostic backdoors since they hardly converge in reversing the backdoor triggers. To address this issue, we propose LMSanitator, a novel approach for detecting and removing task-agnostic backdoors on Transformer models. Instead of directly inverting the triggers, LMSanitator aims to invert the predefined attack vectors (pretrained models' output when the input is embedded with triggers) of the task-agnostic backdoors, which achieves much better convergence performance and backdoor detection accuracy. LMSanitator further leverages prompt-tuning's property of freezing the pretrained model to perform accurate and fast output monitoring and input purging during the inference phase. Extensive experiments on multiple language models and NLP tasks illustrate the effectiveness of LMSanitator. For instance, LMSanitator achieves 92.8% backdoor detection accuracy on 960 models and decreases the attack success rate to less than 1% in most scenarios.

Submitted 14 October, 2023; v1 submitted 26 August, 2023; originally announced August 2023.

Comments: To Appear in the Network and Distributed System Security (NDSS) Symposium 2024, 26 February - 1 March 2024, San Diego, CA, USA; typos corrected

arXiv:2308.04840 [pdf]

Analyzing and controlling diversity in quantum-behaved particle swarm optimization

Authors: Li-Wei Li, Jun Sun, Chao Li, Wei Fang, Vasile Palade, Xiao-Jun Wu

Abstract: This paper addresses the issues of controlling and analyzing the population diversity in quantum-behaved particle swarm optimization (QPSO), which is an optimization approach motivated by concepts in quantum mechanics and PSO. In order to gain an in-depth understanding of the role the diversity plays in the evolving process, we first define the genotype diversity by the distance to the average point of the particles' positions and the phenotype diversity by the fitness values for the QPSO. Then, the correlations between the two types of diversities and the search performance are tested and analyzed on several benchmark functions, and the distance-to-average-point diversity is shown to have a stronger association with the search performance during the evolving processes. Finally, in the light of the performed diversity analyses, two strategies for controlling the distance-to-average-point diversities are proposed for the purpose of improving the search ability of the QPSO algorithm. Empirical studies on the QPSO with the introduced diversity control methods are performed on a set of benchmark functions from the CEC 2005 benchmark suite. The performance of the proposed methods is evaluated and compared with the original QPSO and other PSO variants.

Submitted 9 August, 2023; originally announced August 2023.

arXiv:2307.07138 [pdf, ps, other]

Reconfigurable Intelligent Surface Assisted Free Space Optical Information and Power Transfer

Authors: Wen Fang, Wen Chen, Qingqing Wu, Kunlun Wang, Shunqing Zhang, Qingwen Liu, Jun Li

Abstract: Free space optical (FSO) transmission has emerged as a key candidate technology for 6G to expand new spectrum and improve network capacity due to its advantages of large bandwidth, low electromagnetic interference, and high energy efficiency. Resonant beam operating in the infrared band utilizes spatially separated laser cavities to enable safe and mobile high-power energy and high-rate information transmission but is limited by line-of-sight (LOS) channel. In this paper, we propose a reconfigurable intelligent surface (RIS) assisted resonant beam simultaneous wireless information and power transfer (SWIPT) system and establish an optical field propagation model to analyze the channel state information (CSI), in which LOS obstruction can be detected sensitively and non-line-of-sight (NLOS) transmission can be realized by changing the phase of the resonant beam in the RIS. Numerical results demonstrate that, apart from the transmission distance, the NLOS performance depends on both the horizontal and vertical positions of RIS. The maximum NLOS energy efficiency can achieve 55% within a transfer distance of 10m, a translation distance of $\pm$4mm, and rotation angle of $\pm$50°.

Submitted 13 July, 2023; originally announced July 2023.

arXiv:2307.04323 [pdf, ps, other]

Optimal $(2,δ)$ Locally Repairable Codes via Punctured Simplex Codes

Authors: Yuan Gao, Weijun Fang, Jingke Xu, Dong Wang, Sihuang Hu

Abstract: Locally repairable codes (LRCs) have attracted a lot of attention due to their applications in distributed storage systems. In this paper, we provide new constructions of optimal $(2, δ)$-LRCs over $\mathbb{F}_q$ with flexible parameters. Firstly, employing techniques from finite geometry, we introduce a simple yet useful condition to ensure that a punctured simplex code becomes a $(2, δ)$-LRC. It is worth noting that this condition only imposes a requirement on the size of the puncturing set. Secondly, utilizing character sums over finite fields and Krawtchouk polynomials, we determine the parameters of more punctured simplex codes with puncturing sets of new structures. Several infinite families of LRCs with new parameters are derived. All of our new LRCs are optimal with respect to the generalized Cadambe-Mazumdar bound and some of them are also Griesmer codes or distance-optimal codes.

Submitted 18 June, 2024; v1 submitted 9 July, 2023; originally announced July 2023.

MSC Class: 94B60; 11T71

arXiv:2306.00807 [pdf, other]

Auto-Spikformer: Spikformer Architecture Search

Authors: Kaiwei Che, Zhaokun Zhou, Zhengyu Ma, Wei Fang, Yanqi Chen, Shuaijie Shen, Li Yuan, Yonghong Tian

Abstract: The integration of self-attention mechanisms into Spiking Neural Networks (SNNs) has garnered considerable interest in the realm of advanced deep learning, primarily due to their biological properties. Recent advancements in SNN architecture, such as Spikformer, have demonstrated promising outcomes by leveraging Spiking Self-Attention (SSA) and Spiking Patch Splitting (SPS) modules. However, we observe that Spikformer may exhibit excessive energy consumption, potentially attributable to redundant channels and blocks. To mitigate this issue, we propose Auto-Spikformer, a one-shot Transformer Architecture Search (TAS) method, which automates the quest for an optimized Spikformer architecture. To facilitate the search process, we propose Evolutionary SNN neurons (ESNN), a method that optimizes the SNN parameters, and apply the previously proposed method of weight entanglement supernet training, which optimizes the Vision Transformer (ViT) parameters. Moreover, we propose an accuracy and energy balanced fitness function $\mathcal{F}_{AEB}$ that jointly considers both energy consumption and accuracy, and aims to find a Pareto optimal combination that balances these two objectives. Our experimental results demonstrate the effectiveness of Auto-Spikformer, which outperforms state-of-the-art methods, including CNN and ViT models that are manually or automatically designed, while significantly reducing energy consumption.

Submitted 1 June, 2023; originally announced June 2023.

arXiv:2305.17080 [pdf, other]

Expand, Rerank, and Retrieve: Query Reranking for Open-Domain Question Answering

Authors: Yung-Sung Chuang, Wei Fang, Shang-Wen Li, Wen-tau Yih, James Glass

Abstract: We propose EAR, a query Expansion And Reranking approach for improving passage retrieval, with application to open-domain question answering. EAR first applies a query expansion model to generate a diverse set of queries, and then uses a query reranker to select the ones that could lead to better retrieval results. Motivated by the observation that the best query expansion often is not picked by greedy decoding, EAR trains its reranker to predict the rank orders of the gold passages when issuing the expanded queries to a given retriever. By better connecting the query expansion model and the retriever, EAR significantly enhances a traditional sparse retrieval method, BM25. Empirically, EAR improves top-5/20 accuracy by 3-8 and 5-10 points in in-domain and out-of-domain settings, respectively, when compared to a vanilla query expansion model, GAR, and a dense retrieval model, DPR.

Submitted 26 May, 2023; originally announced May 2023.

Comments: ACL 2023 long paper (Findings)

arXiv:2305.13909 [pdf, other]

Temporal Contrastive Learning for Spiking Neural Networks

Authors: Haonan Qiu, Zeyin Song, Yanqi Chen, Munan Ning, Wei Fang, Tao Sun, Zhengyu Ma, Li Yuan, Yonghong Tian

Abstract: Biologically inspired spiking neural networks (SNNs) have garnered considerable attention due to their low energy consumption and spatio-temporal information processing capabilities. Most existing SNN training methods first integrate output information across time steps, then adopt the cross-entropy (CE) loss to supervise the prediction of the average representations. However, in this work, we find that this method is not ideal for SNN training, as it omits the temporal dynamics of SNNs and its performance degrades quickly as the number of inference time steps decreases. One tempting method to model temporal correlations is to apply the same label supervision at each time step and treat all time steps identically. Although it can achieve relatively consistent performance across various time steps, it still faces challenges in obtaining SNNs with high performance. Inspired by these observations, we propose the Temporal-domain supervised Contrastive Learning (TCL) framework, a novel method to obtain SNNs with low latency and high performance by incorporating contrastive supervision with temporal domain information. Contrastive learning (CL) prompts the network to discern both consistency and variability in the representation space, enabling it to better learn discriminative and generalizable features. We extend this concept to the temporal domain of SNNs, allowing us to flexibly and fully leverage the correlation between representations at different time steps. Furthermore, we propose a Siamese Temporal-domain supervised Contrastive Learning (STCL) framework to enhance SNNs via augmentation, temporal, and class constraints simultaneously. Extensive experimental results demonstrate that SNNs trained by our TCL and STCL can achieve both high performance and low latency, achieving state-of-the-art performance on a variety of datasets (e.g., CIFAR-10, CIFAR-100, and DVS-CIFAR10).

Submitted 23 May, 2023; originally announced May 2023.

arXiv:2304.13938 [pdf, other]

doi 10.1016/j.compmedimag.2023.102273

A Deep Registration Method for Accurate Quantification of Joint Space Narrowing Progression in Rheumatoid Arthritis

Authors: Haolin Wang, Yafei Ou, Wanxuan Fang, Prasoon Ambalathankandy, Naoto Goto, Gen Ota, Masayuki Ikebe, Tamotsu Kamishima

Abstract: Rheumatoid arthritis (RA) is a chronic autoimmune inflammatory disease that results in progressive articular destruction and severe disability. Joint space narrowing (JSN) progression has been regarded as an important indicator for RA progression and has received sustained attention. In the diagnosis and monitoring of RA, radiology plays a crucial role in monitoring joint space. A new framework for monitoring joint space by quantifying JSN progression through image registration in radiographic images has been developed. This framework offers the advantage of high accuracy; however, challenges remain in reducing mismatches and improving reliability. In this work, a deep intra-subject rigid registration network is proposed to automatically quantify JSN progression in the early stage of RA. In our experiments, the mean-square error of the Euclidean distance between the moving and fixed images is 0.0031, the standard deviation is 0.0661 mm, and the mismatching rate is 0.48%. The proposed method has sub-pixel level accuracy, exceeding manual measurements by far, and is immune to noise, rotation, and scaling of joints. Moreover, this work provides loss visualization, which can aid radiologists and rheumatologists in assessing quantification reliability, with important implications for possible future clinical applications. As a result, we are optimistic that this proposed work will make a significant contribution to the automatic quantification of JSN progression in RA.

Submitted 26 April, 2023; originally announced April 2023.

Comments: 11 pages, 9 figures, 7 tables

MSC Class: 68T45

ACM Class: I.4

arXiv:2304.13398 [pdf, other]

Acceleration for Timing-Aware Gate-Level Logic Simulation with One-Pass GPU Parallelism

Authors: Weijie Fang, Yanggeng Fu, Jiaquan Gao, Longkun Guo, Gregory Gutin, Xiaoyan Zhang

Abstract: Witnessing the advancing scale and complexity of chip design and benefiting from high-performance computation technologies, the simulation of Very Large Scale Integration (VLSI) circuits imposes an increasing requirement for acceleration through parallel computing with GPU devices. However, conventional parallel strategies do not fully align with modern GPU abilities, leading to new challenges in the parallelism of VLSI simulation when using GPUs, despite some previous successful demonstrations of significant acceleration. In this paper, we propose a novel approach to accelerate 4-value logic timing-aware gate-level logic simulation using waveform-based GPU parallelism. Our approach utilizes a new strategy that can effectively handle the dependency between tasks during the parallelism, reducing the synchronization requirement between CPU and GPU when parallelizing the simulation on combinational circuits. This approach requires only one round of data transfer and hence achieves one-pass parallelism. Moreover, to overcome the difficulty of adopting our strategy on GPU devices, we design a series of data structures and tune them to dynamically allocate and store newly generated output of uncertain scale. Finally, experiments are carried out on industrial-scale open-source benchmarks to demonstrate the performance gain of our approach compared to several state-of-the-art baselines.

Submitted 26 April, 2023; originally announced April 2023.

arXiv:2304.12760 [pdf, other]

Parallel Spiking Neurons with High Efficiency and Ability to Learn Long-term Dependencies

Authors: Wei Fang, Zhaofei Yu, Zhaokun Zhou, Ding Chen, Yanqi Chen, Zhengyu Ma, Timothée Masquelier, Yonghong Tian

Abstract: Vanilla spiking neurons in Spiking Neural Networks (SNNs) use charge-fire-reset neuronal dynamics, which can only be simulated serially and can hardly learn long-term dependencies. We find that when removing reset, the neuronal dynamics can be reformulated in a non-iterative form and parallelized. By rewriting neuronal dynamics without reset to a general formulation, we propose the Parallel Spiking Neuron (PSN), which generates hidden states that are independent of their predecessors, resulting in parallelizable neuronal dynamics and extremely high simulation speed. The weights of inputs in the PSN are fully connected, which maximizes the utilization of temporal information. To avoid the use of future inputs for step-by-step inference, the weights of the PSN can be masked, resulting in the masked PSN. By sharing weights across time-steps based on the masked PSN, the sliding PSN is proposed to handle sequences of varying lengths. We evaluate the PSN family on simulation speed and temporal/static data classification, and the results show the overwhelming advantage of the PSN family in efficiency and accuracy. To the best of our knowledge, this is the first study on parallelizing spiking neurons, and it can be a cornerstone for spiking deep learning research. Our code is available at https://github.com/fangwei123456/Parallel-Spiking-Neuron.

Submitted 9 January, 2024; v1 submitted 25 April, 2023; originally announced April 2023.

Comments: Accepted in NeurIPS 2023
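
A minimal sketch of the reformulation described in the Parallel Spiking Neurons abstract above: once the reset is removed, a serially updated membrane potential can be rewritten as a weighted sum of inputs. The leaky-integrator update, the time constant `tau`, the fixed threshold, and all function names are assumptions made for illustration; the paper's PSN uses learned weights, which this sketch does not attempt to reproduce.

```python
def iterative_no_reset(x, tau=2.0):
    """Serial dynamics without reset: h[t] = (1 - 1/tau) * h[t-1] + x[t] / tau."""
    h, out = 0.0, []
    for x_t in x:
        h = (1.0 - 1.0 / tau) * h + x_t / tau
        out.append(h)
    return out


def weight_matrix(T, tau=2.0):
    """Lower-triangular weights W[t][i] = (1/tau) * (1 - 1/tau)**(t - i) for i <= t."""
    decay = 1.0 - 1.0 / tau
    return [
        [(decay ** (t - i)) / tau if i <= t else 0.0 for i in range(T)]
        for t in range(T)
    ]


def non_iterative(x, W):
    """h = W x: each h[t] depends only on the inputs, so rows are independent."""
    return [sum(w * x_i for w, x_i in zip(row, x)) for row in W]


def fire(h, threshold=1.0):
    """Heaviside firing: emit a spike wherever the potential reaches the threshold."""
    return [1 if h_t >= threshold else 0 for h_t in h]


x = [0.5, 1.5, 0.2, 2.0, 0.0]  # hypothetical input sequence
h_serial = iterative_no_reset(x)
h_parallel = non_iterative(x, weight_matrix(len(x)))
spikes = fire(h_parallel)
```

The weight matrix here is lower triangular, so step `t` never reads inputs after `t`, which plays the role of the masking mentioned in the abstract; a fully connected matrix would also draw on future inputs.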

arXiv:2304.03728 [pdf, other]

Interpretable Unified Language Checking

Authors: Tianhua Zhang, Hongyin Luo, Yung-Sung Chuang, Wei Fang, Luc Gaitskell, Thomas Hartvigsen, Xixin Wu, Danny Fox, Helen Meng, James Glass

Abstract: Despite recent concerns about undesirable behaviors generated by large language models (LLMs), including non-factual, biased, and hateful language, we find LLMs are inherent multi-task language checkers based on their latent representations of natural and social knowledge. We present an interpretable, unified, language checking (UniLC) method for both human and machine-generated language that aims to check if language input is factual and fair. While fairness and fact-checking tasks have been handled separately with dedicated models, we find that LLMs can achieve high performance on a combination of fact-checking, stereotype detection, and hate speech detection tasks with a simple, few-shot, unified set of prompts. With the "1/2-shot" multi-task language checking method proposed in this work, the GPT3.5-turbo model outperforms fully supervised baselines on several language tasks. The simple approach and results suggest that based on strong latent knowledge representations, an LLM can be an adaptive and explainable tool for detecting misinformation, stereotypes, and hate speech.

Submitted 7 April, 2023; originally announced April 2023.

Comments: 10 + 5 pages

arXiv:2303.17895 [pdf, other]

EA-LSS: Edge-aware Lift-splat-shot Framework for 3D BEV Object Detection

Authors: Haotian Hu, Fanyi Wang, Jingwen Su, Yaonong Wang, Laifeng Hu, Weiye Fang, Jingwei Xu, Zhiwang Zhang

Abstract: In recent years, great progress has been made in Lift-Splat-Shot-based (LSS-based) 3D object detection methods. However, inaccurate depth estimation remains an important constraint on the accuracy of camera-only and multi-model 3D object detection models, especially in regions where the depth changes significantly (i.e., the "depth jump" problem). In this paper, we propose a novel Edge-aware Lift-splat-shot (EA-LSS) framework. Specifically, an edge-aware depth fusion (EADF) module is proposed to alleviate the "depth jump" problem, and a fine-grained depth (FGD) module is proposed to further enforce refined supervision on depth. Our EA-LSS framework is compatible with any LSS-based 3D object detection model, and effectively boosts performance with a negligible increase in inference time. Experiments on nuScenes benchmarks demonstrate that EA-LSS is effective in both camera-only and multi-model models. It is worth mentioning that EA-LSS achieved state-of-the-art performance on nuScenes test benchmarks with mAP and NDS of 76.5% and 77.6%, respectively.

Submitted 29 August, 2023; v1 submitted 31 March, 2023; originally announced March 2023.

arXiv:2303.06287 [pdf, ps, other]

Singleton-Optimal LRCs and Perfect LRCs via Cyclic and Constacyclic Codes

Authors: Weijun Fang, Fang-Wei Fu, Bin Chen, Shu-Tao Xia

Abstract: Locally repairable codes (LRCs) have emerged as an important coding scheme in distributed storage systems (DSSs) with relatively low repair cost by accessing fewer non-failure nodes. Theoretical bounds and optimal constructions of LRCs have been widely investigated. Optimal LRCs via cyclic and constacyclic codes provide the significant benefits of an elegant algebraic structure and an efficient encoding procedure. In this paper, we continue to consider the constructions of optimal LRCs via cyclic and constacyclic codes with long code length. Specifically, we first obtain two classes of $q$-ary cyclic Singleton-optimal $(n, k, d=6;r=2)$-LRCs with length $n=3(q+1)$ when $3 \mid (q-1)$ and $q$ is even, and length $n=\frac{3}{2}(q+1)$ when $3 \mid (q-1)$ and $q \equiv 1(\bmod~4)$, respectively. To the best of our knowledge, this is the first construction of $q$-ary cyclic Singleton-optimal LRCs with length $n>q+1$ and minimum distance $d \geq 5$. On the other hand, an LRC achieving the Hamming-type bound is called a perfect LRC. By using cyclic and constacyclic codes, we construct two new families of $q$-ary perfect LRCs with length $n=\frac{q^m-1}{q-1}$, minimum distance $d=5$ and locality $r=2$.

Submitted 10 March, 2023; originally announced March 2023.

arXiv:2303.04347 [pdf, ps, other]

Optimal ANN-SNN Conversion for High-accuracy and Ultra-low-latency Spiking Neural Networks

Authors: Tong Bu, Wei Fang, Jianhao Ding, PengLin Dai, Zhaofei Yu, Tiejun Huang

Abstract: Spiking Neural Networks (SNNs) have gained great attention due to their distinctive properties of low power consumption and fast inference on neuromorphic hardware. As the most effective method to obtain deep SNNs, ANN-SNN conversion has achieved performance comparable to ANNs on large-scale datasets. Despite this, it requires long time-steps to match the firing rates of SNNs to the activations of ANNs. As a result, the converted SNN suffers severe performance degradation with short time-steps, which hampers the practical application of SNNs. In this paper, we theoretically analyze ANN-SNN conversion error and derive the estimated activation function of SNNs. Then we propose the quantization clip-floor-shift activation function to replace the ReLU activation function in source ANNs, which can better approximate the activation function of SNNs. We prove that the expected conversion error between SNNs and ANNs is zero, enabling us to achieve high-accuracy and ultra-low-latency SNNs. We evaluate our method on CIFAR-10/100 and ImageNet datasets, and show that it outperforms the state-of-the-art ANN-SNN and directly trained SNNs in both accuracy and time-steps. To the best of our knowledge, this is the first work to explore high-performance ANN-SNN conversion with ultra-low latency (4 time-steps). Code is available at https://github.com/putshua/SNN_conversion_QCFS

Submitted 7 March, 2023; originally announced March 2023.

Journal ref: International Conference on Learning Representations (2022)
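
The abstract above names the quantization clip-floor-shift activation without giving its formula. The sketch below assumes the form `lam * clip(floor(z * L / lam + phi) / L, 0, 1)`; the parameter names `lam` (a threshold-like scale), `L` (the number of quantization steps), and `phi` (the shift), along with their default values, are illustrative assumptions rather than details taken from this listing.

```python
import math


def qcfs(z, lam=1.0, L=4, phi=0.5):
    """Assumed clip-floor-shift form: shift, floor to one of L levels, clip, rescale."""
    q = math.floor(z * L / lam + phi) / L   # shift by phi, then quantize with floor
    q = min(max(q, 0.0), 1.0)               # clip to [0, 1], replacing ReLU's open range
    return lam * q                          # rescale to [0, lam]


# Hypothetical inputs: negative values map to 0, large values saturate at lam.
outputs = [qcfs(z) for z in (-1.0, 0.3, 2.0)]
```

Under these assumptions the output takes only `L + 1` distinct values in `[0, lam]`, which is the kind of staircase a spiking neuron's firing rate over a few time-steps can represent.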

arXiv:2302.13019 [pdf, other]

A Unified Framework for Soft Threshold Pruning

Authors: Yanqi Chen, Zhengyu Ma, Wei Fang, Xiawu Zheng, Zhaofei Yu, Yonghong Tian

Abstract: Soft threshold pruning is among the cutting-edge pruning methods with state-of-the-art performance. However, previous methods either perform aimless searching on the threshold scheduler or simply set the threshold trainable, lacking theoretical explanation from a unified perspective. In this work, we reformulate soft threshold pruning as an implicit optimization problem solved using the Iterative Shrinkage-Thresholding Algorithm (ISTA), a classic method from the fields of sparse recovery and compressed sensing. Under this theoretical framework, all threshold tuning strategies proposed in previous studies of soft threshold pruning can be summarized as different styles of tuning the $L_1$-regularization term. We further derive an optimal threshold scheduler through an in-depth study of threshold scheduling based on our framework. This scheduler keeps the $L_1$-regularization coefficient stable, implying a time-invariant objective function from the perspective of optimization. In principle, the derived pruning algorithm could sparsify any mathematical model trained via SGD. We conduct extensive experiments and verify its state-of-the-art performance on both Artificial Neural Networks (ResNet-50 and MobileNet-V1) and Spiking Neural Networks (SEW ResNet-18) on ImageNet datasets. On the basis of this framework, we derive a family of pruning methods, including sparsify-during-training, early pruning, and pruning at initialization. The code is available at https://github.com/Yanqi-Chen/LATS.

Submitted 25 February, 2023; originally announced February 2023.

Comments: To appear in the 11th International Conference on Learning Representations (ICLR 2023)
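
As a rough illustration of the ISTA view in the abstract above, the sketch below alternates a gradient step with soft thresholding on a toy quadratic loss. The loss, the step size `lr`, the coefficient `l1_coeff`, and the function names are assumptions for this example and are not taken from the paper or its LATS repository.

```python
def soft_threshold(w, lam):
    """Shrink w toward zero by lam; values with |w| <= lam become exactly 0."""
    if w > lam:
        return w - lam
    if w < -lam:
        return w + lam
    return 0.0


def ista_step(weights, grads, lr, l1_coeff):
    """One ISTA iteration: gradient step, then soft thresholding at lr * l1_coeff."""
    return [soft_threshold(w - lr * g, lr * l1_coeff) for w, g in zip(weights, grads)]


# Toy problem (hypothetical): minimize 0.5 * (w - target)**2 + l1_coeff * |w| per weight.
targets = [1.0, 0.3, -2.0]
weights = [0.0, 0.0, 0.0]
for _ in range(100):
    grads = [w - t for w, t in zip(weights, targets)]  # gradient of the quadratic part
    weights = ista_step(weights, grads, lr=0.5, l1_coeff=0.4)
```

With `lr * l1_coeff` held fixed, the weight with the small target is driven exactly to zero while the larger ones are only shrunk, which is the sparsifying behaviour that a threshold scheduler controls.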

arXiv:2301.12291 [pdf, other]

CancerUniT: Towards a Single Unified Model for Effective Detection, Segmentation, and Diagnosis of Eight Major Cancers Using a Large Collection of CT Scans

Authors: Jieneng Chen, Yingda Xia, Jiawen Yao, Ke Yan, Jianpeng Zhang, Le Lu, Fakai Wang, Bo Zhou, Mingyan Qiu, Qihang Yu, Mingze Yuan, Wei Fang, Yuxing Tang, Minfeng Xu, Jian Zhou, Yuqian Zhao, Qifeng Wang, Xianghua Ye, Xiaoli Yin, Yu Shi, Xin Chen, Jingren Zhou, Alan Yuille, Zaiyi Liu, Ling Zhang

Abstract: Human readers or radiologists routinely perform full-body multi-organ multi-disease detection and diagnosis in clinical practice, while most medical AI systems are built to focus on single organs with a narrow list of a few diseases. This might severely limit AI's clinical adoption. A certain number of AI models need to be assembled non-trivially to match the diagnostic process of a human reading a CT scan. In this paper, we construct a Unified Tumor Transformer (CancerUniT) model to jointly detect tumor existence and location and diagnose tumor characteristics for eight major cancers in CT scans. CancerUniT is a query-based Mask Transformer model with the output of multi-tumor prediction. We decouple the object queries into organ queries, tumor detection queries and tumor diagnosis queries, and further establish hierarchical relationships among the three groups. This clinically-inspired architecture effectively assists inter- and intra-organ representation learning of tumors and facilitates the resolution of these complex, anatomically related multi-organ cancer image reading tasks. CancerUniT is trained end-to-end using a curated large-scale collection of CT images from 10,042 patients, including eight major types of cancers and occurring non-cancer tumors (all are pathology-confirmed with 3D tumor masks annotated by radiologists). On the test set of 631 patients, CancerUniT has demonstrated strong performance under a set of clinically relevant evaluation metrics, substantially outperforming both multi-disease methods and an assembly of eight single-organ expert models in tumor detection, segmentation, and diagnosis. This moves one step closer towards a universal high performance cancer screening tool.

Submitted 6 October, 2023; v1 submitted 28 January, 2023; originally announced January 2023.

Comments: ICCV 2023 Camera Ready Version

arXiv:2211.06579 [pdf]

Explainable Artificial Intelligence: Precepts, Methods, and Opportunities for Research in Construction

Authors: Peter ED Love, Weili Fang, Jane Matthews, Stuart Porter, Hanbin Luo, Lieyun Ding

Abstract: Explainable artificial intelligence has received limited attention in construction despite its growing importance in various other industrial sectors. In this paper, we provide a narrative review of XAI to raise awareness about its potential in construction. Our review develops a taxonomy of the XAI literature comprising its precepts and approaches. Opportunities for future XAI research focusing on stakeholder desiderata and data and information fusion are identified and discussed. We hope the opportunities we suggest stimulate new lines of inquiry to help alleviate the scepticism and hesitancy toward AI adoption and integration in construction.

Submitted 10 February, 2023; v1 submitted 12 November, 2022; originally announced November 2022.

Comments: 56 pages, 3 figures. arXiv admin note: text overlap with arXiv:1910.10045 by other authors

ACM Class: H.0; H.4; J.0

arXiv:2211.06561 [pdf]

Explainable Artificial Intelligence in Construction: The Content, Context, Process, Outcome Evaluation Framework

Authors: Peter ED Love, Jane Matthews, Weili Fang, Stuart Porter, Hanbin Luo, Lieyun Ding

Abstract: Explainable artificial intelligence is an emerging and evolving concept. Its impact on construction, though yet to be realised, will be profound in the foreseeable future. Still, XAI has received limited attention in construction. As a result, no evaluation frameworks have been propagated to enable construction organisations to understand the what, why, how, and when of XAI. Our paper aims to fill this void by developing a content, context, process, and outcome evaluation framework that can be used to justify the adoption and effective management of XAI. After introducing and describing this novel framework, we discuss its implications for future research. While our novel framework is conceptual, it provides a frame of reference for construction organisations to make headway toward realising XAI business value and benefits.

Submitted 11 November, 2022; originally announced November 2022.

Comments: 43 pages, 5 figures

ACM Class: H.0; H.4; J.0

arXiv:2211.04507 [pdf, other]

Differentiable Quantum Programming with Unbounded Loops

Authors: Wang Fang, Mingsheng Ying, Xiaodi Wu

Abstract: The emergence of variational quantum applications has led to the development of automatic differentiation techniques in quantum computing. Recently, Zhu et al. (PLDI 2020) have formulated differentiable quantum programming with bounded loops, providing a framework for scalable gradient calculation by quantum means for training quantum variational applications. However, promising parameterized quantum applications, e.g., quantum walk and unitary implementation, cannot be trained in the existing framework due to the natural involvement of unbounded loops. To fill in the gap, we provide the first differentiable quantum programming framework with unbounded loops, including a newly designed differentiation rule, code transformation, and their correctness proof. Technically, we introduce a randomized estimator for derivatives to deal with the infinite sum in the differentiation of unbounded loops, whose applicability in classical and probabilistic programming is also discussed. We implement our framework with Python and Q#, and demonstrate a reasonable sample efficiency. Through extensive case studies, we showcase an exciting application of our framework in automatically identifying close-to-optimal parameters for several parameterized quantum applications.

Submitted 8 November, 2022; originally announced November 2022.

Comments: Codes are available at https://github.com/njuwfang/DifferentiableQPL

arXiv:2211.02895 [pdf, other]

Simple Primitives with Feasibility- and Contextuality-Dependence for Open-World Compositional Zero-shot Learning

Authors: Zhe Liu, Yun Li, Lina Yao, Xiaojun Chang, Wei Fang, Xiaojun Wu, Yi Yang

Abstract: The task of Compositional Zero-Shot Learning (CZSL) is to recognize images of novel state-object compositions that are absent during the training stage. Previous methods of learning compositional embedding have shown effectiveness in closed-world CZSL. However, in Open-World CZSL (OW-CZSL), their performance tends to degrade significantly due to the large cardinality of possible compositions. Some recent works separately predict simple primitives (i.e., states and objects) to reduce cardinality. However, they consider simple primitives as independent probability distributions, ignoring the heavy dependence between states, objects, and compositions. In this paper, we model the dependence of compositions via feasibility and contextuality. Feasibility-dependence refers to the unequal feasibility relations between simple primitives, e.g., "hairy" is more feasible with "cat" than with "building" in the real world. Contextuality-dependence represents the contextual variance in images, e.g., "cat" shows diverse appearances under the states of "dry" and "wet". We design Semantic Attention (SA) and generative Knowledge Disentanglement (KD) to learn the dependence of feasibility and contextuality, respectively. SA captures semantics in compositions to alleviate impossible predictions, driven by the visual similarity between simple primitives. KD disentangles images into unbiased feature representations, easing contextual bias in predictions. Moreover, we complement the current compositional probability model with feasibility and contextuality in a compatible format. Finally, we conduct comprehensive experiments to analyze and validate the superior or competitive performance of our model, Semantic Attention and knowledge Disentanglement guided Simple Primitives (SAD-SP), on three widely-used benchmark OW-CZSL datasets.

Submitted 5 November, 2022; originally announced November 2022.

arXiv:2208.08662 [pdf, other]

Private, Efficient, and Accurate: Protecting Models Trained by Multi-party Learning with Differential Privacy

Authors: Wenqiang Ruan, Mingxin Xu, Wenjing Fang, Li Wang, Lei Wang, Weili Han

Abstract: Secure multi-party computation-based machine learning, referred to as MPL, has become an important technology to utilize data from multiple parties with privacy preservation. While MPL provides rigorous security guarantees for the computation process, the models trained by MPL are still vulnerable to attacks that solely depend on access to the models. Differential privacy could help to defend against such attacks. However, the accuracy loss brought by differential privacy and the huge communication overhead of secure multi-party computation protocols make it highly challenging to balance the 3-way trade-off between privacy, efficiency, and accuracy. In this paper, we are motivated to resolve the above issue by proposing a solution, referred to as PEA (Private, Efficient, Accurate), which consists of a secure DPSGD protocol and two optimization methods. First, we propose a secure DPSGD protocol to enforce DPSGD in secret sharing-based MPL frameworks. Second, to reduce the accuracy loss caused by differential privacy noise and the huge communication overhead of MPL, we propose two optimization methods for the training process of MPL: (1) the data-independent feature extraction method, which aims to simplify the trained model structure; (2) the local data-based global model initialization method, which aims to speed up the convergence of the model training. We implement PEA in two open-source MPL frameworks: TF-Encrypted and Queqiao. The experimental results on various datasets demonstrate the efficiency and effectiveness of PEA. For example, when $ε$ = 2, we can train a differentially private classification model with an accuracy of 88% for CIFAR-10 within 7 minutes under the LAN setting. This result significantly outperforms the one from CryptGPU, one SOTA MPL framework: it costs more than 16 hours to train a non-private deep neural network model on CIFAR-10 with the same accuracy.

Submitted 18 August, 2022; originally announced August 2022.

Comments: This paper has been accepted for publication at IEEE S&P 2023. Please cite this paper as "Wenqiang Ruan, Mingxin Xu, Wenjing Fang, Li Wang, Lei Wang, Weili Han. Private, Efficient, and Accurate: Protecting Models Trained by Multi-party Learning with Differential Privacy. In Proceedings of The 44th IEEE Symposium on Security and Privacy, San Francisco, May 22-26, 2023."

arXiv:2207.12906 [pdf, ps, other]

Searching on the boundary of abundance for odd weird numbers

Authors: Wenjie Fang

Abstract: Weird numbers are abundant numbers that are not pseudoperfect. Since their introduction, the existence of odd weird numbers has been an open problem. In this work, we describe our computational effort to search for odd weird numbers, which shows their non-existence up to $10^{21}$. We also searched up to $10^{28}$ for numbers with an abundance below $10^{14}$, to no avail. Our approach to speed up the search can be viewed as an application of reverse search in the domain of combinatorial optimization, and may be useful for other similar quests for natural numbers with special properties that depend crucially on their factorization.

Submitted 26 July, 2022; originally announced July 2022.

Comments: 8 pages
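
The definitions used in the abstract above (abundant, pseudoperfect, weird) can be illustrated with a small brute-force check. This is only a minimal sketch for small inputs and is not the paper's search method, which the abstract describes as a form of reverse search. The function names are hypothetical, and "abundance" is assumed here to mean the sum of proper divisors minus the number itself (equivalently, sigma(n) - 2n).

```python
def proper_divisors(n):
    """Return the sorted proper divisors of n (all divisors except n itself)."""
    divs = [1] if n > 1 else []
    d = 2
    while d * d <= n:
        if n % d == 0:
            divs.append(d)
            if d != n // d:
                divs.append(n // d)
        d += 1
    return sorted(divs)


def abundance(n):
    """Assumed definition: sum of proper divisors minus n, i.e. sigma(n) - 2n."""
    return sum(proper_divisors(n)) - n


def is_abundant(n):
    """A number is abundant when its proper divisors sum to more than n."""
    return abundance(n) > 0


def is_pseudoperfect(n):
    """True if some subset of the proper divisors of n sums exactly to n.

    Uses an integer as a bitset: bit k is set when sum k is reachable.
    """
    reachable = 1  # only the empty sum 0 is reachable at the start
    for d in proper_divisors(n):
        reachable |= reachable << d
    return (reachable >> n) & 1 == 1


def is_weird(n):
    """Weird numbers are abundant but not pseudoperfect."""
    return is_abundant(n) and not is_pseudoperfect(n)


if __name__ == "__main__":
    # Brute force over a small range; feasible only for small bounds.
    print([n for n in range(1, 1000) if is_weird(n)])
```

Brute force of this kind becomes impractical long before the bounds quoted in the abstract, which is why a search strategy that exploits the factorization structure is needed there.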
