
GenAI-Bench: Evaluating and Improving Compositional Text-to-Visual Generation

Authors: Baiqi Li, Zhiqiu Lin, Deepak Pathak, Jiayao Li, Yixin Fei, Kewen Wu, Tiffany Ling, Xide Xia, Pengchuan Zhang, Graham Neubig, Deva Ramanan

Abstract: While text-to-visual models now produce photo-realistic images and videos, they struggle with compositional text prompts involving attributes, relationships, and higher-order reasoning such as logic and comparison. In this work, we conduct an extensive human study on GenAI-Bench to evaluate the performance of leading image and video generation models in various aspects of compositional text-to-visual generation. We also compare automated evaluation metrics against our collected human ratings and find that VQAScore -- a metric measuring the likelihood that a VQA model views an image as accurately depicting the prompt -- significantly outperforms previous metrics such as CLIPScore. In addition, VQAScore can improve generation in a black-box manner (without finetuning) via simply ranking a few (3 to 9) candidate images. Ranking by VQAScore is 2x to 3x more effective than other scoring methods like PickScore, HPSv2, and ImageReward at improving human alignment ratings for DALL-E 3 and Stable Diffusion, especially on compositional prompts that require advanced visio-linguistic reasoning. We will release a new GenAI-Rank benchmark with over 40,000 human ratings to evaluate scoring metrics on ranking images generated from the same prompt. Lastly, we discuss promising areas for improvement in VQAScore, such as addressing fine-grained visual details. We will release all human ratings (over 80,000) to facilitate scientific benchmarking of both generative models and automated metrics.

Submitted 21 June, 2024; v1 submitted 19 June, 2024; originally announced June 2024.

Comments: We open-source our dataset, model, and code at: https://linzhiqiu.github.io/papers/genai_bench ; Project page: https://linzhiqiu.github.io/papers/genai_bench ; GenAI-Bench was first introduced in arXiv:2404.01291. This article extends it with an additional GenAI-Rank benchmark.
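
The black-box ranking step described in the abstract (score a handful of candidates for one prompt, keep the highest-scoring one) can be sketched as below. This is an illustration only: `rank_candidates` and `toy_score` are hypothetical names, and the toy word-overlap scorer merely stands in for a real image-text metric such as VQAScore, whose actual interface is not shown in this listing.

```python
from typing import Callable, List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def rank_candidates(
    prompt: str,
    candidates: Sequence[T],
    score_fn: Callable[[str, T], float],
) -> List[Tuple[T, float]]:
    """Score every candidate against the prompt and sort best-first."""
    scored = [(candidate, score_fn(prompt, candidate)) for candidate in candidates]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


def toy_score(prompt: str, caption: str) -> float:
    """Hypothetical stand-in scorer: fraction of prompt words found in a caption.

    A real pipeline would score generated images with an image-text metric;
    captions are used here only so the sketch runs without any model.
    """
    words = prompt.lower().split()
    if not words:
        return 0.0
    caption_words = set(caption.lower().split())
    return sum(word in caption_words for word in words) / len(words)


if __name__ == "__main__":
    prompt = "a red cube on a blue sphere"
    candidates = ["a blue cube", "a red cube on a blue sphere", "a sphere"]
    best, best_score = rank_candidates(prompt, candidates, toy_score)[0]
    print(best, best_score)
```

Because the scorer is passed in as a plain callable, the same ranking loop works for any metric that maps a (prompt, candidate) pair to a number, which is what makes the approach black-box with respect to the generator.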

arXiv:2405.20681 [pdf, other]

No Free Lunch Theorem for Privacy-Preserving LLM Inference

Authors: Xiaojin Zhang, Yulin Fei, Yan Kang, Wei Chen, Lixin Fan, Hai Jin, Qiang Yang

Abstract: Individuals and businesses have benefited significantly from Large Language Models (LLMs) including PaLM, Gemini and ChatGPT in various ways. For example, LLMs enhance productivity, reduce costs, and enable us to focus on more valuable tasks. Furthermore, LLMs possess the capacity to sift through extensive datasets, uncover underlying patterns, and furnish critical insights that propel the frontiers of technology and science. However, LLMs also pose privacy concerns. Users' interactions with LLMs may expose their sensitive personal or company information. A lack of robust privacy safeguards and legal frameworks could permit the unwarranted intrusion or improper handling of individual data, thereby risking infringements of privacy and the theft of personal identities. To ensure privacy, it is essential to minimize the dependency between shared prompts and private information. Various randomization approaches have been proposed to protect prompts' privacy, but they may incur utility loss compared to unprotected LLM prompting. Therefore, it is essential to evaluate the balance between the risk of privacy leakage and loss of utility when conducting effective protection mechanisms. The current study develops a framework for inferring privacy-protected Large Language Models (LLMs) and lays down a solid theoretical basis for examining the interplay between privacy preservation and utility. The core insight is encapsulated within a theorem called the NFL (No-Free-Lunch) Theorem.

Submitted 31 May, 2024; originally announced May 2024.

arXiv:2405.13930 [pdf, other]

AlabOS: A Python-based Reconfigurable Workflow Management Framework for Autonomous Laboratories

Authors: Yuxing Fei, Bernardus Rendy, Rishi Kumar, Olympia Dartsi, Hrushikesh P. Sahasrabuddhe, Matthew J. McDermott, Zheren Wang, Nathan J. Szymanski, Lauren N. Walters, David Milsted, Yan Zeng, Anubhav Jain, Gerbrand Ceder

Abstract: The recent advent of autonomous laboratories, coupled with algorithms for high-throughput screening and active learning, promises to accelerate materials discovery and innovation. As these autonomous systems grow in complexity, the demand for robust and efficient workflow management software becomes increasingly critical. In this paper, we introduce AlabOS, a general-purpose software framework for orchestrating experiments and managing resources, with an emphasis on automated laboratories for materials synthesis and characterization. We demonstrate the implementation of AlabOS in a prototype autonomous materials laboratory. AlabOS features a reconfigurable experiment workflow model, enabling the simultaneous execution of varied workflows composed of modular tasks. Therefore, AlabOS is well-suited to handle the rapidly changing experimental protocols defining the progress of self-driving laboratory development for materials research.

Submitted 22 May, 2024; originally announced May 2024.

Comments: 30 pages, 5 figures

arXiv:2405.02724 [pdf, ps, other]

Taming Equilibrium Bias in Risk-Sensitive Multi-Agent Reinforcement Learning

Authors: Yingjie Fei, Ruitu Xu

Abstract: We study risk-sensitive multi-agent reinforcement learning under general-sum Markov games, where agents optimize the entropic risk measure of rewards with possibly diverse risk preferences. We show that using the regret naively adapted from existing literature as a performance metric could induce policies with equilibrium bias that favor the most risk-sensitive agents and overlook the other agents. To address such deficiency of the naive regret, we propose a novel notion of regret, which we call risk-balanced regret, and show through a lower bound that it overcomes the issue of equilibrium bias. Furthermore, we develop a self-play algorithm for learning Nash, correlated, and coarse correlated equilibria in risk-sensitive Markov games. We prove that the proposed algorithm attains near-optimal regret guarantees with respect to the risk-balanced regret.

Submitted 4 May, 2024; originally announced May 2024.

Comments: 29 pages
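
For readers unfamiliar with the entropic risk measure named in the abstract, the sketch below estimates it from sampled rewards using the standard textbook form (1/β) log E[exp(βR)]. The symbol β, the function name `entropic_risk`, and the sample-average estimator are assumptions made for illustration; the listing does not give the paper's notation or algorithm.

```python
import math
from typing import Sequence


def entropic_risk(rewards: Sequence[float], beta: float) -> float:
    """Sample estimate of (1/beta) * log E[exp(beta * R)].

    beta < 0 weights bad outcomes more heavily (risk-averse), beta > 0
    weights good outcomes more heavily (risk-seeking), and beta == 0 is
    treated as the limiting case, the plain mean.
    """
    n = len(rewards)
    if n == 0:
        raise ValueError("rewards must be non-empty")
    if beta == 0.0:
        return sum(rewards) / n
    scaled = [beta * r for r in rewards]
    shift = max(scaled)  # log-sum-exp shift for numerical stability
    log_mean_exp = shift + math.log(sum(math.exp(s - shift) for s in scaled) / n)
    return log_mean_exp / beta


if __name__ == "__main__":
    samples = [0.0, 10.0]
    for beta in (-1.0, 0.0, 1.0):
        print(beta, entropic_risk(samples, beta))
```

With the same reward samples, agents with different β values obtain different objective values, which is the "diverse risk preferences" setting the abstract refers to.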

arXiv:2404.19295 [pdf, ps, other]

doi 10.1007/s11467-022-1192-z

Collisional dynamics of symmetric two-dimensional quantum droplets

Authors: Yanming Hu, Yifan Fei, Xiao-Long Chen, Yunbo Zhang

Abstract: The collisional dynamics of two symmetric droplets with equal intraspecies scattering lengths and particle number density for each component is studied by solving the corresponding extended Gross-Pitaevskii equation in two dimensions by including a logarithmic correction term in the usual contact interaction. We find the merging droplet after collision experiences a quadrupole oscillation in its shape and the oscillation period is found to be independent of the incidental momentum for small droplets. With increasing collision momentum the colliding droplets may separate into two, or even more, and finally into small pieces of droplets. For these dynamical phases, we manage to present boundaries determined by the remnant particle number in the central area and the damped oscillation of the quadrupole mode. A stability peak for the existence of droplets emerges at the critical particle number $N_c \simeq 48$ for the quasi-Gaussian and flat-top shapes of the droplets.

Submitted 30 April, 2024; originally announced April 2024.

Comments: 7 pages, 4 figures

Journal ref: Front. Phys. 17, 61505 (2022)

arXiv:2404.11103 [pdf, ps, other]

Distribution-Free Testing of Decision Lists with a Sublinear Number of Queries

Authors: Xi Chen, Yumou Fei, Shyamal Patel

Abstract: We give a distribution-free testing algorithm for decision lists with $\tilde{O}(n^{11/12}/\varepsilon^3)$ queries. This is the first sublinear algorithm for this problem, which shows that, unlike halfspaces, testing is strictly easier than learning for decision lists. Complementing the algorithm, we show that any distribution-free tester for decision lists must make $\tilde{\Omega}(\sqrt{n})$ queries, or draw $\tilde{\Omega}(n)$ samples when the algorithm is sample-based.

Submitted 17 April, 2024; originally announced April 2024.

Comments: To appear in STOC 2024

arXiv:2404.10253 [pdf, other]

Kilometer-Level Coupled Modeling Using 40 Million Cores: An Eight-Year Journey of Model Development

Authors: Xiaohui Duan, Yuxuan Li, Zhao Liu, Bin Yang, Juepeng Zheng, Haohuan Fu, Shaoqing Zhang, Shiming Xu, Yang Gao, Wei Xue, Di Wei, Xiaojing Lv, Lifeng Yan, Haopeng Huang, Haitian Lu, Lingfeng Wan, Haoran Lin, Qixin Chang, Chenlin Li, Quanjie He, Zeyu Song, Xuantong Wang, Yangyang Yu, Xilong Fan, Zhaopeng Qu, et al. (16 additional authors not shown)

Abstract: With current and future leading systems adopting heterogeneous architectures, adapting existing models for heterogeneous supercomputers is of urgent need for improving model resolution and reducing modeling uncertainty. This paper presents our three-week effort on porting a complex earth system model, CESM 2.2, to a 40-million-core Sunway supercomputer. Taking a non-intrusive approach that tries to minimize manual code modifications, our project tries to achieve both improvement of performance and consistency of the model code. By using a hierarchical grid system and an OpenMP-based offloading toolkit, our porting and parallelization effort covers over 80% of the code, and achieves a simulation speed of 340 SDPD (simulated days per day) for 5-km atmosphere, 265 SDPD for 3-km ocean, and 222 SDPD for a coupled model, thus making multi-year or even multi-decadal experiments at such high resolution possible.

Submitted 15 April, 2024; originally announced April 2024.

Comments: 18 pages, 13 figures
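
As a worked illustration of the throughput figures quoted above: SDPD is simulated days per wall-clock day, so the wall-clock time of a run is the number of simulated days divided by the SDPD rate. The sketch below applies that arithmetic to the three rates in the abstract. The function names and the 365-day year are assumptions made for illustration, not values from the paper.

```python
def wall_clock_days(simulated_days: float, sdpd: float) -> float:
    """Wall-clock days needed to simulate `simulated_days` at a given SDPD rate."""
    if sdpd <= 0:
        raise ValueError("sdpd must be positive")
    return simulated_days / sdpd


def days_per_simulated_year(sdpd: float, days_in_year: float = 365.0) -> float:
    """Wall-clock days per simulated year (assumes a 365-day model year)."""
    return wall_clock_days(days_in_year, sdpd)


# Rates as quoted in the abstract, in simulated days per wall-clock day.
RATES = {
    "5-km atmosphere": 340.0,
    "3-km ocean": 265.0,
    "coupled model": 222.0,
}

if __name__ == "__main__":
    for name, sdpd in RATES.items():
        per_year = days_per_simulated_year(sdpd)
        print(f"{name}: {per_year:.2f} wall-clock days per simulated year")
```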

arXiv:2404.01563 [pdf]

Two-Phase Multi-Dose-Level PET Image Reconstruction with Dose Level Awareness

Authors: Yuchen Fei, Yanmei Luo, Yan Wang, Jiaqi Cui, Yuanyuan Xu, Jiliu Zhou, Dinggang Shen

Abstract: To obtain high-quality positron emission tomography (PET) while minimizing radiation exposure, a range of methods have been designed to reconstruct standard-dose PET (SPET) from corresponding low-dose PET (LPET) images. However, most current methods merely learn the mapping between single-dose-level LPET and SPET images, but omit the dose disparity of LPET images in clinical scenarios. In this paper, to reconstruct high-quality SPET images from multi-dose-level LPET images, we design a novel two-phase multi-dose-level PET reconstruction algorithm with dose level awareness, containing a pre-training phase and a SPET prediction phase. Specifically, the pre-training phase is devised to explore both fine-grained discriminative features and effective semantic representation. The SPET prediction phase adopts a coarse prediction network utilizing a pre-learned dose level prior to generate a preliminary result, and a refinement network to precisely preserve the details. Experiments on the MICCAI 2022 Ultra-low Dose PET Imaging Challenge Dataset have demonstrated the superiority of our method.

Submitted 10 April, 2024; v1 submitted 1 April, 2024; originally announced April 2024.

Comments: Accepted by ISBI2024

arXiv:2403.16591 [pdf, other]

Deciphering the Interplay between Local Differential Privacy, Average Bayesian Privacy, and Maximum Bayesian Privacy

Authors: Xiaojin Zhang, Yulin Fei, Wei Chen

Abstract: The swift evolution of machine learning has led to the emergence of various definitions of privacy due to the threats it poses to privacy, including the concept of local differential privacy (LDP). Although widely embraced and utilized across numerous domains, this conventional approach to measure privacy still exhibits certain limitations, spanning from failure to prevent inferential disclosure to lack of consideration for the adversary's background knowledge. In this comprehensive study, we introduce Bayesian privacy and delve into the intricate relationship between LDP and its Bayesian counterparts, unveiling novel insights into utility-privacy trade-offs. We introduce a framework that encapsulates both attack and defense strategies, highlighting their interplay and effectiveness. The relationship between LDP and Maximum Bayesian Privacy (MBP) is first revealed, demonstrating that under uniform prior distribution, a mechanism satisfying $ξ$-LDP will satisfy $ξ$-MBP and conversely $ξ$-MBP also confers 2$ξ$-LDP. Our next theoretical contributions are anchored in the rigorous definitions and relationships between Average Bayesian Privacy (ABP) and Maximum Bayesian Privacy (MBP), encapsulated by the inequality $ε_{p,a} \leq \frac{1}{\sqrt{2}}\sqrt{(ε_{p,m} + ε)\cdot(e^{ε_{p,m} + ε} - 1)}$. These relationships fortify our understanding of the privacy guarantees provided by various mechanisms. Our work not only lays the groundwork for future empirical exploration but also promises to facilitate the design of privacy-preserving algorithms, thereby fostering the development of trustworthy machine learning solutions.

Submitted 2 April, 2024; v1 submitted 25 March, 2024; originally announced March 2024.
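
The ABP/MBP inequality quoted in the abstract can be evaluated numerically with a few lines of code. The sketch below computes only its right-hand side; the function name `abp_upper_bound` is hypothetical, and since the abstract does not define the extra term ε, it is treated here simply as a non-negative number supplied by the caller.

```python
import math


def abp_upper_bound(eps_pm: float, eps: float) -> float:
    """Right-hand side of the inequality quoted in the abstract:

        (1 / sqrt(2)) * sqrt((eps_pm + eps) * (exp(eps_pm + eps) - 1))

    eps_pm plays the role of the MBP parameter; eps is the additional
    term whose meaning is not given in the abstract.
    """
    total = eps_pm + eps
    if total < 0:
        raise ValueError("eps_pm + eps must be non-negative")
    # expm1(x) computes exp(x) - 1 accurately for small x.
    return math.sqrt(total * math.expm1(total)) / math.sqrt(2.0)


if __name__ == "__main__":
    for eps_pm in (0.1, 0.5, 1.0):
        print(eps_pm, abp_upper_bound(eps_pm, 0.0))
```

For small arguments exp(x) - 1 is approximately x, so the bound behaves like (ε_{p,m} + ε)/√2 when the sum is close to zero.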

arXiv:2402.19007 [pdf, other]

DOZE: A Dataset for Open-Vocabulary Zero-Shot Object Navigation in Dynamic Environments

Authors: Ji Ma, Hongming Dai, Yao Mu, Pengying Wu, Hao Wang, Xiaowei Chi, Yang Fei, Shanghang Zhang, Chang Liu

Abstract: Zero-Shot Object Navigation (ZSON) requires agents to autonomously locate and approach unseen objects in unfamiliar environments and has emerged as a particularly challenging task within the domain of Embodied AI. Existing datasets for developing ZSON algorithms lack consideration of dynamic obstacles, object attribute diversity, and scene texts, thus exhibiting noticeable discrepancies from real-world situations. To address these issues, we propose a Dataset for Open-Vocabulary Zero-Shot Object Navigation in Dynamic Environments (DOZE) that comprises ten high-fidelity 3D scenes with over 18k tasks, aiming to mimic complex, dynamic real-world scenarios. Specifically, DOZE scenes feature multiple moving humanoid obstacles, a wide array of open-vocabulary objects, diverse distinct-attribute objects, and valuable textual hints. Besides, different from existing datasets that only provide collision checking between the agent and static obstacles, we enhance DOZE by integrating capabilities for detecting collisions between the agent and moving obstacles. This novel functionality enables the evaluation of the agents' collision avoidance abilities in dynamic environments. We test four representative ZSON methods on DOZE, revealing substantial room for improvement in existing approaches concerning navigation efficiency, safety, and object recognition accuracy. Our dataset can be found at https://DOZE-Dataset.github.io/.

Submitted 8 July, 2024; v1 submitted 29 February, 2024; originally announced February 2024.

Comments: This version of the paper has been accepted for publication in IEEE Robotics and Automation Letters (RA-L)

arXiv:2402.18879 [pdf]

Dose Prediction Driven Radiotherapy Paramters Regression via Intra- and Inter-Relation Modeling

Authors: Jiaqi Cui, Yuanyuan Xu, Jianghong Xiao, Yuchen Fei, Jiliu Zhou, Xingcheng Peng, Yan Wang

Abstract: Deep learning has facilitated the automation of radiotherapy by predicting accurate dose distribution maps. However, existing methods fail to derive the desirable radiotherapy parameters that can be directly input into the treatment planning system (TPS), impeding the full automation of radiotherapy. To enable more thorough automatic radiotherapy, in this paper, we propose a novel two-stage framework to directly regress the radiotherapy parameters, including a dose map prediction stage and a radiotherapy parameters regression stage. In stage one, we combine transformer and convolutional neural network (CNN) to predict realistic dose maps with rich global and local information, providing accurate dosimetric knowledge for the subsequent parameters regression. In stage two, two elaborate modules, i.e., an intra-relation modeling (Intra-RM) module and an inter-relation modeling (Inter-RM) module, are designed to exploit the organ-specific and organ-shared features for precise parameters regression. Experimental results on a rectal cancer dataset demonstrate the effectiveness of our method.

Submitted 29 February, 2024; originally announced February 2024.

Comments: Accepted by ISBI 2024

arXiv:2402.18679 [pdf, other]

Data Interpreter: An LLM Agent For Data Science

Authors: Sirui Hong, Yizhang Lin, Bang Liu, Bangbang Liu, Binhao Wu, Danyang Li, Jiaqi Chen, Jiayi Zhang, Jinlin Wang, Li Zhang, Lingyao Zhang, Min Yang, Mingchen Zhuge, Taicheng Guo, Tuo Zhou, Wei Tao, Wenyi Wang, Xiangru Tang, Xiangtao Lu, Xiawu Zheng, Xinbing Liang, Yaying Fei, Yuheng Cheng, Zongze Xu, Chenglin Wu

Abstract: Large Language Model (LLM)-based agents have demonstrated remarkable effectiveness. However, their performance can be compromised in data science scenarios that require real-time data adjustment, expertise in optimization due to complex dependencies among various tasks, and the ability to identify logical errors for precise reasoning. In this study, we introduce the Data Interpreter, a solution designed to solve problems with code, which emphasizes three pivotal techniques to augment problem-solving in data science: 1) dynamic planning with hierarchical graph structures for real-time data adaptability; 2) dynamic tool integration to enhance code proficiency during execution, enriching the requisite expertise; 3) logical inconsistency identification in feedback, and efficiency enhancement through experience recording. We evaluate the Data Interpreter on various data science and real-world tasks. Compared to open-source baselines, it demonstrated superior performance, exhibiting significant improvements in machine learning tasks, increasing from 0.86 to 0.95. Additionally, it showed a 26% increase in the MATH dataset and a remarkable 112% improvement in open-ended tasks. The solution will be released at https://github.com/geekan/MetaGPT.

Submitted 12 March, 2024; v1 submitted 28 February, 2024; originally announced February 2024.

arXiv:2402.15811 [pdf, ps, other]

doi 10.1103/PhysRevA.109.053309

Collective excitations in two-dimensional harmonically trapped quantum droplets

Authors: Yifan Fei, Xucong Du, Xiao-Long Chen, Yunbo Zhang

Abstract: The collective excitation modes in quantum droplets trapped in a two-dimensional harmonic potential in the context of symmetric weakly interacting binary bosonic mixtures are studied. By utilizing the linearization technique, the time-dependent extended Gross-Pitaevskii equation, and a sum-rule approach with a variational approximation, the ground state properties and collective excitations of such a two-dimensional quantum system are investigated for various system parameters. We present comprehensive analysis and calculations on the effect of the confinement strength and anisotropy of the trapping potential, the number of atoms in the droplet, and the collective excitation modes. The radius of the droplet, as well as the chemical potential, is non-monotonically related to the number of atoms in the droplet, and the confinement tends to shift the minimum values towards the ideal gas limit. The excitation frequency peaks, which are prominent in a self-bounded droplet, become less pronounced and smoother when subjected to a strong trapping potential. The sum-rule approach fails to reproduce the breathing mode frequency for a moderate number of atoms in a weak trapping potential, but works perfectly well in a strong confinement. It was found that the anisotropy in the trap eliminates the degeneracy between the quadrupole and scissors modes that occurs in an isotropic trap, causing the frequencies of these two modes to immediately diverge from each other for any degree of anisotropy. These findings provide valuable insights into the unique characteristics and behavior of quantum droplets, offering potential implications for future research and applications in the dynamic behaviors of intriguing quantum droplets.

Submitted 24 February, 2024; originally announced February 2024.

Comments: 12 pages, 6 figures

Journal ref: Phys. Rev. A 109, 053309 (2024) - Published 13 May 2024

arXiv:2402.07866 [pdf, other]

Virtual Channel Purification

Authors: Zhenhuan Liu, Xingjian Zhang, Yue-Yang Fei, Zhenyu Cai

Abstract: Quantum error mitigation is a key approach for extracting target state properties on state-of-the-art noisy machines and early fault-tolerant devices. Using the ideas from flag fault tolerance and virtual state purification, we develop the virtual channel purification (VCP) protocol, which consumes similar qubit and gate resources as virtual state purification but offers up to exponentially stronger error suppression with increased system size and more noisy operation copies. Furthermore, VCP removes most of the assumptions required in virtual state purification. Essentially, VCP is the first quantum error mitigation protocol that does not require specific knowledge about the noise models, the target quantum state, and the target problem while still offering rigorous performance guarantees for practical noise regimes. Further connections are made between VCP and quantum error correction to produce one of the first protocols that combine quantum error correction and quantum error mitigation beyond concatenation. We can remove all noise in the channel while paying only the same sampling cost as low-order purification, reaching beyond the standard bias-variance trade-off in quantum error mitigation. Our protocol can also be adapted to key tasks in quantum networks like channel capacity activation and entanglement distribution.

Submitted 12 February, 2024; originally announced February 2024.

arXiv:2311.05340 [pdf, other]

Quotients of Special Classes of Positroids

Authors: Zhixing Chen, Yumou Fei, Jiyang Gao, Yuxuan Sun, Yuchong Zhang

Abstract: In this paper, we give a complete characterization of rank $k-1$ positroids that are quotients of the uniform matroid $U_{k,n}$, completing a partial result by Benedetti-Chavez-Jiménez. Furthermore, we show that any pair of concordant positroids with adjacent ranks are related by a cyclic shift on their decorated permutations. We also use the concept of conecklaces to give a full characterization of concordant lattice path matroids (LPMs).

Submitted 7 January, 2024; v1 submitted 9 November, 2023; originally announced November 2023.

Comments: 32 pages, 10 figures; This research was carried out as part of the PACE program in the summer of 2023 at Peking University, Beijing; Comments very welcome

MSC Class: 05B35

arXiv:2310.19651 [pdf, other]

Dynamics of Instruction Tuning: Each Ability of Large Language Models Has Its Own Growth Pace

Authors: Chiyu Song, Zhanchao Zhou, Jianhao Yan, Yuejiao Fei, Zhenzhong Lan, Yue Zhang

Abstract: Instruction tuning is a burgeoning method to elicit the general intelligence of Large Language Models (LLMs). However, the creation of instruction data is still largely heuristic, leading to significant variation in quantity and quality across existing datasets. While some research advocates for expanding the number of instructions, others suggest that a small set of well-chosen examples is adequate. To better understand data construction guidelines, our research provides a granular analysis of how data volume, parameter size, and data construction methods influence the development of each underlying ability of LLMs, such as creative writing, code generation, and logical reasoning. We present a meticulously curated dataset with over 40k instances across ten abilities and examine instruction-tuned models with 7b to 33b parameters. Our study reveals three primary findings: (i) Despite the models' overall performance being tied to data and parameter scale, individual abilities have different sensitivities to these factors. (ii) Human-curated data strongly outperforms synthetic data from GPT-4 in efficiency and can constantly enhance model performance with volume increases, which is unachievable with synthetic data. (iii) Instruction data brings powerful cross-ability generalization, as evidenced by out-of-domain evaluations. Furthermore, we demonstrate how these findings can guide more efficient data constructions, leading to practical performance improvements on two public benchmarks.

Submitted 22 February, 2024; v1 submitted 30 October, 2023; originally announced October 2023.

arXiv:2310.17976 [pdf, other]

InCharacter: Evaluating Personality Fidelity in Role-Playing Agents through Psychological Interviews

Authors: Xintao Wang, Yunze Xiao, Jen-tse Huang, Siyu Yuan, Rui Xu, Haoran Guo, Quan Tu, Yaying Fei, Ziang Leng, Wei Wang, Jiangjie Chen, Cheng Li, Yanghua Xiao

Abstract: Role-playing agents (RPAs), powered by large language models, have emerged as a flourishing field of applications. However, a key challenge lies in assessing whether RPAs accurately reproduce the personas of target characters, namely their character fidelity. Existing methods mainly focus on the knowledge and linguistic patterns of characters. This paper, instead, introduces a novel perspective to evaluate the personality fidelity of RPAs with psychological scales. Overcoming drawbacks of previous self-report assessments on RPAs, we propose InCharacter, namely Interviewing Character agents for personality tests. Experiments include various types of RPAs and LLMs, covering 32 distinct characters on 14 widely used psychological scales. The results validate the effectiveness of InCharacter in measuring RPA personalities. Then, with InCharacter, we show that state-of-the-art RPAs exhibit personalities highly aligned with the human-perceived personalities of the characters, achieving an accuracy up to 80.7%.

Submitted 7 June, 2024; v1 submitted 27 October, 2023; originally announced October 2023.

Comments: ACL 2024

arXiv:2310.14491 [pdf, other]

Towards a Mechanistic Interpretation of Multi-Step Reasoning Capabilities of Language Models

Authors: Yifan Hou, Jiaoda Li, Yu Fei, Alessandro Stolfo, Wangchunshu Zhou, Guangtao Zeng, Antoine Bosselut, Mrinmaya Sachan

Abstract: Recent work has shown that language models (LMs) have strong multi-step (i.e., procedural) reasoning capabilities. However, it is unclear whether LMs perform these tasks by cheating with answers memorized from the pretraining corpus or via a multi-step reasoning mechanism. In this paper, we try to answer this question by exploring a mechanistic interpretation of LMs for multi-step reasoning tasks. Concretely, we hypothesize that the LM implicitly embeds a reasoning tree resembling the correct reasoning process within it. We test this hypothesis by introducing a new probing approach (called MechanisticProbe) that recovers the reasoning tree from the model's attention patterns. We use our probe to analyze two LMs: GPT-2 on a synthetic task (k-th smallest element), and LLaMA on two simple language-based reasoning tasks (ProofWriter & AI2 Reasoning Challenge). We show that MechanisticProbe is able to detect the information of the reasoning tree from the model's attentions for most examples, suggesting that the LM indeed is going through a process of multi-step reasoning within its architecture in many cases.

Submitted 22 October, 2023; originally announced October 2023.

Comments: This work is published in EMNLP 2023

arXiv:2310.10441 [pdf, other]

Efficiently matching random inhomogeneous graphs via degree profiles

Authors: Jian Ding, Yumou Fei, Yuanzheng Wang

Abstract: In this paper, we study the problem of recovering the latent vertex correspondence between two correlated random graphs with vastly inhomogeneous and unknown edge probabilities between different pairs of vertices. Inspired by and extending the matching algorithm via degree profiles by Ding, Ma, Wu and Xu (2021), we obtain an efficient matching algorithm as long as the minimal average degree is at least $\Omega(\log^{2} n)$ and the minimal correlation is at least $1 - O(\log^{-2} n)$.

Submitted 16 October, 2023; originally announced October 2023.

Comments: 44 pages, 3 figures

arXiv:2309.05245 [pdf, ps, other]

doi 10.1103/PhysRevA.108.033312

Ground-state Properties and Bogoliubov Modes of a Harmonically Trapped One-Dimensional Quantum Droplet

Authors: Xucong Du, Yifan Fei, Xiao-Long Chen, Yunbo Zhang

Abstract: We study the stationary and excitation properties of a one-dimensional quantum droplet in the two-component Bose mixture trapped in a harmonic potential. By constructing the energy functional for the inhomogeneous mixture, we elaborate the extended Gross-Pitaevskii equation applicable to both symmetric and asymmetric mixtures into a universal form, and the equations in two different dimensionless schemes are in a duality relation, i.e., the unique parameters left are inverses of each other. The Bogoliubov equations for the trapped droplet are obtained by linearizing the small density fluctuation around the ground state, and the low-lying excitation modes are calculated numerically. It is found that the confinement trap easily changes the flat-top structure for large droplets and alters the mean square radius and the chemical potential intensively. The breathing mode of the confined droplet connects the self-bound and ideal gas limits, with the excitation in the weakly interacting Bose condensate for large particle numbers lying in between. We explicitly show how the continuum spectrum of the excitation is split into discrete modes, and finally taken over by the harmonic trap. Two critical particle numbers are identified by the minimum size of the trapped droplet and the maximum breathing mode energy, both of which are found to decrease exponentially with the trapping parameter.

Submitted 11 September, 2023; originally announced September 2023.

Comments: 11 pages, 7 figures

Journal ref: Phys. Rev. A 108, 033312 (2023), Published 18 September 2023

arXiv:2309.04735 [pdf, other]

Two-State Spin Systems with Negative Interactions

Authors: Yumou Fei, Leslie Ann Goldberg, Pinyan Lu

Abstract: We study the approximability of computing the partition functions of two-state spin systems. The problem is parameterized by a $2\times 2$ symmetric matrix. Previous results on this problem were restricted either to the case where the matrix has non-negative entries, or to the case where the diagonal entries are equal, i.e. Ising models. In this paper, we study the generalization to arbitrary $2\times 2$ interaction matrices with real entries. We show that in some regions of the parameter space, it is #P-hard to even determine the sign of the partition function, while in other regions there are fully polynomial approximation schemes for the partition function. Our results reveal several new computational phase transitions.

Submitted 21 November, 2023; v1 submitted 9 September, 2023; originally announced September 2023.

arXiv:2309.04389 [pdf, other]

CSPRD: A Financial Policy Retrieval Dataset for Chinese Stock Market

Authors: Jinyuan Wang, Hai Zhao, Zhong Wang, Zeyang Zhu, Jinhao Xie, Yong Yu, Yongjian Fei, Yue Huang, Dawei Cheng

Abstract: In recent years, great advances in pre-trained language models (PLMs) have sparked considerable research focus and achieved promising performance on dense passage retrieval, which aims at retrieving relevant passages from a massive corpus for given questions. However, most existing datasets mainly benchmark the models with factoid queries of general commonsense, while specialised fields such as finance and economics remain unexplored due to the deficiency of large-scale and high-quality datasets with expert annotations. In this work, we propose a new task, policy retrieval, by introducing the Chinese Stock Policy Retrieval Dataset (CSPRD), which provides 700+ prospectus passages labeled by experienced experts with relevant articles from 10k+ entries in our collected Chinese policy corpus. Experiments on lexical, embedding and fine-tuned bi-encoder models show the effectiveness of our proposed CSPRD yet also suggest ample potential for improvement. Our best performing baseline achieves 56.1% MRR@10, 28.5% NDCG@10, 37.5% Recall@10 and 80.6% Precision@10 on the dev set.

Submitted 11 September, 2023; v1 submitted 8 September, 2023; originally announced September 2023.

arXiv:2308.09597 [pdf, other]

ChatHaruhi: Reviving Anime Character in Reality via Large Language Model

Authors: Cheng Li, Ziang Leng, Chenxi Yan, Junyi Shen, Hao Wang, Weishi MI, Yaying Fei, Xiaoyang Feng, Song Yan, HaoSheng Wang, Linkang Zhan, Yaokai Jia, Pingyu Wu, Haozhen Sun

Abstract: Role-playing chatbots built on large language models have drawn interest, but better techniques are needed to enable mimicking specific fictional characters. We propose an algorithm that controls language models via an improved prompt and memories of the character extracted from scripts. We construct ChatHaruhi, a dataset covering 32 Chinese / English TV / anime characters with over 54k simulated dialogues. Both automatic and human evaluations show our approach improves role-playing ability over baselines. Code and data are available at https://github.com/LC1332/Chat-Haruhi-Suzumiya .

Submitted 18 August, 2023; originally announced August 2023.

Comments: v1 - First version of technical report

arXiv:2308.04223 [pdf, other]

Real-Time Progressive Learning: Accumulate Knowledge from Control with Neural-Network-Based Selective Memory

Authors: Yiming Fei, Jiangang Li, Yanan Li

Abstract: Memory, as the basis of learning, determines the storage, update and forgetting of knowledge and further determines the efficiency of learning. Featured with the mechanism of memory, a radial basis function neural network based learning control scheme named real-time progressive learning (RTPL) is proposed to learn the unknown dynamics of the system with guaranteed stability and closed-loop performance. Instead of the Lyapunov-based weight update law of conventional neural network learning control (NNLC), which mainly concentrates on stability and control performance, RTPL employs the selective memory recursive least squares (SMRLS) algorithm to update the weights of the neural network and achieves the following merits: 1) improved learning speed without filtering, 2) robustness to hyperparameter setting of neural networks, 3) good generalization ability, i.e., reuse of learned knowledge in different tasks, and 4) guaranteed learning performance under parameter perturbation. Moreover, RTPL realizes continuous accumulation of knowledge as a result of its reasonably allocated memory while NNLC may gradually forget knowledge that it has learned. Corresponding theoretical analysis and simulation studies demonstrate the effectiveness of RTPL.

Submitted 24 November, 2023; v1 submitted 8 August, 2023; originally announced August 2023.

Comments: 15 pages, 16 figures

MSC Class: 93-10

arXiv:2308.01469 [pdf, other]

VertexSerum: Poisoning Graph Neural Networks for Link Inference

Authors: Ruyi Ding, Shijin Duan, Xiaolin Xu, Yunsi Fei

Abstract: Graph neural networks (GNNs) have brought superb performance to various applications utilizing graph structural data, such as social analysis and fraud detection. The graph links, e.g., social relationships and transaction history, are sensitive and valuable information, which raises privacy concerns when using GNNs. To exploit these vulnerabilities, we propose VertexSerum, a novel graph poisoning attack that increases the effectiveness of graph link stealing by amplifying the link connectivity leakage. To infer node adjacency more accurately, we propose an attention mechanism that can be embedded into the link detection network. Our experiments demonstrate that VertexSerum significantly outperforms the SOTA link inference attack, improving the AUC scores by an average of 9.8% across four real-world datasets and three different GNN structures. Furthermore, our experiments reveal the effectiveness of VertexSerum in both black-box and online learning settings, further validating its applicability in real-world scenarios.

Submitted 2 August, 2023; originally announced August 2023.

arXiv:2306.08700 [pdf]

doi 10.1093/jcde/qwac098

Iterative self-transfer learning: A general methodology for response time-history prediction based on small dataset

Authors: Yongjia Xu, Xinzheng Lu, Yifan Fei, Yuli Huang

Abstract: There are numerous advantages of deep neural network surrogate modeling for response time-history prediction. However, due to the high cost of refined numerical simulations and actual experiments, the lack of data has become an unavoidable bottleneck in practical applications. An iterative self-transfer learning method for training neural networks based on small datasets is proposed in this study. A new mapping-based transfer learning network, named deep adaptation network with three branches for regression (DAN-TR), is proposed. A general iterative network training strategy is developed by coupling DAN-TR and the pseudo-label strategy, and the establishment of corresponding datasets is also discussed. Finally, a complex component is selected as a case study. The results show that the proposed method can improve the model performance by nearly an order of magnitude on small datasets without the need for external labeled samples, well-behaved pre-trained models, additional artificial labeling, or complex physical/mathematical analysis.

Submitted 14 June, 2023; originally announced June 2023.

Comments: 14 pages, 8 figures; published in Journal of Computational Design and Engineering, 9(5), 2089-2102

Journal ref: Journal of Computational Design and Engineering, 9(5), 2089-2102 (2022)

arXiv:2305.19148 [pdf, other]

Mitigating Label Biases for In-context Learning

Authors: Yu Fei, Yifan Hou, Zeming Chen, Antoine Bosselut

Abstract: Various design settings for in-context learning (ICL), such as the choice and order of the in-context examples, can bias a model toward a particular prediction without being reflective of an understanding of the task. While many studies discuss these design choices, there have been few systematic investigations into categorizing them and mitigating their impact. In this work, we define a typology for three types of label biases in ICL for text classification: vanilla-label bias, context-label bias, and domain-label bias (which we conceptualize and detect for the first time). Our analysis demonstrates that prior label bias calibration methods fall short of addressing all three types of biases. Specifically, domain-label bias restricts LLMs to random-level performance on many tasks regardless of the choice of in-context examples. To mitigate the effect of these biases, we propose a simple bias calibration method that estimates a language model's label bias using random in-domain words from the task corpus. After controlling for this estimated bias when making predictions, our novel domain-context calibration significantly improves the ICL performance of GPT-J and GPT-3 on a wide range of tasks. The gain is substantial on tasks with large domain-label bias (up to 37% in Macro-F1). Furthermore, our results generalize to models with different scales, pretraining methods, and manually-designed task instructions, showing the prevalence of label biases in ICL.

Submitted 4 August, 2023; v1 submitted 28 May, 2023; originally announced May 2023.

Comments: Accepted to ACL 2023

arXiv:2305.16444 [pdf, other]

Don't Retrain, Just Rewrite: Countering Adversarial Perturbations by Rewriting Text

Authors: Ashim Gupta, Carter Wood Blum, Temma Choji, Yingjie Fei, Shalin Shah, Alakananda Vempala, Vivek Srikumar

Abstract: Can language models transform inputs to protect text classifiers against adversarial attacks? In this work, we present ATINTER, a model that intercepts and learns to rewrite adversarial inputs to make them non-adversarial for a downstream text classifier. Our experiments on four datasets and five attack mechanisms reveal that ATINTER is effective at providing better adversarial robustness than existing defense approaches, without compromising task accuracy. For example, on sentiment classification using the SST-2 dataset, our method improves the adversarial accuracy over the best existing defense approach by more than 4% with a smaller decrease in task accuracy (0.5% vs 2.5%). Moreover, we show that ATINTER generalizes across multiple downstream tasks and classifiers without having to explicitly retrain it for those settings. Specifically, we find that when ATINTER is trained to remove adversarial perturbations for the sentiment classification task on the SST-2 dataset, it even transfers to a semantically different task of news classification (on AGNews) and improves the adversarial robustness by more than 10%.

Submitted 25 May, 2023; originally announced May 2023.

Comments: Accepted to ACL 2023

arXiv:2305.15676 [pdf, other]

Enhancing Grammatical Error Correction Systems with Explanations

Authors: Yuejiao Fei, Leyang Cui, Sen Yang, Wai Lam, Zhenzhong Lan, Shuming Shi

Abstract: Grammatical error correction systems improve written communication by detecting and correcting language mistakes. To help language learners better understand why the GEC system makes a certain correction, the causes of errors (evidence words) and the corresponding error types are two key factors. To enhance GEC systems with explanations, we introduce EXPECT, a large dataset annotated with evidence words and grammatical error types. We propose several baselines and analyses to understand this task. Furthermore, human evaluation verifies that our explainable GEC system's explanations can assist second-language learners in determining whether to accept a correction suggestion and in understanding the associated grammar rule.

Submitted 10 June, 2023; v1 submitted 24 May, 2023; originally announced May 2023.

Comments: 9 pages, 7 figures, accepted to the main conference of ACL 2023

arXiv:2303.15571 [pdf, other]

EMShepherd: Detecting Adversarial Samples via Side-channel Leakage

Authors: Ruyi Ding, Cheng Gongye, Siyue Wang, Aidong Ding, Yunsi Fei

Abstract: Deep Neural Networks (DNNs) are vulnerable to adversarial perturbations: small changes crafted deliberately on the input to mislead the model into wrong predictions. Adversarial attacks have disastrous consequences for deep learning-empowered critical applications. Existing defense and detection techniques both require extensive knowledge of the model, testing inputs, and even execution details. They are not viable for general deep learning implementations where the model internals are unknown, a common 'black-box' scenario for model users. Inspired by the fact that electromagnetic (EM) emanations of a model inference are dependent on both operations and data and may contain footprints of different input classes, we propose a framework, EMShepherd, to capture EM traces of model execution, perform processing on traces and exploit them for adversarial detection. Only benign samples and their EM traces are used to train the adversarial detector: a set of EM classifiers and class-specific unsupervised anomaly detectors. When the victim model system is under attack by an adversarial example, the model execution will be different from executions for the known classes, and the EM trace will be different. We demonstrate that our air-gapped EMShepherd can effectively detect different adversarial attacks on a commonly used FPGA deep learning accelerator for both Fashion MNIST and CIFAR-10 datasets. It achieves a 100% detection rate on most types of adversarial samples, which is comparable to the state-of-the-art 'white-box' software-based detectors.

Submitted 27 March, 2023; originally announced March 2023.

arXiv:2303.01098 [pdf, other]

doi 10.1007/s11433-023-2315-0

Determination of Molecular Energies via Quantum Imaginary Time Evolution in a Superconducting Qubit System

Authors: Zhiwen Zong, Sainan Huai, Tianqi Cai, Wenyan Jin, Ze Zhan, Zhenxing Zhang, Kunliang Bu, Liyang Sui, Ying Fei, Yicong Zheng, Shengyu Zhang, Jianlan Wu, Yi Yin

Abstract: As a valid tool for solving ground state problems, imaginary time evolution (ITE) is widely used in physical and chemical simulations. Different ITE-based algorithms in their quantum counterpart have recently been proposed and applied to some real systems. We experimentally realize the variational-based quantum imaginary time evolution (QITE) algorithm to simulate the ground state energy of hydrogen (H2) and lithium hydride (LiH) molecules in a superconducting qubit system. The H2 molecule is directly simulated using the 3-qubit circuit with unitary-coupled clusters (UCC) ansatz. We also combine QITE with the cluster mean-field (CMF) method to obtain an effective Hamiltonian. The LiH molecule is correspondingly simulated using the 3-qubit circuit with hardware-efficient ansatz. For comparison, the LiH molecule is also directly simulated using the 4-qubit circuit with UCC ansatz at the equilibrium point. All the experimental results show a convergence within 4 iterations, with high-fidelity ground state energy obtained. For a more complex system in the future, the CMF may allow further grouping of interactions to obtain an effective Hamiltonian, so that the hybrid QITE algorithm can possibly simulate a relatively large-scale system with fewer qubits.

Submitted 2 March, 2023; originally announced March 2023.

Comments: 11 pages, 5 figures

arXiv:2302.08210 [pdf, other]

A Survey of Geometric Optimization for Deep Learning: From Euclidean Space to Riemannian Manifold

Authors: Yanhong Fei, Xian Wei, Yingjie Liu, Zhengyu Li, Mingsong Chen

Abstract: Although Deep Learning (DL) has achieved success in complex Artificial Intelligence (AI) tasks, it suffers from various notorious problems (e.g., feature redundancy, and vanishing or exploding gradients), since updating parameters in Euclidean space cannot fully exploit the geometric structure of the solution space. As a promising alternative solution, Riemannian-based DL uses geometric optimization to update parameters on Riemannian manifolds and can leverage the underlying geometric information. Accordingly, this article presents a comprehensive survey of applying geometric optimization in DL. First, this article introduces the basic procedure of geometric optimization, including various geometric optimizers and some concepts of Riemannian manifolds. Subsequently, this article investigates the application of geometric optimization in different DL networks in various AI tasks, e.g., convolution neural network, recurrent neural network, transfer learning, and optimal transport. Additionally, typical public toolboxes that implement optimization on manifolds are also discussed. Finally, this article makes a performance comparison between different deep geometric optimization methods under image recognition scenarios.

Submitted 16 February, 2023; originally announced February 2023.

Comments: 41 pages

arXiv:2301.01286 [pdf, other]

Pseudo-Inverted Bottleneck Convolution for DARTS Search Space

Authors: Arash Ahmadian, Louis S. P. Liu, Yue Fei, Konstantinos N. Plataniotis, Mahdi S. Hosseini

Abstract: Differentiable Architecture Search (DARTS) has attracted considerable attention as a gradient-based neural architecture search method. Since the introduction of DARTS, there has been little work done on adapting the action space based on state-of-the-art architecture design principles for CNNs. In this work, we aim to address this gap by incrementally augmenting the DARTS search space with micro-design changes inspired by ConvNeXt and studying the trade-off between accuracy, evaluation layer count, and computational cost. We introduce the Pseudo-Inverted Bottleneck Conv (PIBConv) block intending to reduce the computational footprint of the inverted bottleneck block proposed in ConvNeXt. Our proposed architecture is much less sensitive to evaluation layer count and outperforms a DARTS network with similar size significantly, at layer counts as small as 2. Furthermore, with fewer layers, not only does it achieve higher accuracy with lower computational footprint (measured in GMACs) and parameter count, but GradCAM comparisons also show that our network can better detect distinctive features of target objects compared to DARTS. Code is available from https://github.com/mahdihosseini/PIBConv.

Submitted 18 March, 2023; v1 submitted 31 December, 2022; originally announced January 2023.

Comments: 5 pages

arXiv:2212.12372 [pdf, other]

Factoring integers with sublinear resources on a superconducting quantum processor

Authors: Bao Yan, Ziqi Tan, Shijie Wei, Haocong Jiang, Weilong Wang, Hong Wang, Lan Luo, Qianheng Duan, Yiting Liu, Wenhao Shi, Yangyang Fei, Xiangdong Meng, Yu Han, Zheng Shan, Jiachen Chen, Xuhao Zhu, Chuanyu Zhang, Feitong Jin, Hekang Li, Chao Song, Zhen Wang, Zhi Ma, H. Wang, Gui-Lu Long

Abstract: Shor's algorithm has seriously challenged information security based on public key cryptosystems. However, to break the widely used RSA-2048 scheme, one needs millions of physical qubits, which is far beyond current technical capabilities. Here, we report a universal quantum algorithm for integer factorization by combining the classical lattice reduction with a quantum approximate optimization algorithm (QAOA). The number of qubits required is $O(\log N/\log\log N)$, which is sublinear in the bit length of the integer $N$, making it the most qubit-saving factorization algorithm to date. We demonstrate the algorithm experimentally by factoring integers up to 48 bits with 10 superconducting qubits, the largest integer factored on a quantum device. We estimate that a quantum circuit with 372 physical qubits and a depth of thousands is necessary to challenge RSA-2048 using our algorithm. Our study shows great promise in expediting the application of current noisy quantum computers, and paves the way to factor large integers of realistic cryptographic significance.

Submitted 23 December, 2022; originally announced December 2022.

Comments: 32 pages, 12 figures

arXiv:2211.07909 [pdf, other]

Selective Memory Recursive Least Squares: Recast Forgetting into Memory in RBF Neural Network Based Real-Time Learning

Authors: Yiming Fei, Jiangang Li, Yanan Li

Abstract: In radial basis function neural network (RBFNN) based real-time learning tasks, forgetting mechanisms are widely used such that the neural network can keep its sensitivity to new data. However, with forgetting mechanisms, some useful knowledge will get lost simply because it was learned a long time ago, which we refer to as the passive knowledge forgetting phenomenon. To address this problem, this paper proposes a real-time training method named selective memory recursive least squares (SMRLS) in which the classical forgetting mechanisms are recast into a memory mechanism. Different from the forgetting mechanism, which mainly evaluates the importance of samples according to the time when samples are collected, the memory mechanism evaluates the importance of samples through both temporal and spatial distribution of samples. With SMRLS, the input space of the RBFNN is evenly divided into a finite number of partitions and a synthesized objective function is developed using synthesized samples from each partition. In addition to the current approximation error, the neural network also updates its weights according to the recorded data from the partition being visited. Compared with classical training methods including the forgetting factor recursive least squares (FFRLS) and stochastic gradient descent (SGD) methods, SMRLS achieves improved learning speed and generalization capability, which are demonstrated by corresponding simulation results.

Submitted 8 August, 2023; v1 submitted 15 November, 2022; originally announced November 2022.

Comments: 12 pages, 15 figures

MSC Class: 93-10
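
For context, the forgetting factor recursive least squares (FFRLS) baseline named in the abstract above can be sketched as follows. This is a minimal sketch of the textbook FFRLS update on Gaussian RBF features, not the SMRLS method the paper proposes. The function names, the scalar input, the centers, the width, the forgetting factor, and the target weights are all illustrative assumptions.

```python
import math


def rbf_features(x, centers, width):
    """Gaussian RBF activations for a scalar input (illustrative choice)."""
    return [math.exp(-((x - c) ** 2) / (2.0 * width ** 2)) for c in centers]


def ffrls_step(w, P, phi, y, lam=0.98):
    """One textbook FFRLS update; lam < 1 discounts older samples."""
    n = len(w)
    P_phi = [sum(P[i][j] * phi[j] for j in range(n)) for i in range(n)]
    denom = lam + sum(phi[i] * P_phi[i] for i in range(n))
    gain = [v / denom for v in P_phi]
    err = y - sum(w[i] * phi[i] for i in range(n))  # a priori error
    w_new = [w[i] + gain[i] * err for i in range(n)]
    # P stays symmetric, so phi^T P equals (P phi)^T.
    P_new = [[(P[i][j] - gain[i] * P_phi[j]) / lam for j in range(n)]
             for i in range(n)]
    return w_new, P_new, err


def train_demo(sweeps=20):
    """Fit a target built from known (hypothetical) weights, for checking."""
    centers = [i * math.pi / 4 for i in range(5)]
    width = 0.8
    true_w = [2.0, 0.0, -1.0, 0.0, 0.5]
    w = [0.0] * 5
    P = [[1000.0 if i == j else 0.0 for j in range(5)] for i in range(5)]
    grid = [i * math.pi / 20 for i in range(21)]
    for _ in range(sweeps):
        for x in grid:
            phi = rbf_features(x, centers, width)
            y = sum(a * b for a, b in zip(true_w, phi))
            w, P, _ = ffrls_step(w, P, phi, y)
    return w, true_w, centers, width
```

Because every sample is discounted by `lam` at each step, data from a region that has not been visited recently loses influence over time, which is the time-only weighting the abstract contrasts with its memory mechanism.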

arXiv:2210.16637 [pdf, other]

Beyond Prompting: Making Pre-trained Language Models Better Zero-shot Learners by Clustering Representations

Authors: Yu Fei, Ping Nie, Zhao Meng, Roger Wattenhofer, Mrinmaya Sachan

Abstract: Recent work has demonstrated that pre-trained language models (PLMs) are zero-shot learners. However, most existing zero-shot methods involve heavy human engineering or complicated self-training pipelines, hindering their application to new situations. In this work, we show that zero-shot text classification can be improved simply by clustering texts in the embedding spaces of PLMs. Specifically, we fit the unlabeled texts with a Bayesian Gaussian Mixture Model after initializing cluster positions and shapes using class names. Despite its simplicity, this approach achieves superior or comparable performance on both topic and sentiment classification datasets and outperforms prior works significantly on unbalanced datasets. We further explore the applicability of our clustering approach by evaluating it on 14 datasets with more diverse topics, text lengths, and numbers of classes. Our approach achieves an average of 20% absolute improvement over prompt-based zero-shot learning. Finally, we compare different PLM embedding spaces and find that texts are well-clustered by topics even if the PLM is not explicitly pre-trained to generate meaningful sentence embeddings. This work indicates that PLM embeddings can categorize texts without task-specific fine-tuning, thus providing a new way to analyze and utilize their knowledge and zero-shot learning ability.

Submitted 23 November, 2022; v1 submitted 29 October, 2022; originally announced October 2022.

Comments: Accepted to EMNLP 2022

arXiv:2210.09849 [pdf, other]

Scalable Framework For Deep Learning based CSI Feedback

Authors: Liqiang Jin, Qiuping Huang, Qiubin Gao, Yongqiang Fei, Shaohui Sun

Abstract: Deep learning (DL) based channel state information (CSI) feedback in multiple-input multiple-output (MIMO) systems has recently attracted a lot of attention from both academia and industry. From a practical point of view, it is a huge burden to train, transfer and deploy a DL model for each parameter configuration of the base station (BS). In this paper, we propose a scalable and flexible framework for DL based CSI feedback, referred to as scalable CsiNet (SCsiNet), to adapt to a family of configured parameters such as feedback payloads, MIMO channel ranks, and antenna numbers. To reduce model size and training complexity, the core block with pre-processing and post-processing in SCsiNet is reused among different parameter configurations as much as possible, which is totally different from configuration-oriented design. The pre-processing and post-processing are trainable neural network layers introduced for matching input/output dimensions and probability distributions. The proposed SCsiNet is evaluated by metrics of squared generalized cosine similarity (SGCS) and user throughput (UPT) in system level simulations. Compared to existing schemes (configuration-oriented DL schemes and 3GPP Rel-16 Type-II codebook based schemes), the proposed scheme can significantly reduce model size and achieve 2%-10% UPT improvement for all parameter configurations.

Submitted 18 October, 2022; originally announced October 2022.

Comments: 6 pages, 3 figures
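
The squared generalized cosine similarity (SGCS) metric mentioned in the abstract above can be illustrated with a short sketch. It assumes the commonly used per-vector form, |w^H w_hat|^2 / (||w||^2 ||w_hat||^2), for a single pair of complex precoding vectors. The abstract does not state how the paper averages over subbands, layers, or users, so that step is left out, and the function name `sgcs` is hypothetical.

```python
def sgcs(w_true, w_hat):
    """Squared generalized cosine similarity of two complex vectors.

    Assumed form: |w_true^H w_hat|^2 / (||w_true||^2 * ||w_hat||^2).
    The result lies in [0, 1] and ignores a common phase rotation.
    """
    inner = sum(a.conjugate() * b for a, b in zip(w_true, w_hat))
    norm_true = sum(abs(a) ** 2 for a in w_true)
    norm_hat = sum(abs(b) ** 2 for b in w_hat)
    return abs(inner) ** 2 / (norm_true * norm_hat)


if __name__ == "__main__":
    w = [1 + 1j, 0.5 - 2j, -1j]
    rotated = [1j * a for a in w]      # same direction, different phase
    print(sgcs(w, rotated))            # close to 1.0
    print(sgcs([1 + 0j, 0j], [0j, 1 + 0j]))  # orthogonal vectors give 0.0
```

A value near 1 means the reconstructed vector points in almost the same direction as the original, regardless of scaling or a global phase.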

arXiv:2208.09896 [pdf, other]

SIM2E: Benchmarking the Group Equivariant Capability of Correspondence Matching Algorithms

Authors: Shuai Su, Zhongkai Zhao, Yixin Fei, Shuda Li, Qijun Chen, Rui Fan

Abstract: Correspondence matching is a fundamental problem in computer vision and robotics applications. Solving correspondence matching problems using neural networks has been on the rise recently. Rotation-equivariance and scale-equivariance are both critical in correspondence matching applications. Classical correspondence matching approaches are designed to withstand scaling and rotation transformations. However, the features extracted using convolutional neural networks (CNNs) are only translation-equivariant to a certain extent. Recently, researchers have strived to improve the rotation-equivariance of CNNs based on group theories. Sim(2) is the group of similarity transformations in the 2D plane. This paper presents a specialized dataset dedicated to evaluating sim(2)-equivariant correspondence matching algorithms. We compare the performance of 16 state-of-the-art (SoTA) correspondence matching approaches. The experimental results demonstrate the importance of group equivariant algorithms for correspondence matching on various sim(2) transformation conditions. Since the subpixel accuracy achieved by CNN-based correspondence matching approaches is unsatisfactory, this specific area requires more attention in future works. Our dataset is publicly available at: mias.group/SIM2E.

Submitted 21 August, 2022; originally announced August 2022.

Comments: ECCV2022 Workshop Paper

arXiv:2208.01898 [pdf, other]

XCon: Learning with Experts for Fine-grained Category Discovery

Authors: Yixin Fei, Zhongkai Zhao, Siwei Yang, Bingchen Zhao

Abstract: We address the problem of generalized category discovery (GCD) in this paper, i.e. clustering the unlabeled images leveraging the information from a set of seen classes, where the unlabeled images could contain both seen classes and unseen classes. The seen classes can be seen as an implicit criterion of classes, which makes this setting different from unsupervised clustering where the cluster criteria may be ambiguous. We are mainly concerned with the problem of discovering categories within a fine-grained dataset since it is one of the most direct applications of category discovery, i.e. helping experts discover novel concepts within an unlabeled dataset using the implicit criterion set forth by the seen classes. State-of-the-art methods for generalized category discovery leverage contrastive learning to learn the representations, but the large inter-class similarity and intra-class variance pose a challenge for the methods because the negative examples may contain irrelevant cues for recognizing a category, so the algorithms may converge to a local minimum. We present a novel method called Expert-Contrastive Learning (XCon) to help the model mine useful information from the images by first partitioning the dataset into sub-datasets using k-means clustering and then performing contrastive learning on each of the sub-datasets to learn fine-grained discriminative features. Experiments on fine-grained datasets show clearly improved performance over the previous best methods, indicating the effectiveness of our method.

Submitted 3 August, 2022; originally announced August 2022.

arXiv:2206.03990 [pdf]

doi 10.1177/13694332231184322

Hysteretic Behavior Simulation Based on Pyramid Neural Network: Principle, Network Architecture, Case Study and Explanation

Authors: Yongjia Xu, Xinzheng Lu, Yifan Fei, Yuli Huang

Abstract: An accurate and efficient simulation of the hysteretic behavior of materials and components is essential for structural analysis. The surrogate model based on neural networks shows significant potential in balancing efficiency and accuracy. However, its serial information flow and prediction based on single-level features adversely affect the network performance. Therefore, a weighted stacked pyramid neural network architecture is proposed herein. This network establishes a pyramid architecture by introducing multi-level shortcuts to integrate features directly in the output module. In addition, a weighted stacked strategy is proposed to enhance the conventional feature fusion method. Subsequently, the redesigned architectures are compared with other commonly used network architectures. Results show that the redesigned architectures outperform the alternatives in 87.5% of cases. Meanwhile, the long and short-term memory abilities of different basic network architectures are analyzed through a specially designed experiment, which could provide valuable suggestions for network selection.

Submitted 19 June, 2023; v1 submitted 29 April, 2022; originally announced June 2022.

Comments: 41 pages, 14 figures

Journal ref: Advances in Structural Engineering. 2023, 1-16

arXiv:2205.00140 [pdf, ps, other]

Improved Approximation to First-Best Gains-from-Trade

Authors: Yumou Fei

Abstract: We study the two-agent single-item bilateral trade. Ideally, the trade should happen whenever the buyer's value for the item exceeds the seller's cost. However, the classical result of Myerson and Satterthwaite showed that no mechanism can achieve this without violating one of the Bayesian incentive compatibility, individual rationality and weakly balanced budget conditions. This motivates the study of approximating the trade-whenever-socially-beneficial mechanism, in terms of the expected gains-from-trade. Recently, Deng, Mao, Sivan, and Wang showed that the random-offerer mechanism achieves at least a 1/8.23 approximation. We improve this lower bound to 1/3.15 in this paper. We also determine the exact worst-case approximation ratio of the seller-pricing mechanism assuming the distribution of the buyer's value satisfies the monotone hazard rate property.

Submitted 29 April, 2022; originally announced May 2022.
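
The quantities compared in the abstract above can be made concrete with a small sketch for discrete distributions. It assumes independent buyer values and seller costs given as `{value: probability}` dictionaries, models the random-offerer mechanism as a fair coin flip between seller-pricing and buyer-pricing, and breaks ties between equally good prices by dictionary order. All function names are hypothetical, and the toy instance is not taken from the paper.

```python
def first_best_gft(buyer, seller):
    """Expected gains-from-trade when trade happens whenever value > cost."""
    return sum(pv * pc * max(v - c, 0.0)
               for v, pv in buyer.items() for c, pc in seller.items())


def seller_pricing_gft(buyer, seller):
    """Seller knows her cost and posts the profit-maximizing price."""
    total = 0.0
    for c, pc in seller.items():
        # An optimal price can be taken from the buyer's support, at or above cost.
        prices = [p for p in buyer if p >= c]
        if not prices:
            continue  # no profitable price, so no trade

        def profit(p):
            return (p - c) * sum(q for v, q in buyer.items() if v >= p)

        price = max(prices, key=profit)
        total += pc * sum(q * (v - c) for v, q in buyer.items() if v >= price)
    return total


def buyer_pricing_gft(buyer, seller):
    """Buyer knows his value and posts the utility-maximizing price."""
    total = 0.0
    for v, pv in buyer.items():
        prices = [p for p in seller if p <= v]
        if not prices:
            continue

        def utility(p):
            return (v - p) * sum(q for c, q in seller.items() if c <= p)

        price = max(prices, key=utility)
        total += pv * sum(q * (v - c) for c, q in seller.items() if c <= price)
    return total


def random_offerer_gft(buyer, seller):
    """Each side makes the take-it-or-leave-it offer with probability 1/2."""
    return 0.5 * (seller_pricing_gft(buyer, seller)
                  + buyer_pricing_gft(buyer, seller))
```

On the toy instance `buyer = {1.0: 0.5, 3.0: 0.5}`, `seller = {0.0: 1.0}`, the seller prefers the high price and loses the low-value trade, so the random-offerer mechanism recovers only part of the first-best gains-from-trade.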

arXiv:2203.12046 [pdf, other]

NNReArch: A Tensor Program Scheduling Framework Against Neural Network Architecture Reverse Engineering

Authors: Yukui Luo, Shijin Duan, Cheng Gongye, Yunsi Fei, Xiaolin Xu

Abstract: Architecture reverse engineering has become an emerging attack against deep neural network (DNN) implementations. Several prior works have utilized side-channel leakage to recover the model architecture while the target is executing on a hardware acceleration platform. In this work, we target an open-source deep-learning accelerator, Versatile Tensor Accelerator (VTA), and utilize electromagnetic (EM) side-channel leakage to comprehensively learn the association between DNN architecture configurations and EM emanations. We also consider the holistic system, including the low-level tensor program code of the VTA accelerator on a Xilinx FPGA, and explore the effect of such low-level configurations on the EM leakage. Our study demonstrates that both the optimization and configuration of tensor programs will affect the EM side-channel leakage. Gaining knowledge of the association between the low-level tensor program and the EM emanations, we propose NNReArch, a lightweight tensor program scheduling framework against side-channel-based DNN model architecture reverse engineering. Specifically, NNReArch targets reshaping the EM traces of different DNN operators, through scheduling the tensor program execution of the DNN model so as to confuse the adversary. NNReArch is a comprehensive protection framework supporting two modes, a balanced mode that strikes a balance between the DNN model confidentiality and execution performance, and a secure mode where the most secure setting is chosen. We implement and evaluate the proposed framework on the open-source VTA with state-of-the-art DNN architectures. The experimental results demonstrate that NNReArch can efficiently enhance the model architecture security with a small performance overhead. In addition, the proposed obfuscation technique makes reverse engineering of the DNN architecture significantly harder.

Submitted 22 March, 2022; originally announced March 2022.

Comments: Accepted by FCCM 2022

arXiv:2203.07979 [pdf, other]

doi 10.1364/OPTICA.439170

Loss-tolerant all-photonic quantum repeater with generalized Shor code

Authors: Rui Zhang, Li-Zheng Liu, Zheng-Da Li, Yue-Yang Fei, Xu-Fei Yin, Li Li, Nai-Le Liu, Yingqiu Mao, Yu-Ao Chen, Jian-Wei Pan

Abstract: The all-photonic quantum repeater (APQR) is a promising repeater scheme to realize long-distance quantum communication. For a practical APQR, an indispensable requirement is the robustness of the repeater graph state (RGS) against photon loss. We propose a new loss-tolerant scheme by applying the generalized Shor code to RGS, which can be experimentally demonstrated with current technology. Experimentally, we first prepare and verify the nine-qubit Shor code. Then, by applying the generalized Shor code to APQR and preparing a simplified encoded RGS with the structure of $1\times2$ based on the Shor code state, the effectiveness of our loss-tolerant scheme and the loss tolerance of the encoded RGS are respectively verified. Our results make an essential step toward a practical APQR and enrich the research of quantum error correction code.

Submitted 15 March, 2022; originally announced March 2022.

Comments: 8 pages, 5 figures

Journal ref: Optica 9, 152-158 (2022)

arXiv:2203.07749 [pdf, other]

doi 10.1103/PhysRevLett.128.110501

Efficient Bipartite Entanglement Detection Scheme with a Quantum Adversarial Solver

Authors: Xu-Fei Yin, Yuxuan Du, Yue-Yang Fei, Rui Zhang, Li-Zheng Liu, Yingqiu Mao, Tongliang Liu, Min-Hsiu Hsieh, Li Li, Nai-Le Liu, Dacheng Tao, Yu-Ao Chen, Jian-Wei Pan

Abstract: The recognition of entanglement states is a notoriously difficult problem when no prior information is available. Here, we propose an efficient quantum adversarial bipartite entanglement detection scheme to address this issue. Our proposal reformulates the bipartite entanglement detection as a two-player zero-sum game completed by parameterized quantum circuits, where a two-outcome measurement can be used to query a classical binary result about whether the input state is bipartite entangled or not. In principle, for an $N$-qubit quantum state, the runtime complexity of our proposal is $O(\text{poly}(N)T)$ with $T$ being the number of iterations. We experimentally implement our protocol on a linear optical network and exhibit its effectiveness to accomplish the bipartite entanglement detection for 5-qubit quantum pure states and 2-qubit quantum mixed states. Our work paves the way for using near-term quantum machines to tackle entanglement detection on multipartite entangled quantum systems.

Submitted 15 March, 2022; originally announced March 2022.

Comments: 7 pages, 3 figures

Journal ref: Phys. Rev. Lett. 128, 110501 (2022)

arXiv:2203.03110 [pdf, ps, other]

Cascaded Gaps: Towards Gap-Dependent Regret for Risk-Sensitive Reinforcement Learning

Authors: Yingjie Fei, Ruitu Xu

Abstract: In this paper, we study gap-dependent regret guarantees for risk-sensitive reinforcement learning based on the entropic risk measure. We propose a novel definition of sub-optimality gaps, which we call cascaded gaps, and we discuss their key components that adapt to the underlying structures of the problem. Based on the cascaded gaps, we derive non-asymptotic and logarithmic regret bounds for two model-free algorithms under episodic Markov decision processes. We show that, in appropriate settings, these bounds feature exponential improvement over existing ones that are independent of gaps. We also prove gap-dependent lower bounds, which certify the near optimality of the upper bounds.

Submitted 6 March, 2022; originally announced March 2022.
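
The entropic risk measure that the abstract above builds on can be illustrated with a short sketch. It assumes the standard form (1/beta) * log E[exp(beta * R)] for a discrete return distribution, with beta < 0 read as risk-averse and beta > 0 as risk-seeking. The function name and the example distribution are hypothetical, and this is not the paper's algorithm.

```python
import math


def entropic_risk(returns, probs, beta):
    """Entropic risk (1/beta) * log E[exp(beta * R)] of a discrete return R.

    beta == 0 is treated as the risk-neutral limit, i.e. the plain mean.
    """
    if beta == 0:
        return sum(p * r for r, p in zip(returns, probs))
    # Shift by the largest exponent (log-sum-exp) for numerical stability.
    shift = max(beta * r for r in returns)
    total = sum(p * math.exp(beta * r - shift) for r, p in zip(returns, probs))
    return (shift + math.log(total)) / beta


if __name__ == "__main__":
    returns, probs = [0.0, 10.0], [0.5, 0.5]
    for beta in (-1.0, 0.0, 1.0):
        print(beta, entropic_risk(returns, probs, beta))
```

For this coin-flip return, a negative `beta` gives a value below the mean of 5 and a positive `beta` gives a value above it, while a deterministic return is left unchanged for any `beta`.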

arXiv:2201.12133 [pdf, other]

O-ViT: Orthogonal Vision Transformer

Authors: Yanhong Fei, Yingjie Liu, Xian Wei, Mingsong Chen

Abstract: Inspired by the tremendous success of the self-attention mechanism in natural language processing, the Vision Transformer (ViT) creatively applies it to image patch sequences and achieves incredible performance. However, the scaled dot-product self-attention of ViT brings about scale ambiguity to the structure of the original feature space. To address this problem, we propose a novel method named Orthogonal Vision Transformer (O-ViT), to optimize ViT from the geometric perspective. O-ViT limits parameters of self-attention blocks to be on the norm-keeping orthogonal manifold, which can keep the geometry of the feature space. Moreover, O-ViT achieves both orthogonal constraints and cheap optimization overhead by adopting a surjective mapping between the orthogonal group and its Lie algebra. We have conducted comparative experiments on image recognition tasks to demonstrate O-ViT's validity and experiments show that O-ViT can boost the performance of ViT by up to 3.6%.

Submitted 16 February, 2022; v1 submitted 28 January, 2022; originally announced January 2022.

arXiv:2201.09329 [pdf, other]

ULSA: Unified Language of Synthesis Actions for Representation of Synthesis Protocols

Authors: Zheren Wang, Kevin Cruse, Yuxing Fei, Ann Chia, Yan Zeng, Haoyan Huo, Tanjin He, Bowen Deng, Olga Kononova, Gerbrand Ceder

Abstract: Applying AI power to predict syntheses of novel materials requires high-quality, large-scale datasets. Extraction of synthesis information from scientific publications is still challenging, especially for extracting synthesis actions, because of the lack of a comprehensive labeled dataset using a solid, robust, and well-established ontology for describing synthesis procedures. In this work, we propose the first Unified Language of Synthesis Actions (ULSA) for describing ceramics synthesis procedures. We created a dataset of 3,040 synthesis procedures annotated by domain experts according to the proposed ULSA scheme. To demonstrate the capabilities of ULSA, we built a neural network-based model to map arbitrary ceramics synthesis paragraphs into ULSA and used it to construct synthesis flowcharts for synthesis procedures. Analysis for the flowcharts showed that (a) ULSA covers essential vocabulary used by researchers when describing synthesis procedures and (b) it can capture important features of synthesis protocols. This work is an important step towards creating a synthesis ontology and a solid foundation for autonomous robotic synthesis.

Submitted 23 January, 2022; originally announced January 2022.

arXiv:2111.10874 [pdf, other]

Dataset of Solution-based Inorganic Materials Synthesis Recipes Extracted from the Scientific Literature

Authors: Zheren Wang, Olga Kononova, Kevin Cruse, Tanjin He, Haoyan Huo, Yuxing Fei, Yan Zeng, Yingzhi Sun, Zijian Cai, Wenhao Sun, Gerbrand Ceder

Abstract: The development of a materials synthesis route is usually based on heuristics and experience. A possible new approach would be to apply data-driven approaches to learn the patterns of synthesis from past experience and use them to predict the syntheses of novel materials. However, this route is impeded by the lack of a large-scale database of synthesis formulations. In this work, we applied advanced machine learning and natural language processing techniques to construct a dataset of 35,675 solution-based synthesis "recipes" extracted from the scientific literature. Each recipe contains essential synthesis information including the precursors and target materials, their quantities, and the synthesis actions and corresponding attributes. Every recipe is also augmented with the reaction formula. Through this work, we are making freely available the first large dataset of solution-based inorganic materials synthesis recipes.

Submitted 21 November, 2021; originally announced November 2021.

arXiv:2111.03947 [pdf, other]

Exponential Bellman Equation and Improved Regret Bounds for Risk-Sensitive Reinforcement Learning

Authors: Yingjie Fei, Zhuoran Yang, Yudong Chen, Zhaoran Wang

Abstract: We study risk-sensitive reinforcement learning (RL) based on the entropic risk measure. Although existing works have established non-asymptotic regret guarantees for this problem, they leave open an exponential gap between the upper and lower bounds. We identify the deficiencies in existing algorithms and their analysis that result in such a gap. To remedy these deficiencies, we investigate a simple transformation of the risk-sensitive Bellman equations, which we call the exponential Bellman equation. The exponential Bellman equation inspires us to develop a novel analysis of Bellman backup procedures in risk-sensitive RL algorithms, and further motivates the design of a novel exploration mechanism. We show that these analytic and algorithmic innovations together lead to improved regret upper bounds over existing ones.

Submitted 6 November, 2021; originally announced November 2021.

arXiv:2111.00699 [pdf, other]

Principles towards Real-Time Simulation of Material Point Method on Modern GPUs

Authors: Yun Fei, Yuhan Huang, Ming Gao

Abstract: Physics-based simulation has been actively employed in generating offline visual effects in the film and animation industry. However, the computations required for high-quality scenarios are generally immense, deterring its adoption in real-time applications, e.g., virtual production, avatar live-streaming, and cloud gaming. We summarize the principles that can accelerate the computation pipeline on single-GPU and multi-GPU platforms through extensive investigation and comprehension of modern GPU architecture. We further demonstrate the effectiveness of these principles by applying them to the material point method to build up our framework, which achieves $1.7\times$--$8.6\times$ speedup on a single GPU and $2.5\times$--$14.8\times$ on four GPUs compared to the state-of-the-art. Our pipeline is specifically designed for real-time applications (i.e., scenarios with small to medium particle counts) and achieves significant multi-GPU efficiency. We demonstrate our pipeline by simulating a snow scenario with 1.33M particles and a fountain scenario with 143K particles in real-time (on average, 68.5 and 55.9 frames per second, respectively) on four NVIDIA Tesla V100 GPUs interconnected with NVLinks.

Submitted 1 November, 2021; originally announced November 2021.

ACM Class: I.3.1; I.3.7

Showing 1–50 of 91 results for author: Fei, Y