Leveraging large language models for nano synthesis mechanism explanation: solid foundations or mere conjectures?

Authors: Yingming Pu, Liping Huang, Tao Lin, Hongyu Chen

Abstract: With the rapid development of artificial intelligence (AI), large language models (LLMs) such as GPT-4 have garnered significant attention in the scientific community, demonstrating great potential in advancing scientific discovery. This progress raises a critical question: are these LLMs well-aligned with real-world physicochemical principles? Current evaluation strategies largely emphasize fact-based knowledge, such as material property prediction or name recognition, but they often lack an understanding of fundamental physicochemical mechanisms that require logical reasoning. To bridge this gap, our study developed a benchmark consisting of 775 multiple-choice questions focusing on the mechanisms of gold nanoparticle synthesis. By reflecting on existing evaluation metrics, we question whether a direct true-or-false assessment merely suggests conjecture. Hence, we propose a novel evaluation metric, the confidence-based score (c-score), which probes the output logits to derive the precise probability for the correct answer. Based on extensive experiments, our results show that in the context of gold nanoparticle synthesis, LLMs understand the underlying physicochemical mechanisms rather than relying on conjecture. This study underscores the potential of LLMs to grasp intrinsic scientific mechanisms and sets the stage for developing more reliable and effective AI tools across various scientific domains.

Submitted 11 July, 2024; originally announced July 2024.
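
The abstract above describes the c-score only as probing the output logits to obtain the probability of the correct answer. One plausible reading, offered here as a minimal sketch and not as the authors' definition, is a softmax over the logits of the answer options. The function name `c_score`, the option labels, and the logit values are all hypothetical.

```python
import math


def c_score(option_logits, correct_option):
    """Softmax probability assigned to the correct option.

    Assumption: the score is a plain softmax over the per-option logits.
    The paper may define it differently.
    """
    # Subtract the maximum logit for numerical stability.
    max_logit = max(option_logits.values())
    exps = {opt: math.exp(val - max_logit) for opt, val in option_logits.items()}
    total = sum(exps.values())
    return exps[correct_option] / total


# Hypothetical logits for a four-option multiple-choice question.
logits = {"A": 2.0, "B": 0.5, "C": 0.1, "D": -1.0}
score = c_score(logits, "A")
```

Unlike a true-or-false check, a score of this kind distinguishes a model that narrowly prefers the correct option from one that assigns it most of the probability mass.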

arXiv:2407.05700 [pdf, other]

InverseCoder: Unleashing the Power of Instruction-Tuned Code LLMs with Inverse-Instruct

Authors: Yutong Wu, Di Huang, Wenxuan Shi, Wei Wang, Lingzhe Gao, Shihao Liu, Ziyuan Nan, Kaizhao Yuan, Rui Zhang, Xishan Zhang, Zidong Du, Qi Guo, Yewen Pu, Dawei Yin, Xing Hu, Yunji Chen

Abstract: Recent advancements in open-source code large language models (LLMs) have demonstrated remarkable coding abilities by fine-tuning on the data generated from powerful closed-source LLMs such as GPT-3.5 and GPT-4 for instruction tuning. This paper explores how to further improve an instruction-tuned code LLM by generating data from itself rather than querying closed-source LLMs. Our key observation is the misalignment between the translation of formal and informal languages: translating formal language (i.e., code) to informal language (i.e., natural language) is more straightforward than the reverse. Based on this observation, we propose INVERSE-INSTRUCT, which summarizes instructions from code snippets instead of the reverse. Specifically, given an instruction tuning corpus for code and the resulting instruction-tuned code LLM, we ask the code LLM to generate additional high-quality instructions for the original corpus through code summarization and self-evaluation. Then, we fine-tune the base LLM on the combination of the original corpus and the self-generated one, which yields a stronger instruction-tuned LLM. We present a series of code LLMs named InverseCoder, which surpasses the performance of the original code LLMs on a wide range of benchmarks, including Python text-to-code generation, multilingual coding, and data-science code generation.

Submitted 8 July, 2024; originally announced July 2024.

arXiv:2407.05118 [pdf, other]

SHINE: Saliency-aware HIerarchical NEgative Ranking for Compositional Temporal Grounding

Authors: Zixu Cheng, Yujiang Pu, Shaogang Gong, Parisa Kordjamshidi, Yu Kong

Abstract: Temporal grounding, also known as video moment retrieval, aims at locating video segments corresponding to a given query sentence. The compositional nature of natural language enables the localization beyond predefined events, posing a certain challenge to the compositional generalizability of existing methods. Recent studies establish the correspondence between videos and queries through a decompose-reconstruct manner to achieve compositional generalization. However, they only consider dominant primitives and build negative queries through random sampling and recombination, resulting in semantically implausible negatives that hinder the models from learning rational compositions. In addition, recent DETR-based methods still underperform in compositional temporal grounding, showing irrational saliency responses when given negative queries that have subtle differences from positive queries. To address these limitations, we first propose a large language model-driven method for negative query construction, utilizing GPT-3.5-Turbo to generate semantically plausible hard negative queries. Subsequently, we introduce a coarse-to-fine saliency ranking strategy, which encourages the model to learn the multi-granularity semantic relationships between videos and hierarchical negative queries to boost compositional generalization. Extensive experiments on two challenging benchmarks validate the effectiveness and generalizability of our proposed method. Our code is available at https://github.com/zxccade/SHINE.

Submitted 15 July, 2024; v1 submitted 6 July, 2024; originally announced July 2024.

Comments: Accepted to ECCV 2024

arXiv:2407.02499 [pdf, other]

Amortizing Pragmatic Program Synthesis with Rankings

Authors: Yewen Pu, Saujas Vaduguru, Priyan Vaithilingam, Elena Glassman, Daniel Fried

Abstract: The usage of the Rational Speech Acts (RSA) framework has been successful in building pragmatic program synthesizers that return programs which, in addition to being logically consistent with user-generated examples, account for the fact that a user chooses their examples informatively. We present a general method of amortizing the slow, exact RSA synthesizer. Our method first queries the exact RSA synthesizer to compile a communication dataset. The dataset contains a number of example-dependent rankings of subsets of programs. It then distills a single global ranking of all programs as an approximation to every ranking in the dataset. This global ranking is then used at inference time to rank multiple logically consistent candidate programs generated from a fast, non-pragmatic synthesizer. Experiments on two program synthesis domains using our ranking method resulted in orders of magnitude of speedup compared to the exact RSA synthesizer, while being more accurate than a non-pragmatic synthesizer when communicating with humans. Finally, we prove that in the special case of synthesis from a single example, this approximation is exact.

Submitted 1 June, 2024; originally announced July 2024.

Comments: ICML 2024. arXiv admin note: substantial text overlap with arXiv:2309.03225

arXiv:2406.10667 [pdf, other]

UniZero: Generalized and Efficient Planning with Scalable Latent World Models

Authors: Yuan Pu, Yazhe Niu, Jiyuan Ren, Zhenjie Yang, Hongsheng Li, Yu Liu

Abstract: Learning predictive world models is essential for enhancing the planning capabilities of reinforcement learning agents. Notably, the MuZero-style algorithms, based on the value equivalence principle and Monte Carlo Tree Search (MCTS), have achieved superhuman performance in various domains. However, in environments that require capturing long-term dependencies, MuZero's performance deteriorates rapidly. We identify that this is partially due to the entanglement of latent representations with historical information, which results in incompatibility with the auxiliary self-supervised state regularization. To overcome this limitation, we present UniZero, a novel approach that disentangles latent states from implicit latent history using a transformer-based latent world model. By concurrently predicting latent dynamics and decision-oriented quantities conditioned on the learned latent history, UniZero enables joint optimization of the long-horizon world model and policy, facilitating broader and more efficient planning in latent space. We demonstrate that UniZero, even with single-frame inputs, matches or surpasses the performance of MuZero-style algorithms on the Atari 100k benchmark. Furthermore, it significantly outperforms prior baselines in benchmarks that require long-term memory. Lastly, we validate the effectiveness and scalability of our design choices through extensive ablation studies, visual analyses, and multi-task learning results. The code is available at https://github.com/opendilab/LightZero.

Submitted 15 June, 2024; originally announced June 2024.

Comments: 32 pages, 16 figures

arXiv:2405.19636 [pdf, other]

Creating Language-driven Spatial Variations of Icon Images

Authors: Xianghao Xu, Aditya Ganeshan, Karl D. D. Willis, Yewen Pu, Daniel Ritchie

Abstract: Editing 2D icon images can require significant manual effort from designers. It involves manipulating multiple geometries while maintaining the logical or physical coherence of the objects depicted in the image. Previous language-driven image editing methods can change the texture and geometry of objects in the image but fail at producing spatial variations, i.e. modifying spatial relations between objects while maintaining their identities. We present a language-driven editing method that can produce spatial variations of icon images. Our method takes in an icon image along with a user's editing request text prompt and outputs an edited icon image reflecting the user's editing request. Our method is designed based on two key observations: (1) A user's editing requests can be translated by a large language model (LLM), with help from a domain-specific language (DSL) library, into a set of geometrical constraints defining the relationships between segments in an icon image. (2) Optimizing the affine transformations of the segments with respect to these geometrical constraints can produce icon images that fulfill the editing request and preserve overall physical and logical coherence. Quantitative and qualitative results show that our system outperforms multiple baselines, enabling natural editing of icon images.

Submitted 29 May, 2024; originally announced May 2024.

arXiv:2405.18717 [pdf]

Silicon-integrated scandium-doped aluminum nitride electro-optic modulator

Authors: Tianqi Xu, Yushuai Liu, Yuanmao Pu, Yongxiang Yang, Qize Zhong, Xingyan Zhao, Yang Qiu, Yuan Dong, Tao Wu, Shaonan Zheng, Ting Hu

Abstract: Scandium-doped aluminum nitride (AlScN) with an asymmetric hexagonal wurtzite structure exhibits enhanced second-order nonlinear and piezoelectric properties compared to aluminum nitride (AlN), while maintaining a relatively large bandgap. It provides a promising platform for photonic integration and facilitates the seamless integration of passive and active functional devices. Here, we present the design, fabrication, and characterization of AlScN EO micro-ring modulators, introducing active functionalities to the chip-scale AlScN platform. These waveguide-integrated EO modulators employ sputtered AlScN thin films as the light-guiding medium, and the entire fabrication process is compatible with complementary metal oxide semiconductor (CMOS) technology. We characterize the high-frequency performance of an AlScN modulator for the first time, extracting a maximum in-device effective EO coefficient of 2.86 pm/V at 12 GHz. The devices show a minimum half-wave voltage-length product of 3.12 V*cm and a 3-dB modulation bandwidth of approximately 22 GHz. Our work provides a promising modulation scheme for cost-effective silicon-integrated photonics systems.

Submitted 28 May, 2024; originally announced May 2024.

arXiv:2405.16863 [pdf]

All-voltage control of Giant Magnetoresistance

Authors: Lujun Wei, Yiyang Zhang, Fei Huang, Jiajv Yang, Jincheng Peng, Yanghui Li, Yu Lu, Jiarui Chen, Tianyu Liu, Yong Pu, Jun Du

Abstract: The aim of voltage control of magnetism is to reduce the power consumption of spintronic devices. For a spin valve, the magnetization directions of two ferromagnetic layers determine the giant magnetoresistance magnitude. However, achieving all-voltage manipulation of the magnetization directions between parallel and antiparallel states is a significant challenge. Here, we demonstrate that by utilizing two exchange-biased Co/IrMn bilayers with opposite pinning directions and with ferromagnetic coupling through the Ruderman-Kittel-Kasuya-Yosida interaction between two Co layers, the magnetization directions of the two ferromagnetic layers of a spin valve can be switched between parallel and antiparallel states through all-voltage-induced strain control. The all-voltage controlled giant magnetoresistance is repeatable and nonvolatile. The rotation of magnetizations in the two Co layers under voltages, from antiparallel to parallel states, occurs in opposite directions as revealed through simulations utilizing the Landau-Lifshitz-Gilbert equation. This work can provide a valuable reference for the development of low-power all-voltage-controlled spintronic devices.

Submitted 27 May, 2024; originally announced May 2024.

arXiv:2405.16605 [pdf, other]

Demystify Mamba in Vision: A Linear Attention Perspective

Authors: Dongchen Han, Ziyi Wang, Zhuofan Xia, Yizeng Han, Yifan Pu, Chunjiang Ge, Jun Song, Shiji Song, Bo Zheng, Gao Huang

Abstract: Mamba is an effective state space model with linear computation complexity. It has recently shown impressive efficiency in dealing with high-resolution inputs across various vision tasks. In this paper, we reveal that the powerful Mamba model shares surprising similarities with the linear attention Transformer, which typically underperforms the conventional Transformer in practice. By exploring the similarities and disparities between the effective Mamba and the subpar linear attention Transformer, we provide comprehensive analyses to demystify the key factors behind Mamba's success. Specifically, we reformulate the selective state space model and linear attention within a unified formulation, rephrasing Mamba as a variant of the linear attention Transformer with six major distinctions: input gate, forget gate, shortcut, no attention normalization, single-head, and modified block design. For each design, we meticulously analyze its pros and cons, and empirically evaluate its impact on model performance in vision tasks. Interestingly, the results highlight the forget gate and block design as the core contributors to Mamba's success, while the other four designs are less crucial. Based on these findings, we propose a Mamba-Like Linear Attention (MLLA) model by incorporating the merits of these two key designs into linear attention. The resulting model outperforms various vision Mamba models in both image classification and high-resolution dense prediction tasks, while enjoying parallelizable computation and fast inference speed. Code is available at https://github.com/LeapLabTHU/MLLA.

Submitted 26 May, 2024; originally announced May 2024.

arXiv:2405.13369 [pdf, other]

Realization of a crosstalk-free multi-ion node for long-distance quantum networking

Authors: P. -C. Lai, Y. Wang, J. -X. Shi, Z. -B. Cui, Z. -Q. Wang, S. Zhang, P. -Y. Liu, Z. -C. Tian, Y. -D. Sun, X. -Y. Chang, B. -X. Qi, Y. -Y. Huang, Z. -C. Zhou, Y. -K. Wu, Y. Xu, Y. -F. Pu, L. -M. Duan

Abstract: Trapped atomic ions constitute one of the leading physical platforms for building the quantum repeater nodes to realize large-scale quantum networks. In a long-distance trapped-ion quantum network, it is essential to have crosstalk-free dual-type qubits: one type, called the communication qubit, to establish entangling interface with telecom photons; and the other type, called the memory qubit, to store quantum information immune from photon scattering under entangling attempts. Here, we report the first experimental implementation of a telecom-compatible and crosstalk-free quantum network node based on two trapped $^{40}$Ca$^{+}$ ions. The memory qubit is encoded on a long-lived metastable level to avoid crosstalk with the communication qubit encoded in another subspace of the same ion species, and a quantum wavelength conversion module is employed to generate ion-photon entanglement over a $12\,$km fiber in a heralded style. Our work therefore constitutes an important step towards the realization of quantum repeaters and long-distance quantum networks.

Submitted 22 May, 2024; originally announced May 2024.

Comments: 12 pages, 12 figures

arXiv:2405.12786 [pdf, other]

Rethinking the Vulnerabilities of Face Recognition Systems: From a Practical Perspective

Authors: Jiahao Chen, Zhiqiang Shen, Yuwen Pu, Chunyi Zhou, Changjiang Li, Jiliang Li, Ting Wang, Shouling Ji

Abstract: Face Recognition Systems (FRS) have been increasingly integrated into critical applications, including surveillance and user authentication, highlighting their pivotal role in modern security systems. Recent studies have revealed vulnerabilities in FRS to adversarial (e.g., adversarial patch attacks) and backdoor attacks (e.g., training data poisoning), raising significant concerns about their reliability and trustworthiness. Previous studies primarily focus on traditional adversarial or backdoor attacks, overlooking the resource-intensive or privileged-manipulation nature of such threats, thus limiting their practical generalization, stealthiness, universality and robustness. Correspondingly, in this paper, we delve into the inherent vulnerabilities in FRS through user studies and preliminary explorations. By exploiting these vulnerabilities, we identify a novel attack, a facial identity backdoor attack dubbed FIBA, which unveils a potentially more devastating threat against FRS: an enrollment-stage backdoor attack. FIBA circumvents the limitations of traditional attacks, enabling broad-scale disruption by allowing any attacker donning a specific trigger to bypass these systems. This implies that after a single, poisoned example is inserted into the database, the corresponding trigger becomes a universal key for any attackers to spoof the FRS. This strategy essentially challenges the conventional attacks by initiating at the enrollment stage, dramatically transforming the threat landscape by poisoning the feature database rather than the training data.

Submitted 8 June, 2024; v1 submitted 21 May, 2024; originally announced May 2024.

Comments: 19 pages, version 3

arXiv:2405.12751 [pdf, other]

A Stealthy Backdoor Attack for Without-Label-Sharing Split Learning

Authors: Yuwen Pu, Zhuoyuan Ding, Jiahao Chen, Chunyi Zhou, Qingming Li, Chunqiang Hu, Shouling Ji

Abstract: As a novel privacy-preserving paradigm aimed at reducing client computational costs and achieving data utility, split learning has garnered extensive attention and found widespread application across various fields, including smart health and smart transportation, among others. While recent studies have primarily concentrated on addressing privacy leakage concerns in split learning, such as inference attacks and data reconstruction, the exploration of security issues (e.g., backdoor attacks) within the framework of split learning has been comparatively limited. Nonetheless, the security vulnerability within the context of split learning poses a serious threat and can give rise to grave security implications, such as illegal impersonation in a face recognition model. Therefore, in this paper, we propose a stealthy backdoor attack strategy (namely SBAT) tailored to the without-label-sharing split learning architecture, which unveils the inherent security vulnerability of split learning. We posit the existence of a potential attacker on the server side aiming to introduce a backdoor into the training model, while exploring two scenarios: one with known client network architecture and the other with unknown architecture. Diverging from traditional backdoor attack methods that manipulate the training data and labels, we constructively conduct the backdoor attack by injecting the trigger embedding into the server network. Specifically, our SBAT achieves a higher level of attack stealthiness by refraining from modifying any intermediate parameters (e.g., gradients) during training and instead executing all malicious operations post-training.

Submitted 21 May, 2024; originally announced May 2024.

Comments: 15 pages

arXiv:2405.12719 [pdf, other]

How to Train a Backdoor-Robust Model on a Poisoned Dataset without Auxiliary Data?

Authors: Yuwen Pu, Jiahao Chen, Chunyi Zhou, Zhou Feng, Qingming Li, Chunqiang Hu, Shouling Ji

Abstract: Backdoor attacks have attracted wide attention from academia and industry due to their great security threat to deep neural networks (DNN). Most of the existing methods propose to conduct backdoor attacks by poisoning the training dataset with different strategies, so it's critical to identify the poisoned samples and then train a clean model on the unreliable dataset in the context of defending against backdoor attacks. Although numerous backdoor countermeasures have been proposed, their inherent weaknesses render them limited in practical scenarios, such as the requirement of enough clean samples, unstable defense performance under various attack conditions, poor defense performance against adaptive attacks, and so on. Therefore, in this paper, we aim to overcome the above limitations and propose a more practical backdoor defense method. Concretely, we first explore the inherent relationship between the potential perturbations and the backdoor trigger, and the theoretical analysis and experimental results demonstrate that the poisoned samples are more robust to perturbation than the clean ones. Then, based on our key explorations, we introduce AdvrBD, an Adversarial perturbation-based and robust Backdoor Defense framework, which can effectively identify the poisoned samples and train a clean model on the poisoned dataset. Constructively, our AdvrBD eliminates the requirement for any clean samples or knowledge about the poisoned dataset (e.g., poisoning ratio), which significantly improves the practicality in real-world scenarios.

Submitted 21 May, 2024; originally announced May 2024.

Comments: 13 pages, under review

arXiv:2405.11398 [pdf, other]

Second-harmonic optical diffraction tomography

Authors: Amirhossein Saba, Carlo Gigli, Ye Pu, Demetri Psaltis

Abstract: Optical diffraction tomography (ODT) has emerged as an important label-free tool in biomedicine to measure the three-dimensional (3D) structure of a biological sample. In this paper, we describe ODT using second-harmonic generation (SHG), which is a coherent nonlinear optical process with a strict symmetry selectivity and has several advantages over traditional fluorescence methods. We report the tomographic retrieval of the 3D second-order nonlinear optical susceptibility using two-dimensional holographic measurements of the SHG fields at different illumination angles and polarization states. The method is a generalization of the conventional linear ODT to the nonlinear scenario. We demonstrate the method with a numerically simulated nanoparticle distribution and an experiment with muscle tissue fibers. Our results show that SHG ODT not only provides an effective contrast mechanism for label-free imaging but also, owing to the symmetry requirement, enables the visualization of properties that are not otherwise accessible.

Submitted 18 May, 2024; originally announced May 2024.

arXiv:2404.17873 [pdf]

Bacterial stress granule protects mRNA through ribonucleases exclusion

Authors: Linsen Pei, Yujia Xian, Xiaodan Yan, Charley Schaefer, Aisha H. Syeda, Jamieson Howard, Hebin Liao, Fan Bai, Mark C. Leake, Yingying Pu

Abstract: Membraneless droplets formed through liquid-liquid phase separation (LLPS) play a crucial role in mRNA storage, enabling organisms to swiftly respond to environmental changes. However, the mechanisms underlying mRNA integration and protection within droplets remain unclear. Here, we unravel the role of bacterial aggresomes as stress granules (SGs) in safeguarding mRNA during stress. We discovered that upon stress onset, mobile mRNA molecules selectively incorporate into individual proteinaceous SGs based on length-dependent enthalpic gain over entropic loss. As stress prolongs, SGs undergo compaction facilitated by stronger non-specific RNA-protein interactions, thereby promoting recruitment of shorter RNA chains. Remarkably, mRNA ribonucleases are repelled from bacterial SGs, due to the influence of protein surface charge. This exclusion mechanism ensures the integrity and preservation of mRNA within SGs during stress conditions, explaining how mRNA can be stored and protected from degradation. Following stress removal, SGs facilitate mRNA translation, thereby enhancing cell fitness in changing environments. These droplets maintain mRNA physiological activity during storage, making them an intriguing new candidate for mRNA therapeutics manufacturing.

Submitted 27 April, 2024; originally announced April 2024.

arXiv:2404.16364 [pdf, other]

ReZero: Boosting MCTS-based Algorithms by Backward-view and Entire-buffer Reanalyze

Authors: Chunyu Xuan, Yazhe Niu, Yuan Pu, Shuai Hu, Yu Liu, Jing Yang

Abstract: Monte Carlo Tree Search (MCTS)-based algorithms, such as MuZero and its derivatives, have achieved widespread success in various decision-making domains. These algorithms employ the reanalyze process to enhance sample efficiency from stale data, albeit at the expense of significant wall-clock time consumption. To address this issue, we propose a general approach named ReZero to boost tree search operations for MCTS-based algorithms. Specifically, drawing inspiration from the one-armed bandit model, we reanalyze training samples through a backward-view reuse technique which obtains the value estimation of a certain child node in advance. To further adapt to this design, we periodically reanalyze the entire buffer instead of frequently reanalyzing the mini-batch. The synergy of these two designs can significantly reduce the search cost and meanwhile guarantee or even improve performance, simplifying both data collecting and reanalyzing. Experiments conducted on Atari environments and board games demonstrate that ReZero substantially improves training speed while maintaining high sample efficiency. The code is available as part of the LightZero benchmark at https://github.com/opendilab/LightZero.

Submitted 28 May, 2024; v1 submitted 25 April, 2024; originally announced April 2024.

arXiv:2404.04140 [pdf, other]

Improving Detection in Aerial Images by Capturing Inter-Object Relationships

Authors: Botao Ren, Botian Xu, Yifan Pu, Jingyi Wang, Zhidong Deng

Abstract: In many image domains, the spatial distribution of objects in a scene exhibits meaningful patterns governed by their semantic relationships. In most modern detection pipelines, however, the detection proposals are processed independently, overlooking the underlying relationships between objects. In this work, we introduce a transformer-based approach to capture these inter-object relationships to refine classification and regression outcomes for detected objects. Building on two-stage detectors, we tokenize the region of interest (RoI) proposals to be processed by a transformer encoder. Specific spatial and geometric relations are incorporated into the attention weights and adaptively modulated and regularized. Experimental results demonstrate that the proposed method achieves consistent performance improvement on three benchmarks including DOTA-v1.0, DOTA-v1.5, and HRSC 2016, especially ranking first on both DOTA-v1.5 and HRSC 2016. Specifically, our new method has an increase of 1.59 mAP on DOTA-v1.0, 4.88 mAP on DOTA-v1.5, and 2.1 mAP on HRSC 2016, respectively, compared to the baselines.

Submitted 5 April, 2024; originally announced April 2024.

arXiv:2403.16601 [pdf, other]

Singular profile of free boundary of incompressible inviscid fluid with external force

Authors: Lili Du, Yang Pu, Jing Yang

Abstract: This article is devoted to investigate the singular profile of the free boundary of two-dimensional incompressible inviscid fluid with external force near the stagnation point. More precisely, given an external force with some polynomial type decay close to the stagnation point, the singular profile of the free boundary at stagnation point possible are corner wave, flat and cusp singularity. Through excluding the cusp and flat singularity, we know the only singular profile is corner wave singularity, and the corner depends on the decay rate of the solution near the stagnation point. The analysis depends on the geometric method to a class of Bernoulli-type free boundary problem with given degenerate gradient function on free boundary. This work is motivated by the significant work [E. Vărvărucă and G. Weiss, Acta Math, 206, 363-403, (2011)] on Stokes conjecture to the incompressible inviscid fluid acted on by gravity.

Submitted 20 June, 2024; v1 submitted 25 March, 2024; originally announced March 2024.

Comments: 40 pages. Any comments are welcome

arXiv:2403.14369 [pdf, other]

A Control Barrier Function Composition Approach for Multi-Agent Systems in Marine Applications

Authors: Yujia Yang, Chris Manzie, Ye Pu

Abstract: The agents within a multi-agent system (MAS) operating in marine environments often need to utilize task payloads and avoid collisions in coordination, necessitating adherence to a set of relative-pose constraints, which may include field-of-view, line-of-sight, collision-avoidance, and range constraints. A nominal controller designed for reference tracking may not guarantee the marine MAS stays safe w.r.t. these constraints. To modify the nominal input as one that enforces safety, we introduce a framework to systematically encode the relative-pose constraints as nonsmooth control barrier functions (NCBFs) and combine them as a single NCBF using Boolean composition, which enables a simplified verification process compared to using the NCBFs individually. While other relative-pose constraint functions have explicit derivatives, the challenging line-of-sight constraint is encoded with the minimum distance function between the line-of-sight set and other agents, whose derivative is not explicit. Hence, existing safe control design methods that consider composite NCBFs cannot be applied. To address this challenge, we propose a novel quadratic program formulation based on the dual of the minimum distance problem and develop a new theory to ensure the resulting control input guarantees constraint satisfaction. Lastly, we validate the effectiveness of our proposed framework on a simulated large-scale marine MAS and a real-world marine MAS comprising one Unmanned Surface Vehicle and two Unmanned Underwater Vehicles.

Submitted 21 March, 2024; originally announced March 2024.

Comments: 11 pages, 8 figures

arXiv:2403.13623 [pdf, other]

Fast delivery of heralded atom-photon quantum correlation over 12km fiber through multiplexing enhancement

Authors: Sheng Zhang, Jixuan Shi, Yibo Liang, Yuedong Sun, Yukai Wu, Luming Duan, Yunfei Pu

Abstract: Distributing quantum entanglement between distant parties is a significant but difficult task in quantum information science, as it can enable numerous applications but suffers from exponential decay in the quantum channel. Quantum repeater is one of the most promising approaches towards this goal. In a quantum repeater protocol, it is essential that the entanglement generation speed within each elementary link is faster than the memory decoherence rate, to enable the scale-up of the quantum repeater by connecting neighboring repeater segments. This stringent requirement has not been implemented over a fiber of metropolitan scale so far. As a step towards this challenging goal, in this work we experimentally realize multiplexing-enhanced generation of heralded atom-photon quantum correlation over a 12km fiber. We excite the memory modes in a multiplexed quantum memory successively to generate 280 pairs of atom-photon quantum correlations with a train of photonic time-bin pulses filling the long fiber. After successful detection of a heralding signal, the excited memory mode can be identified and retrieved into idler photons on demand with either fixed or variable storage time. With the multiplexing enhancement, the heralding rate of atom-photon correlation can reach 1.95kHz, and the ratio between the quantum correlation generation rate to memory decoherence rate can be improved to 0.46 for a fiber length of 12km, which is so far the best for long fiber length (>10km) to our knowledge. This work therefore constitutes an important step towards the realization of a large-scale quantum repeater network.

Submitted 21 March, 2024; v1 submitted 20 March, 2024; originally announced March 2024.

Comments: 13 pages, 10 figures

arXiv:2403.11127 [pdf, other]

GRA: Detecting Oriented Objects through Group-wise Rotating and Attention

Authors: Jiangshan Wang, Yifan Pu, Yizeng Han, Jiayi Guo, Yiru Wang, Xiu Li, Gao Huang

Abstract: Oriented object detection, an emerging task in recent years, aims to identify and locate objects across varied orientations. This requires the detector to accurately capture the orientation information, which varies significantly within and across images. Despite the existing substantial efforts, simultaneously ensuring model effectiveness and parameter efficiency remains challenging in this scenario. In this paper, we propose a lightweight yet effective Group-wise Rotating and Attention (GRA) module to replace the convolution operations in backbone networks for oriented object detection. GRA can adaptively capture fine-grained features of objects with diverse orientations, comprising two key components: Group-wise Rotating and Group-wise Attention. Group-wise Rotating first divides the convolution kernel into groups, where each group extracts different object features by rotating at a specific angle according to the object orientation. Subsequently, Group-wise Attention is employed to adaptively enhance the object-related regions in the feature. The collaborative effort of these components enables GRA to effectively capture the various orientation information while maintaining parameter efficiency. Extensive experimental results demonstrate the superiority of our method. For example, GRA achieves a new state-of-the-art (SOTA) on the DOTA-v2.0 benchmark, while saving the parameters by nearly 50% compared to the previous SOTA method. Code will be released.

Submitted 19 March, 2024; v1 submitted 17 March, 2024; originally announced March 2024.

Comments: tech report

arXiv:2403.08357 [pdf]

Geometric and electronic properties of two kinds of CrO2 magnetic monolayers: D3d and D2h phases

Authors: Yang Zhang, Xianggong Bo, Jimeng Jing, Lixia Wang, Shiqian Qiao, Hong Wu, Yong Pu, Feng Li

Abstract: Due to the high magnetic coupling strength between the Cr elements, the bulk phase CrO2 is one of several ferromagnetic oxides known to have the highest Curie temperature. When the dimensionality of the material is reduced from 3D to 2D, the 2D CrO2 system material is expected to maintain a high Curie temperature. In this work, we predict two new phases of CrO2 monolayer (D3d and D2h) by using first-principles calculations. We have found that the Curie temperature of 2D CrO2 is much lower than that of its bulk phase, but still remains as high as 191K, which is comparable to that of Fe2Cr2Ge6. In addition, 1L D3d-CrO2 is in the ferromagnetic state, while 1L D2h-CrO2 is in the antiferromagnetic state. Also, the different geometric structure affects its electrical properties: the 1L D3d-CrO2 is a half-metal while 1L D2h-CrO2 is a semiconductor. Our studies have shown that there is a wealth of electrical and magnetic properties in CrO2.

Submitted 13 March, 2024; originally announced March 2024.

Comments: 5 pages, 4 figures

arXiv:2403.07153 [pdf, other]

2023 Low-Power Computer Vision Challenge (LPCVC) Summary

Authors: Leo Chen, Benjamin Boardley, Ping Hu, Yiru Wang, Yifan Pu, Xin Jin, Yongqiang Yao, Ruihao Gong, Bo Li, Gao Huang, Xianglong Liu, Zifu Wan, Xinwang Chen, Ning Liu, Ziyi Zhang, Dongping Liu, Ruijie Shan, Zhengping Che, Fachao Zhang, Xiaofeng Mou, Jian Tang, Maxim Chuprov, Ivan Malofeev, Alexander Goncharenko, Andrey Shcherbin, et al. (5 additional authors not shown)

Abstract: This article describes the 2023 IEEE Low-Power Computer Vision Challenge (LPCVC). Since 2015, LPCVC has been an international competition devoted to tackling the challenge of computer vision (CV) on edge devices. Most CV researchers focus on improving accuracy, at the expense of ever-growing sizes of machine models. LPCVC balances accuracy with resource requirements. Winners must achieve high accuracy with short execution time when their CV solutions run on an embedded device, such as Raspberry PI or Nvidia Jetson Nano. The vision problem for 2023 LPCVC is segmentation of images acquired by Unmanned Aerial Vehicles (UAVs, also called drones) after disasters. The 2023 LPCVC attracted 60 international teams that submitted 676 solutions during the submission window of one month. This article explains the setup of the competition and highlights the winners' methods that improve accuracy and shorten execution time.

Submitted 11 March, 2024; originally announced March 2024.

Comments: LPCVC 2023, website: https://lpcv.ai/

arXiv:2402.12326 [pdf, other]

LLM Agents for Psychology: A Study on Gamified Assessments

Authors: Qisen Yang, Zekun Wang, Honghui Chen, Shenzhi Wang, Yifan Pu, Xin Gao, Wenhao Huang, Shiji Song, Gao Huang

Abstract: Psychological measurement is essential for mental health, self-understanding, and personal development. Traditional methods, such as self-report scales and psychologist interviews, often face challenges with engagement and accessibility. While game-based and LLM-based tools have been explored to improve user interest and automate assessment, they struggle to balance engagement with generalizability. In this work, we propose PsychoGAT (Psychological Game AgenTs) to achieve a generic gamification of psychological assessment. The main insight is that powerful LLMs can function both as adept psychologists and innovative game designers. By incorporating LLM agents into designated roles and carefully managing their interactions, PsychoGAT can transform any standardized scales into personalized and engaging interactive fiction games. To validate the proposed method, we conduct psychometric evaluations to assess its effectiveness and employ human evaluators to examine the generated content across various psychological constructs, including depression, cognitive distortions, and personality traits. Results demonstrate that PsychoGAT serves as an effective assessment tool, achieving statistically significant excellence in psychometric metrics such as reliability, convergent validity, and discriminant validity. Moreover, human evaluations confirm PsychoGAT's enhancements in content coherence, interactivity, interest, immersion, and satisfaction.

Submitted 19 February, 2024; originally announced February 2024.

arXiv:2402.03741 [pdf, other]

SUB-PLAY: Adversarial Policies against Partially Observed Multi-Agent Reinforcement Learning Systems

Authors: Oubo Ma, Yuwen Pu, Linkang Du, Yang Dai, Ruo Wang, Xiaolei Liu, Yingcai Wu, Shouling Ji

Abstract: Recent advancements in multi-agent reinforcement learning (MARL) have opened up vast application prospects, such as swarm control of drones, collaborative manipulation by robotic arms, and multi-target encirclement. However, potential security threats during the MARL deployment need more attention and thorough investigation. Recent research reveals that attackers can rapidly exploit the victim's vulnerabilities, generating adversarial policies that result in the failure of specific tasks. For instance, reducing the winning rate of a superhuman-level Go AI to around 20%. Existing studies predominantly focus on two-player competitive environments, assuming attackers possess complete global state observation. In this study, we unveil, for the first time, the capability of attackers to generate adversarial policies even when restricted to partial observations of the victims in multi-agent competitive environments. Specifically, we propose a novel black-box attack (SUB-PLAY) that incorporates the concept of constructing multiple subgames to mitigate the impact of partial observability and suggests sharing transitions among subpolicies to improve attackers' exploitative ability. Extensive evaluations demonstrate the effectiveness of SUB-PLAY under three typical partial observability limitations. Visualization results indicate that adversarial policies induce significantly different activations of the victims' policy networks. Furthermore, we evaluate three potential defenses aimed at exploring ways to mitigate security threats posed by adversarial policies, providing constructive recommendations for deploying MARL in competitive environments.

Submitted 26 June, 2024; v1 submitted 6 February, 2024; originally announced February 2024.

Comments: To appear in the ACM Conference on Computer and Communications Security (CCS'24), October 14-18, 2024, Salt Lake City, UT, USA

arXiv:2401.14027 [pdf, other]

The Risk of Federated Learning to Skew Fine-Tuning Features and Underperform Out-of-Distribution Robustness

Authors: Mengyao Du, Miao Zhang, Yuwen Pu, Kai Xu, Shouling Ji, Quanjun Yin

Abstract: To tackle the scarcity and privacy issues associated with domain-specific datasets, the integration of federated learning in conjunction with fine-tuning has emerged as a practical solution. However, our findings reveal that federated learning has the risk of skewing fine-tuning features and compromising the out-of-distribution robustness of the model. By introducing three robustness indicators and conducting experiments across diverse robust datasets, we elucidate these phenomena by scrutinizing the diversity, transferability, and deviation within the model feature space. To mitigate the negative impact of federated learning on model robustness, we introduce GNP, a General Noisy Projection-based robust algorithm, ensuring no deterioration of accuracy on the target distribution. Specifically, the key strategy for enhancing model robustness entails the transfer of robustness from the pre-trained model to the fine-tuned model, coupled with adding a small amount of Gaussian noise to augment the representative capacity of the model. Comprehensive experimental results demonstrate that our approach markedly enhances the robustness across diverse scenarios, encompassing various parameter-efficient fine-tuning methods and confronting different levels of data heterogeneity.

Submitted 25 January, 2024; originally announced January 2024.

Comments: 12 pages, 10 figures

arXiv:2401.11942 [pdf]

Single-Photon-Assisted Two-Photon Polymerization

Authors: Buse Unlu, Maria Isabel Álvarez-Castaño, Antoine Boniface, Ye Pu, Christophe Moser

Abstract: Light-based additive manufacturing (AM) has revolutionized the fabrication of complex three-dimensional (3D) objects offering a cost-effective and high-speed alternative to traditional machining. One-photon polymerization is a key process in this advancement, standing out for rapid printing time, albeit with limited resolution. Two-photon polymerization (2PP) empowers AM with unprecedented resolution but is accompanied by a tradeoff of prolonged printing times. We propose combining the single-photon absorption (1PA) and 2PP to benefit from the dual capabilities, allowing for faster printing while maintaining high resolution and improved depth sectioning, respectively. In this study, we employ a blue light source to pre-excite a photocurable resin by 1PA followed by a precisely focused femtosecond (fs) beam to provide the missing energy necessary to reach the polymerization threshold to solidify the resin through two-photon absorption. First, we investigate the impact of pre-sensitization by blue light illumination on 2PP and demonstrate one order of magnitude faster printing time for a voxel size of 150 nm as compared to the same voxel size printed by 2PP only. Then, we build a custom 2PP printer utilizing blue light sensitization in a light-sheet mode and demonstrate successful 3D prints.

Submitted 22 January, 2024; originally announced January 2024.

Comments: 18 pages, 11 figures

arXiv:2312.14677 [pdf, other]

MEAOD: Model Extraction Attack against Object Detectors

Authors: Zeyu Li, Chenghui Shi, Yuwen Pu, Xuhong Zhang, Yu Li, Jinbao Li, Shouling Ji

Abstract: The widespread use of deep learning technology across various industries has made deep neural network models highly valuable and, as a result, attractive targets for potential attackers. Model extraction attacks, particularly query-based model extraction attacks, allow attackers to replicate a substitute model with comparable functionality to the victim model and present a significant threat to the confidentiality and security of MLaaS platforms. While many studies have explored threats of model extraction attacks against classification models in recent years, object detection models, which are more frequently used in real-world scenarios, have received less attention. In this paper, we investigate the challenges and feasibility of query-based model extraction attacks against object detection models and propose an effective attack method called MEAOD. It selects samples from the attacker-possessed dataset to construct an efficient query dataset using active learning and enhances the categories with insufficient objects. We additionally improve the extraction effectiveness by updating the annotations of the query dataset. According to our gray-box and black-box scenarios experiments, we achieve an extraction performance of over 70% under the given condition of a 10k query budget.

Submitted 22 December, 2023; originally announced December 2023.

arXiv:2312.10072 [pdf, other]

Assessing the Usability of GutGPT: A Simulation Study of an AI Clinical Decision Support System for Gastrointestinal Bleeding Risk

Authors: Colleen Chan, Kisung You, Sunny Chung, Mauro Giuffrè, Theo Saarinen, Niroop Rajashekar, Yuan Pu, Yeo Eun Shin, Loren Laine, Ambrose Wong, René Kizilcec, Jasjeet Sekhon, Dennis Shung

Abstract: Applications of large language models (LLMs) like ChatGPT have potential to enhance clinical decision support through conversational interfaces. However, challenges of human-algorithmic interaction and clinician trust are poorly understood. GutGPT, a LLM for gastrointestinal (GI) bleeding risk prediction and management guidance, was deployed in clinical simulation scenarios alongside the electronic health record (EHR) with emergency medicine physicians, internal medicine physicians, and medical students to evaluate its effect on physician acceptance and trust in AI clinical decision support systems (AI-CDSS). GutGPT provides risk predictions from a validated machine learning model and evidence-based answers by querying extracted clinical guidelines. Participants were randomized to GutGPT and an interactive dashboard, or the interactive dashboard and a search engine. Surveys and educational assessments taken before and after measured technology acceptance and content mastery. Preliminary results showed mixed effects on acceptance after using GutGPT compared to the dashboard or search engine but appeared to improve content mastery based on simulation performance. Overall, this study demonstrates LLMs like GutGPT could enhance effective AI-CDSS if implemented optimally and paired with interactive interfaces.

Submitted 6 December, 2023; originally announced December 2023.

Comments: Extended Abstract presented at Machine Learning for Health (ML4H) symposium 2023, December 10, 2023, New Orleans, United States, 11 pages

arXiv:2312.09708 [pdf, other]

GraphRARE: Reinforcement Learning Enhanced Graph Neural Network with Relative Entropy

Authors: Tianhao Peng, Wenjun Wu, Haitao Yuan, Zhifeng Bao, Zhao Pengrui, Xin Yu, Xuetao Lin, Yu Liang, Yanjun Pu

Abstract: Graph neural networks (GNNs) have shown advantages in graph-based analysis tasks. However, most existing methods have the homogeneity assumption and show poor performance on heterophilic graphs, where the linked nodes have dissimilar features and different class labels, and the semantically related nodes might be multi-hop away. To address this limitation, this paper presents GraphRARE, a general framework built upon node relative entropy and deep reinforcement learning, to strengthen the expressive capability of GNNs. An innovative node relative entropy, which considers node features and structural similarity, is used to measure mutual information between node pairs. In addition, to avoid the sub-optimal solutions caused by mixing useful information and noises of remote nodes, a deep reinforcement learning-based algorithm is developed to optimize the graph topology. This algorithm selects informative nodes and discards noisy nodes based on the defined node relative entropy. Extensive experiments are conducted on seven real-world datasets. The experimental results demonstrate the superiority of GraphRARE in node classification and its capability to optimize the original graph topology.

Submitted 13 April, 2024; v1 submitted 15 December, 2023; originally announced December 2023.

Comments: 14 pages, 7 figures

arXiv:2312.06408 [pdf, other]

DiffVL: Scaling Up Soft Body Manipulation using Vision-Language Driven Differentiable Physics

Authors: Zhiao Huang, Feng Chen, Yewen Pu, Chunru Lin, Hao Su, Chuang Gan

Abstract: Combining gradient-based trajectory optimization with differentiable physics simulation is an efficient technique for solving soft-body manipulation problems. Using a well-crafted optimization objective, the solver can quickly converge onto a valid trajectory. However, writing the appropriate objective functions requires expert knowledge, making it difficult to collect a large set of naturalistic problems from non-expert users. We introduce DiffVL, a method that enables non-expert users to communicate soft-body manipulation tasks -- a combination of vision and natural language, given in multiple stages -- that can be readily leveraged by a differential physics solver. We have developed GUI tools that enable non-expert users to specify 100 tasks inspired by real-life soft-body manipulations from online videos, which we'll make public. We leverage large language models to translate task descriptions into machine-interpretable optimization objectives. The optimization objectives can help differentiable physics solvers to solve these long-horizon multistage tasks that are challenging for previous baselines.

Submitted 11 December, 2023; originally announced December 2023.

arXiv:2312.06226 [pdf, other]

Invariant Representation via Decoupling Style and Spurious Features from Images

Authors: Ruimeng Li, Yuanhao Pu, Zhaoyi Li, Hong Xie, Defu Lian

Abstract: This paper considers the out-of-distribution (OOD) generalization problem under the setting that both style distribution shift and spurious features exist and domain labels are missing. This setting frequently arises in real-world applications and is underlooked because previous approaches mainly handle either of these two factors. The critical challenge is decoupling style and spurious features in the absence of domain labels. To address this challenge, we first propose a structural causal model (SCM) for the image generation process, which captures both style distribution shift and spurious features. The proposed SCM enables us to design a new framework called IRSS, which can gradually separate style distribution and spurious features from images by introducing adversarial neural networks and multi-environment optimization, thus achieving OOD generalization. Moreover, it does not require additional supervision (e.g., domain labels) other than the images and their corresponding labels. Experiments on benchmark datasets demonstrate that IRSS outperforms traditional OOD methods and solves the problem of Invariant risk minimization (IRM) degradation, enabling the extraction of invariant features under distribution shift.

Submitted 1 April, 2024; v1 submitted 11 December, 2023; originally announced December 2023.

Comments: 10 pages, 12 figures

ACM Class: I.2.6; I.2.10

arXiv:2312.04410 [pdf, other]

Smooth Diffusion: Crafting Smooth Latent Spaces in Diffusion Models

Authors: Jiayi Guo, Xingqian Xu, Yifan Pu, Zanlin Ni, Chaofei Wang, Manushree Vasu, Shiji Song, Gao Huang, Humphrey Shi

Abstract: Recently, diffusion models have made remarkable progress in text-to-image (T2I) generation, synthesizing images with high fidelity and diverse contents. Despite this advancement, latent space smoothness within diffusion models remains largely unexplored. Smooth latent spaces ensure that a perturbation on an input latent corresponds to a steady change in the output image. This property proves beneficial in downstream tasks, including image interpolation, inversion, and editing. In this work, we expose the non-smoothness of diffusion latent spaces by observing noticeable visual fluctuations resulting from minor latent variations. To tackle this issue, we propose Smooth Diffusion, a new category of diffusion models that can be simultaneously high-performing and smooth. Specifically, we introduce Step-wise Variation Regularization to enforce that the ratio between the variation of an arbitrary input latent and that of the output image is constant at any diffusion training step. In addition, we devise an interpolation standard deviation (ISTD) metric to effectively assess the latent space smoothness of a diffusion model. Extensive quantitative and qualitative experiments demonstrate that Smooth Diffusion stands out as a more desirable solution not only in T2I generation but also across various downstream tasks. Smooth Diffusion is implemented as a plug-and-play Smooth-LoRA to work with various community models. Code is available at https://github.com/SHI-Labs/Smooth-Diffusion.

Submitted 7 December, 2023; originally announced December 2023.

Comments: GitHub: https://github.com/SHI-Labs/Smooth-Diffusion

arXiv:2311.17400 [pdf, other]

doi 10.14722/ndss.2024.24115

Improving the Robustness of Transformer-based Large Language Models with Dynamic Attention

Authors: Lujia Shen, Yuwen Pu, Shouling Ji, Changjiang Li, Xuhong Zhang, Chunpeng Ge, Ting Wang

Abstract: Transformer-based models, such as BERT and GPT, have been widely adopted in natural language processing (NLP) due to their exceptional performance. However, recent studies show their vulnerability to textual adversarial attacks, where the model's output can be misled by intentionally manipulating the text inputs. Although various methods have been proposed to enhance the model's robustness and mitigate this vulnerability, many require heavy resource consumption (e.g., adversarial training) or only provide limited protection (e.g., defensive dropout). In this paper, we propose a novel method called dynamic attention, tailored for the transformer architecture, to enhance the inherent robustness of the model itself against various adversarial attacks. Our method requires no downstream task knowledge and does not incur additional costs. The proposed dynamic attention consists of two modules: (i) attention rectification, which masks or weakens the attention value of the chosen tokens, and (ii) dynamic modeling, which dynamically builds the set of candidate tokens. Extensive experiments demonstrate that dynamic attention significantly mitigates the impact of adversarial attacks, performing up to 33% better than previous methods against widely used adversarial attacks. The model-level design of dynamic attention enables it to be easily combined with other defense methods (e.g., adversarial training) to further enhance the model's robustness. Furthermore, we demonstrate that dynamic attention preserves the state-of-the-art robustness space of the original model compared to other dynamic modeling methods.

Submitted 29 November, 2023; v1 submitted 29 November, 2023; originally announced November 2023.

arXiv:2311.13455 [pdf, other]

Generation of Explanations for Logic Reasoning

Authors: Yanyi Pu

Abstract: This thesis delves into a fortiori arguments in deductive reasoning, underscoring their relevance in various domains such as law, philosophy, and artificial intelligence. The research is centred on employing GPT-3.5-turbo to automate the analysis of these arguments, with a focus on understanding intricate reasoning processes, generating clear and coherent explanations, and creating novel arguments. The methodology encompasses a series of tasks including detailed reasoning, interpretation, and the augmentation of a fortiori arguments. It involves meticulously identifying these arguments in diverse contexts, differentiating comparative elements, and categorizing them based on their logical structure. Extensive experiments reveal the challenges encountered by GPT-3.5-turbo in accurately detecting and classifying a fortiori arguments. Nevertheless, the model demonstrates a performance that rivals specialized models, particularly in extracting key components and interpreting underlying properties. The integration of external information into the model's processing significantly elevates the quality of the generated explanations. Additionally, the model exhibits a noteworthy capability in augmenting arguments, thus contributing to the enrichment of the data set. Despite facing certain limitations, this thesis makes significant contributions to the fields of artificial intelligence and logical reasoning. It introduces novel methodologies, establishes a rigorous evaluation framework, and provides deep insights that set the stage for future advancements in automated logical reasoning. The findings and methodologies presented herein not only underscore the potential of AI in complex reasoning tasks but also highlight areas for future research and development.

Submitted 22 November, 2023; originally announced November 2023.

Comments: 78 Pages, 16 Figures, Thesis Presentation is available at https://drive.google.com/file/d/1wLIBsjfLvO11PjCS6qx4Y9UgRBUfq3wQ/view?usp=sharing

arXiv:2311.12255 [pdf, other]

Exploring Time Granularity on Temporal Graphs for Dynamic Link Prediction in Real-world Networks

Authors: Xiangjian Jiang, Yanyi Pu

Abstract: Dynamic Graph Neural Networks (DGNNs) have emerged as the predominant approach for processing dynamic graph-structured data. However, the influence of temporal information on model performance and robustness remains insufficiently explored, particularly regarding how models address prediction tasks with different time granularities. In this paper, we explore the impact of time granularity when training DGNNs on dynamic graphs through extensive experiments. We examine graphs derived from various domains and compare three different DGNNs to the baseline model across four varied time granularities. We mainly consider the interplay between time granularities, model architectures, and negative sampling strategies to obtain general conclusions. Our results reveal that a sophisticated memory mechanism and proper time granularity are crucial for a DGNN to deliver competitive and robust performance in the dynamic link prediction task. We also discuss drawbacks in considered models and datasets and propose promising directions for future research on the time granularity of temporal graphs.

Submitted 22 November, 2023; v1 submitted 20 November, 2023; originally announced November 2023.

Comments: Presented at the Temporal Graph Learning Workshop @ NeurIPS 2023

arXiv:2311.10292 [pdf, other]

doi 10.1103/PhysRevX.14.021018

Realization of a programmable multi-purpose photonic quantum memory with over-thousand qubit manipulations

Authors: Sheng Zhang, Jixuan Shi, Zhaibin Cui, Ye Wang, Yukai Wu, Luming Duan, Yunfei Pu

Abstract: Quantum networks can enable various applications such as distributed quantum computing, long-distance quantum communication, and network-based quantum sensing with unprecedented performance. One of the most important building blocks for a quantum network is a photonic quantum memory, which serves as the interface between the communication channel and the local functional unit. A programmable quantum memory that can process a large stream of flying qubits and fulfill the requirements of multiple core functions in a quantum network is yet to be realized. Here we report a high-performance quantum memory which can simultaneously store 72 optical qubits carried by 144 spatially separated atomic ensembles and support up to a thousand consecutive write or read operations in a random-access way, two orders of magnitude larger than the previous record. Due to the built-in programmability, this quantum memory can be adapted on demand for several functions. As example applications, we realize a quantum queue, stack, and buffer, which closely resemble the counterpart devices for classical information processing. We further demonstrate the synchronization and reshuffling of 4 entangled pairs of photonic pulses with probabilistic arrival time and arbitrary release order via the memory, which is an essential requirement for the realization of quantum repeaters and efficient routing in quantum networks. Realization of this multi-purpose programmable quantum memory thus constitutes a key enabling building block for future large-scale fully functional quantum networks.

Submitted 29 April, 2024; v1 submitted 16 November, 2023; originally announced November 2023.

Comments: 17 pages, 19 figures

Journal ref: Phys. Rev. X 14, 021018 (2024)

arXiv:2311.05740 [pdf, other]

Generating Pragmatic Examples to Train Neural Program Synthesizers

Authors: Saujas Vaduguru, Daniel Fried, Yewen Pu

Abstract: Programming-by-example is the task of synthesizing a program that is consistent with a set of user-provided input-output examples. As examples are often an under-specification of one's intent, a good synthesizer must choose the intended program from the many that are consistent with the given set of examples. Prior work frames program synthesis as a cooperative game between a listener (that synthesizes programs) and a speaker (a user choosing examples), and shows that models of computational pragmatic inference are effective in choosing the user-intended programs. However, these models require counterfactual reasoning over a large set of programs and examples, which is infeasible in realistic program spaces. In this paper, we propose a novel way to amortize this search with neural networks. We sample pairs of programs and examples via self-play between listener and speaker models, and use pragmatic inference to choose informative training examples from this sample. We then use the informative dataset to train models to improve the synthesizer's ability to disambiguate user-provided examples without human supervision. We validate our method on the challenging task of synthesizing regular expressions from example strings, and find that our method (1) outperforms models trained without choosing pragmatic examples by 23% (a 51% relative increase) and (2) matches the performance of supervised learning on a dataset of pragmatic examples provided by humans, despite using no human data in training.

Submitted 9 November, 2023; originally announced November 2023.

arXiv:2310.17815 [pdf, other]

Stability of Inverse Problems for Steady Supersonic Flows Past Lipschitz Perturbed Cones

Authors: Gui-Qiang G. Chen, Yun Pu, Yongqian Zhang

Abstract: We are concerned with inverse problems for supersonic potential flows past infinite axisymmetric Lipschitz cones. The supersonic flows under consideration are governed by the steady isentropic Euler equations for axisymmetric potential flows, which involve a singular geometric source term. We first study the inverse problem for the stability of an oblique conical shock as an initial-boundary value problem with both the generating curve of the cone surface and the leading conical shock front as free boundaries. We then establish the existence and asymptotic behavior of global entropy solutions with bounded BV norm of this problem, when the Mach number of the incoming flow is sufficiently large and the total variation of the pressure distribution on the cone is sufficiently small. To this end, we first develop a modified Glimm-type scheme to construct approximate solutions by self-similar solutions as building blocks to balance the influence of the geometric source term. Then we define a Glimm-type functional, based on the local interaction estimates between weak waves, the strong leading conical shock, and self-similar solutions, along with the construction of the approximate generating curves of the cone surface. Next, when the Mach number of the incoming flow is sufficiently large, by asymptotic analysis of the reflection coefficients in those interaction estimates, we prove that appropriate weights can be chosen so that the corresponding Glimm-type functional decreases in the flow direction. Finally, we determine the generating curves of the cone surface and establish the existence of global entropy solutions containing a strong leading conical shock, besides weak waves. Moreover, the entropy solution is proved to approach asymptotically the self-similar solution determined by the incoming flow and the asymptotic pressure on the cone surface at infinity.

Submitted 26 October, 2023; originally announced October 2023.

Comments: 41 pages, 5 figures. arXiv admin note: text overlap with arXiv:2008.02409

MSC Class: 35B07; 35B20; 35D30; 35L65; 35L67; 76J20; 76L05; 76N10

arXiv:2310.15590 [pdf, other]

Facial Data Minimization: Shallow Model as Your Privacy Filter

Authors: Yuwen Pu, Jiahao Chen, Jiayu Pan, Hao li, Diqun Yan, Xuhong Zhang, Shouling Ji

Abstract: Face recognition services have been used in many fields and bring much convenience to people. However, once a user's facial data is transmitted to a service provider, the user loses control of his/her private data. In recent years, various security and privacy issues have arisen due to the leakage of facial data. Although many privacy-preserving methods have been proposed, they usually fail when adversaries' strategies or auxiliary data are not accessible to them. Hence, in this paper, by fully considering the two cases of uploading facial images and uploading facial features, which are very typical in face recognition service systems, we propose a data privacy minimization transformation (PMT) method. This method can process the original facial data based on the shallow model of authorized services to obtain the obfuscated data. The obfuscated data can not only maintain satisfactory performance on authorized models and restrict the performance on other unauthorized models but also prevent original privacy data from leaking by AI methods and human visual theft. Additionally, since a service provider may execute preprocessing operations on the received data, we also propose an enhanced perturbation method to improve the robustness of PMT. Besides, to authorize one facial image to multiple service models simultaneously, a multiple restriction mechanism is proposed to improve the scalability of PMT. Finally, we conduct extensive experiments and evaluate the effectiveness of the proposed PMT in defending against face reconstruction, data abuse, and face attribute estimation attacks. These experimental results demonstrate that PMT performs well in preventing facial data abuse and privacy leakage while maintaining face recognition accuracy.

Submitted 12 November, 2023; v1 submitted 24 October, 2023; originally announced October 2023.

Comments: 14 pages, 11 figures

arXiv:2310.11881 [pdf, other]

A Comparative Study of Image Restoration Networks for General Backbone Network Design

Authors: Xiangyu Chen, Zheyuan Li, Yuandong Pu, Yihao Liu, Jiantao Zhou, Yu Qiao, Chao Dong

Abstract: Despite the significant progress made by deep models in various image restoration tasks, existing image restoration networks still face challenges in terms of task generality. An intuitive manifestation is that networks which excel in certain tasks often fail to deliver satisfactory results in others. To illustrate this point, we select five representative networks and conduct a comparative study on five classic image restoration tasks. First, we provide a detailed explanation of the characteristics of different image restoration tasks and backbone networks. Following this, we present the benchmark results and analyze the reasons behind the performance disparity of different models across various tasks. Drawing from this comparative study, we propose that a general image restoration backbone network needs to meet the functional requirements of diverse tasks. Based on this principle, we design a new general image restoration backbone network, X-Restormer. Extensive experiments demonstrate that X-Restormer possesses good task generality and achieves state-of-the-art performance across a variety of tasks.

Submitted 16 July, 2024; v1 submitted 18 October, 2023; originally announced October 2023.

Comments: Accepted to ECCV2024

arXiv:2310.11614 [pdf, other]

Learning a Hierarchical Planner from Humans in Multiple Generations

Authors: Leonardo Hernandez Cano, Yewen Pu, Robert D. Hawkins, Josh Tenenbaum, Armando Solar-Lezama

Abstract: A typical way in which a machine acquires knowledge from humans is by programming. Compared to learning from demonstrations or experiences, programmatic learning allows the machine to acquire a novel skill as soon as the program is written, and, by building a library of programs, a machine can quickly learn how to perform complex tasks. However, as programs often take their execution contexts for granted, they are brittle when the contexts change, making it difficult to adapt complex programs to new contexts. We present natural programming, a library learning system that combines programmatic learning with a hierarchical planner. Natural programming maintains a library of decompositions, each consisting of a goal, a linguistic description of how this goal decomposes into sub-goals, and a concrete instance of its decomposition into sub-goals. A user teaches the system via curriculum building, by identifying a challenging yet not impossible goal along with linguistic hints on how this goal may be decomposed into sub-goals. The system solves for the goal via hierarchical planning, using the linguistic hints to guide its probability distribution in proposing the right plans. The system learns from this interaction by adding newly found decompositions in the successful search into its library. Simulated studies and a human experiment (n=360) in a controlled environment demonstrate that natural programming can robustly compose programs learned from different users and contexts, adapting faster and solving more complex tasks when compared to programmatic baselines.

Submitted 17 October, 2023; originally announced October 2023.

Comments: First two authors contributed equally

arXiv:2310.08854 [pdf, other]

Rank-DETR for High Quality Object Detection

Authors: Yifan Pu, Weicong Liang, Yiduo Hao, Yuhui Yuan, Yukang Yang, Chao Zhang, Han Hu, Gao Huang

Abstract: Modern detection transformers (DETRs) use a set of object queries to predict a list of bounding boxes, sort them by their classification confidence scores, and select the top-ranked predictions as the final detection results for the given input image. A highly performant object detector requires accurate ranking for the bounding box predictions. For DETR-based detectors, the top-ranked bounding boxes suffer from less accurate localization quality due to the misalignment between classification scores and localization accuracy, thus impeding the construction of high-quality detectors. In this work, we introduce a simple and highly performant DETR-based object detector by proposing a series of rank-oriented designs, collectively called Rank-DETR. Our key contributions include: (i) a rank-oriented architecture design that can prompt positive predictions and suppress the negative ones to ensure lower false positive rates, as well as (ii) a rank-oriented loss function and matching cost design that prioritizes predictions with more accurate localization during ranking to boost the AP under high IoU thresholds. We apply our method to improve the recent SOTA methods (e.g., H-DETR and DINO-DETR) and report strong COCO object detection results when using different backbones such as ResNet-50, Swin-T, and Swin-L, demonstrating the effectiveness of our approach. Code is available at https://github.com/LeapLabTHU/Rank-DETR.

Submitted 2 November, 2023; v1 submitted 13 October, 2023; originally announced October 2023.

Comments: NeurIPS 2023

arXiv:2310.08348 [pdf, other]

LightZero: A Unified Benchmark for Monte Carlo Tree Search in General Sequential Decision Scenarios

Authors: Yazhe Niu, Yuan Pu, Zhenjie Yang, Xueyan Li, Tong Zhou, Jiyuan Ren, Shuai Hu, Hongsheng Li, Yu Liu

Abstract: Building agents based on tree-search planning capabilities with learned models has achieved remarkable success in classic decision-making problems, such as Go and Atari. However, it has been deemed challenging or even infeasible to extend Monte Carlo Tree Search (MCTS) based algorithms to diverse real-world applications, especially when these environments involve complex action spaces and significant simulation costs, or inherent stochasticity. In this work, we introduce LightZero, the first unified benchmark for deploying MCTS/MuZero in general sequential decision scenarios. Specifically, we summarize the most critical challenges in designing a general MCTS-style decision-making solver, then decompose the tightly coupled algorithm and system design of tree-search RL methods into distinct sub-modules. By incorporating more appropriate exploration and optimization strategies, we can significantly enhance these sub-modules and construct powerful LightZero agents to tackle tasks across a wide range of domains, such as board games, Atari, MuJoCo, MiniGrid and GoBigger. Detailed benchmark results reveal the significant potential of such methods in building scalable and efficient decision intelligence. The code is available as part of OpenDILab at https://github.com/opendilab/LightZero.

Submitted 12 October, 2023; originally announced October 2023.

Comments: NeurIPS 2023 Spotlight

arXiv:2309.16077 [pdf, other]

Task-Oriented Koopman-Based Control with Contrastive Encoder

Authors: Xubo Lyu, Hanyang Hu, Seth Siriya, Ye Pu, Mo Chen

Abstract: We present task-oriented Koopman-based control that utilizes end-to-end reinforcement learning and a contrastive encoder to simultaneously learn the Koopman latent embedding, operator, and associated linear controller within an iterative loop. By prioritizing the task cost as the main objective for controller learning, we reduce the reliance of controller design on a well-identified model, which, for the first time to the best of our knowledge, extends Koopman control from low- to high-dimensional, complex nonlinear systems, including pixel-based tasks and a real robot with lidar observations. Code and videos are available at https://sites.google.com/view/kpmlilatsupp/.

Submitted 1 November, 2023; v1 submitted 27 September, 2023; originally announced September 2023.

Comments: Accepted by the 7th Annual Conference on Robot Learning (CoRL), 2023 (oral spotlight)

arXiv:2309.05660 [pdf, other]

Hypothesis Search: Inductive Reasoning with Language Models

Authors: Ruocheng Wang, Eric Zelikman, Gabriel Poesia, Yewen Pu, Nick Haber, Noah D. Goodman

Abstract: Inductive reasoning is a core problem-solving capacity: humans can identify underlying principles from a few examples, which robustly generalize to novel scenarios. Recent work evaluates large language models (LLMs) on inductive reasoning tasks by directly prompting them, yielding "in-context learning." This works well for straightforward inductive tasks but performs poorly on complex tasks such as the Abstraction and Reasoning Corpus (ARC). In this work, we propose to improve the inductive reasoning ability of LLMs by generating explicit hypotheses at multiple levels of abstraction: we prompt the LLM to propose multiple abstract hypotheses about the problem, in natural language, then implement the natural language hypotheses as concrete Python programs. These programs can be verified by running on observed examples and generalized to novel inputs. To reduce the hypothesis search space, we explore steps to filter the set of hypotheses to implement: we either ask the LLM to summarize them into a smaller set of hypotheses or ask human annotators to select a subset. We verify our pipeline's effectiveness on the ARC visual inductive reasoning benchmark, its variant 1D-ARC, the string transformation dataset SyGuS, and the list transformation dataset List Functions. On a random 100-problem subset of ARC, our automated pipeline using LLM summaries achieves 30% accuracy, outperforming the direct prompting baseline (accuracy of 17%). With the minimal human input of selecting from LLM-generated candidates, performance is boosted to 33%. Our ablations show that both abstract hypothesis generation and concrete program representations benefit LLMs on inductive reasoning tasks.

Submitted 30 May, 2024; v1 submitted 11 September, 2023; originally announced September 2023.

Comments: ICLR 2024. The first two authors contributed equally. Code: https://github.com/Relento/hypothesis_search
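
Illustrative note: the verification step this abstract describes (running candidate programs on the observed examples and keeping only those that reproduce them) can be sketched as follows. This is a minimal sketch and not the authors' code: the names `passes_all` and `select_programs` are hypothetical, and the hand-written lambdas stand in for the LLM-generated Python programs.

```python
def passes_all(program, examples):
    """Return True if the program reproduces every observed (input, output) pair."""
    for inp, expected in examples:
        try:
            if program(inp) != expected:
                return False
        except Exception:
            # A candidate that crashes on an observed input is rejected.
            return False
    return True


def select_programs(candidates, examples):
    """Keep the names of candidate programs consistent with all observed examples."""
    return [name for name, prog in candidates.items() if passes_all(prog, examples)]


# Observed examples for a toy list-transformation task (assumed data).
examples = [([1, 2, 3], [3, 2, 1]), ([2, 1, 3], [3, 1, 2])]

# Hand-written stand-ins for programs implementing different hypotheses.
candidates = {
    "reverse": lambda xs: xs[::-1],
    "sort_desc": lambda xs: sorted(xs, reverse=True),
    "identity": lambda xs: list(xs),
}

survivors = select_programs(candidates, examples)
print(survivors)
print(candidates[survivors[0]]([7, 8, 9]))  # apply a surviving program to a novel input
```

The second example is what separates "reverse" from "sort_desc": both agree on the first pair, so only additional observations can rule out the wrong hypothesis.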

arXiv:2309.03225 [pdf, other]

Amortizing Pragmatic Program Synthesis with Rankings

Authors: Yewen Pu, Saujas Vaduguru, Priyan Vaithilingam, Elena Glassman, Daniel Fried

Abstract: In program synthesis, an intelligent system takes in a set of user-generated examples and returns a program that is logically consistent with these examples. The Rational Speech Acts (RSA) framework has been successful in building pragmatic program synthesizers that return programs which -- in addition to being logically consistent -- account for the fact that a user chooses their examples informatively. However, the computational burden of running the RSA algorithm has restricted the application of pragmatic program synthesis to domains with a small number of possible programs. This work presents a novel method of amortizing the RSA algorithm by leveraging a global pragmatic ranking -- a single, total ordering of all the hypotheses. We prove that for a pragmatic synthesizer that uses a single demonstration, our global ranking method exactly replicates RSA's ranked responses. We further empirically show that global rankings effectively approximate the full pragmatic synthesizer in an online, multi-demonstration setting. Experiments on two program synthesis domains using our pragmatic ranking method resulted in orders-of-magnitude speedups compared to the RSA synthesizer, while outperforming the standard, non-pragmatic synthesizer.

Submitted 1 September, 2023; originally announced September 2023.

ACM Class: I.2.2; D.3.0

arXiv:2309.00399 [pdf, other]

Fine-grained Recognition with Learnable Semantic Data Augmentation

Authors: Yifan Pu, Yizeng Han, Yulin Wang, Junlan Feng, Chao Deng, Gao Huang

Abstract: Fine-grained image recognition is a longstanding computer vision challenge that focuses on differentiating objects belonging to multiple subordinate categories within the same meta-category. Since images belonging to the same meta-category usually share similar visual appearances, mining discriminative visual cues is the key to distinguishing fine-grained categories. Although commonly used image-level data augmentation techniques have achieved great success in generic image classification problems, they are rarely applied in fine-grained scenarios, because their random editing-region behavior is prone to destroy the discriminative visual cues residing in the subtle regions. In this paper, we propose diversifying the training data at the feature-level to alleviate the discriminative region loss problem. Specifically, we produce diversified augmented samples by translating image features along semantically meaningful directions. The semantic directions are estimated with a covariance prediction network, which predicts a sample-wise covariance matrix to adapt to the large intra-class variation inherent in fine-grained images. Furthermore, the covariance prediction network is jointly optimized with the classification network in a meta-learning manner to alleviate the degenerate solution problem. Experiments on four competitive fine-grained recognition benchmarks (CUB-200-2011, Stanford Cars, FGVC Aircrafts, NABirds) demonstrate that our method significantly improves the generalization performance on several popular classification networks (e.g., ResNets, DenseNets, EfficientNets, RegNets and ViT). Combined with a recently proposed method, our semantic data augmentation approach achieves state-of-the-art performance on the CUB-200-2011 dataset. The source code will be released.

Submitted 1 September, 2023; originally announced September 2023.

arXiv:2308.15949 [pdf, other]

Latency-aware Unified Dynamic Networks for Efficient Image Recognition

Authors: Yizeng Han, Zeyu Liu, Zhihang Yuan, Yifan Pu, Chaofei Wang, Shiji Song, Gao Huang

Abstract: Dynamic computation has emerged as a promising avenue to enhance the inference efficiency of deep networks. It allows selective activation of computational units, leading to a reduction in unnecessary computations for each input sample. However, the actual efficiency of these dynamic models can deviate from theoretical predictions. This mismatch arises from: 1) the lack of a unified approach due to fragmented research; 2) the focus on algorithm design over critical scheduling strategies, especially in CUDA-enabled GPU contexts; and 3) challenges in measuring practical latency, given that most libraries cater to static operations. Addressing these issues, we unveil the Latency-Aware Unified Dynamic Networks (LAUDNet), a framework that integrates three primary dynamic paradigms: spatially adaptive computation, dynamic layer skipping, and dynamic channel skipping. To bridge the theoretical and practical efficiency gap, LAUDNet merges algorithmic design with scheduling optimization, guided by a latency predictor that accurately gauges dynamic operator latency. We've tested LAUDNet across multiple vision tasks, demonstrating its capacity to notably reduce the latency of models like ResNet-101 by over 50% on platforms such as V100, RTX3090, and TX2 GPUs. Notably, LAUDNet stands out in balancing accuracy and efficiency. Code is available at: https://www.github.com/LeapLabTHU/LAUDNet.

Submitted 20 February, 2024; v1 submitted 30 August, 2023; originally announced August 2023.

arXiv:2308.06656 [pdf, other]

The Usability of Pragmatic Communication in Regular Expression Synthesis

Authors: Priyan Vaithilingam, Yewen Pu, Elena L. Glassman

Abstract: Programming-by-example (PBE) systems aim to alleviate the burden of programming. However, user-specified examples are often ambiguous, leaving multiple programs to satisfy the specification. Consequently, in most prior work, users have had to provide additional examples, particularly negative ones, to further constrain the search over compatible programs. Recent work resolves additional ambiguity by modeling program synthesis tasks as pragmatic communication, showing promising results on a graphics domain using a rudimentary user-study. We adapt pragmatic reasoning to a sub-domain of regular expressions and rigorously study its usability as a means of communication both with and without the ability to provide negative examples. Our user study (N=30) demonstrates that, with a pragmatic synthesizer, end-users can more successfully communicate a target regex using positive examples alone (95%) compared to using a non-pragmatic synthesizer (51%). Further, users can communicate more efficiently (57% fewer examples) with a pragmatic synthesizer compared to a non-pragmatic one.

Submitted 12 August, 2023; originally announced August 2023.

Showing 1–50 of 213 results for author: Pu, Y