-
Advancing Quantum Computing with Formal Methods
Authors:
Arend-Jan Quist,
Jingyi Mei,
Tim Coopmans,
Alfons Laarman
Abstract:
This tutorial introduces quantum computing with a focus on the applicability of formal methods in this relatively new domain. We describe quantum circuits and convey an understanding of their inherent combinatorial nature and the exponential blow-up that makes them hard to analyze. Then, we show how weighted model counting (#SAT) can be used to solve hard analysis tasks for quantum circuits.
This tutorial is aimed at everyone in the formal methods community with an interest in quantum computing. Familiarity with quantum computing is not required, but basic linear algebra knowledge (particularly matrix multiplication and basis vectors) is a prerequisite. The goal of the tutorial is to inspire the community to advance the development of quantum computing with formal methods.
Submitted 16 July, 2024;
originally announced July 2024.
-
Long Sequence Decoder Network for Mobile Sensing
Authors:
Jiazhong Mei,
J. Nathan Kutz
Abstract:
The reconstruction and estimation of spatio-temporal patterns pose significant challenges when sensor measurements are limited. The use of mobile sensors adds additional complexity due to the change in sensor locations over time. In such cases, historical measurement and sensor information are useful for better performance, including models such as Kalman filters, recurrent neural networks (RNNs) or transformer models. However, many of these approaches often fail to efficiently handle long sequences of data in such scenarios and are sensitive to noise. In this paper, we consider a model-free approach using the structured state space sequence (S4D) model as a deep learning layer in traditional sequence models to learn a better representation of historical sensor data. Specifically, it is integrated with a shallow decoder network for reconstruction of the high-dimensional state space. We also introduce a novel initialization of the S4D model using a Butterworth filter design to reduce noise in the inputs. Consequently, we construct a robust S4D (rS4D) model by appending the filtering S4D layer before the original S4D structure. This robust variant enhances the capability to accurately reconstruct spatio-temporal patterns with noisy mobile sensor measurements in long sequences. Numerical experiments demonstrate that our model achieves better performance compared with previous approaches. Our results underscore the efficacy of leveraging state space models within the context of spatio-temporal data reconstruction and estimation using limited mobile sensor resources, particularly in terms of long-sequence dependency and robustness to noise.
Submitted 14 July, 2024;
originally announced July 2024.
-
PID: Physics-Informed Diffusion Model for Infrared Image Generation
Authors:
Fangyuan Mao,
Jilin Mei,
Shun Lu,
Fuyang Liu,
Liang Chen,
Fangzhou Zhao,
Yu Hu
Abstract:
Infrared imaging technology has gained significant attention for its reliable sensing ability in low visibility conditions, prompting many studies to convert the abundant RGB images to infrared images. However, most existing image translation methods treat infrared images as a stylistic variation, neglecting the underlying physical laws, which limits their practical application. To address these issues, we propose a Physics-Informed Diffusion (PID) model for translating RGB images to infrared images that adhere to physical laws. Our method leverages the iterative optimization of the diffusion model and incorporates strong physical constraints based on prior knowledge of infrared laws during training. This approach enhances the similarity between translated infrared images and the real infrared domain without increasing extra training parameters. Experimental results demonstrate that PID significantly outperforms existing state-of-the-art methods. Our code is available at https://github.com/fangyuanmao/PID.
Submitted 12 July, 2024;
originally announced July 2024.
-
Enhancing learning in artificial neural networks through cellular heterogeneity and neuromodulatory signaling
Authors:
Alejandro Rodriguez-Garcia,
Jie Mei,
Srikanth Ramaswamy
Abstract:
Recent progress in artificial intelligence (AI) has been driven by insights from neuroscience, particularly with the development of artificial neural networks (ANNs). This has significantly enhanced the replication of complex cognitive tasks such as vision and natural language processing. Despite these advances, ANNs struggle with continual learning, adaptable knowledge transfer, robustness, and resource efficiency - capabilities that biological systems handle seamlessly. Specifically, ANNs often overlook the functional and morphological diversity of the brain, hindering their computational capabilities. Furthermore, incorporating cell-type specific neuromodulatory effects into ANNs with neuronal heterogeneity could enable learning at two spatial scales: spiking behavior at the neuronal level, and synaptic plasticity at the circuit level, thereby potentially enhancing their learning abilities. In this article, we summarize recent bio-inspired models, learning rules and architectures and propose a biologically-informed framework for enhancing ANNs. Our proposed dual-framework approach highlights the potential of spiking neural networks (SNNs) for emulating diverse spiking behaviors and dendritic compartments to simulate morphological and functional diversity of neuronal computations. Finally, we outline how the proposed approach integrates brain-inspired compartmental models and task-driven SNNs, balances bioinspiration and complexity, and provides scalable solutions for pressing AI challenges, such as continual learning, adaptability, robustness, and resource-efficiency.
Submitted 5 July, 2024;
originally announced July 2024.
-
What If We Recaption Billions of Web Images with LLaMA-3?
Authors:
Xianhang Li,
Haoqin Tu,
Mude Hui,
Zeyu Wang,
Bingchen Zhao,
Junfei Xiao,
Sucheng Ren,
Jieru Mei,
Qing Liu,
Huangjie Zheng,
Yuyin Zhou,
Cihang Xie
Abstract:
Web-crawled image-text pairs are inherently noisy. Prior studies demonstrate that semantically aligning and enriching textual descriptions of these pairs can significantly enhance model training across various vision-language tasks, particularly text-to-image generation. However, large-scale investigations in this area remain predominantly closed-source. Our paper aims to bridge this community effort, leveraging the powerful and open-sourced LLaMA-3, a GPT-4 level LLM. Our recaptioning pipeline is simple: first, we fine-tune a LLaMA-3-8B powered LLaVA-1.5 and then employ it to recaption 1.3 billion images from the DataComp-1B dataset. Our empirical results confirm that this enhanced dataset, Recap-DataComp-1B, offers substantial benefits in training advanced vision-language models. For discriminative models like CLIP, we observe enhanced zero-shot performance in cross-modal retrieval tasks. For generative models like text-to-image Diffusion Transformers, the generated images exhibit a significant improvement in alignment with users' text instructions, especially in following complex queries. Our project page is https://www.haqtu.me/Recap-Datacomp-1B/
Submitted 18 June, 2024; v1 submitted 12 June, 2024;
originally announced June 2024.
-
Autoregressive Pretraining with Mamba in Vision
Authors:
Sucheng Ren,
Xianhang Li,
Haoqin Tu,
Feng Wang,
Fangxun Shu,
Lei Zhang,
Jieru Mei,
Linjie Yang,
Peng Wang,
Heng Wang,
Alan Yuille,
Cihang Xie
Abstract:
The vision community has started to build with the recently developed state space model, Mamba, as the new backbone for a range of tasks. This paper shows that Mamba's visual capability can be significantly enhanced through autoregressive pretraining, a direction not previously explored. Efficiency-wise, the autoregressive nature can well capitalize on the Mamba's unidirectional recurrent structure, enabling faster overall training speed compared to other training strategies like mask modeling. Performance-wise, autoregressive pretraining equips the Mamba architecture with markedly higher accuracy over its supervised-trained counterparts and, more importantly, successfully unlocks its scaling potential to large and even huge model sizes. For example, with autoregressive pretraining, a base-size Mamba attains 83.2% ImageNet accuracy, outperforming its supervised counterpart by 2.0%; our huge-size Mamba, the largest Vision Mamba to date, attains 85.0% ImageNet accuracy (85.5% when finetuned with 384×384 inputs), notably surpassing all other Mamba variants in vision. The code is available at https://github.com/OliverRensu/ARM.
Submitted 11 June, 2024;
originally announced June 2024.
-
Medical Vision Generalist: Unifying Medical Imaging Tasks in Context
Authors:
Sucheng Ren,
Xiaoke Huang,
Xianhang Li,
Junfei Xiao,
Jieru Mei,
Zeyu Wang,
Alan Yuille,
Yuyin Zhou
Abstract:
This study presents Medical Vision Generalist (MVG), the first foundation model capable of handling various medical imaging tasks -- such as cross-modal synthesis, image segmentation, denoising, and inpainting -- within a unified image-to-image generation framework. Specifically, MVG employs an in-context generation strategy that standardizes the handling of inputs and outputs as images. By treating these tasks as an image generation process conditioned on prompt image-label pairs and input images, this approach enables a flexible unification of various tasks, even those spanning different modalities and datasets. To capitalize on both local and global context, we design a hybrid method combining masked image modeling with autoregressive training for conditional image generation. This hybrid approach yields the most robust performance across all involved medical imaging tasks. To rigorously evaluate MVG's capabilities, we curated the first comprehensive generalist medical vision benchmark, comprising 13 datasets and spanning four imaging modalities (CT, MRI, X-ray, and micro-ultrasound). Our results consistently establish MVG's superior performance, outperforming existing vision generalists, such as Painter and LVM. Furthermore, MVG exhibits strong scalability, with its performance demonstrably improving when trained on a more diverse set of tasks, and can be effectively adapted to unseen datasets with only minimal task-specific samples. The code is available at https://github.com/OliverRensu/MVG.
Submitted 8 June, 2024;
originally announced June 2024.
-
Negative Feedback for Music Personalization
Authors:
M. Jeffrey Mei,
Oliver Bembom,
Andreas F. Ehmann
Abstract:
Next-item recommender systems are often trained using only positive feedback with randomly-sampled negative feedback. We show the benefits of using real negative feedback both as inputs into the user sequence and also as negative targets for training a next-song recommender system for internet radio. In particular, using explicit negative samples during training helps reduce training time by ~60% while also improving test accuracy by ~6%; adding user skips as additional inputs can also considerably increase user coverage alongside slightly improving accuracy. We test the impact of using a large number of random negative samples to capture a 'harder' one and find that the test accuracy increases with more randomly-sampled negatives, but only to a point. Too many random negatives lead to false negatives that limit the lift, which is still lower than when using true negative feedback. We also find that the test accuracy is fairly robust with respect to the proportion of different feedback types, and compare the learned embeddings for different feedback types.
Submitted 6 June, 2024;
originally announced June 2024.
-
Target Networks and Over-parameterization Stabilize Off-policy Bootstrapping with Function Approximation
Authors:
Fengdi Che,
Chenjun Xiao,
Jincheng Mei,
Bo Dai,
Ramki Gummadi,
Oscar A Ramirez,
Christopher K Harris,
A. Rupam Mahmood,
Dale Schuurmans
Abstract:
We prove that the combination of a target network and over-parameterized linear function approximation establishes a weaker convergence condition for bootstrapped value estimation in certain cases, even with off-policy data. Our condition is naturally satisfied for expected updates over the entire state-action space or learning with a batch of complete trajectories from episodic Markov decision processes. Notably, using only a target network or an over-parameterized model does not provide such a convergence guarantee. Additionally, we extend our results to learning with truncated trajectories, showing that convergence is achievable for all tasks with minor modifications, akin to value truncation for the final states in trajectories. Our primary result focuses on temporal difference estimation for prediction, providing high-probability value estimation error bounds and empirical analysis on Baird's counterexample and a Four-room task. Furthermore, we explore the control setting, demonstrating that similar convergence conditions apply to Q-learning.
Submitted 31 May, 2024;
originally announced May 2024.
-
Value-Incentivized Preference Optimization: A Unified Approach to Online and Offline RLHF
Authors:
Shicong Cen,
Jincheng Mei,
Katayoon Goshvadi,
Hanjun Dai,
Tong Yang,
Sherry Yang,
Dale Schuurmans,
Yuejie Chi,
Bo Dai
Abstract:
Reinforcement learning from human feedback (RLHF) has demonstrated great promise in aligning large language models (LLMs) with human preference. Depending on the availability of preference data, both online and offline RLHF are active areas of investigation. A key bottleneck is understanding how to incorporate uncertainty estimation in the reward function learned from the preference data for RLHF, regardless of how the preference data is collected. While the principles of optimism or pessimism under uncertainty are well-established in standard reinforcement learning (RL), a practically-implementable and theoretically-grounded form amenable to large language models is not yet available, as standard techniques for constructing confidence intervals become intractable under arbitrary policy parameterizations.
In this paper, we introduce a unified approach to online and offline RLHF -- value-incentivized preference optimization (VPO) -- which regularizes the maximum-likelihood estimate of the reward function with the corresponding value function, modulated by a sign to indicate whether the optimism or pessimism is chosen. VPO also directly optimizes the policy with implicit reward modeling, and therefore shares a simpler RLHF pipeline similar to direct preference optimization. Theoretical guarantees of VPO are provided for both online and offline settings, matching the rates of their standard RL counterparts. Moreover, experiments on text summarization and dialog verify the practicality and effectiveness of VPO.
Submitted 5 July, 2024; v1 submitted 29 May, 2024;
originally announced May 2024.
-
Continuously Learning, Adapting, and Improving: A Dual-Process Approach to Autonomous Driving
Authors:
Jianbiao Mei,
Yukai Ma,
Xuemeng Yang,
Licheng Wen,
Xinyu Cai,
Xin Li,
Daocheng Fu,
Bo Zhang,
Pinlong Cai,
Min Dou,
Botian Shi,
Liang He,
Yong Liu,
Yu Qiao
Abstract:
Autonomous driving has advanced significantly due to sensors, machine learning, and artificial intelligence improvements. However, prevailing methods struggle with intricate scenarios and causal relationships, hindering adaptability and interpretability in varied environments. To address the above problems, we introduce LeapAD, a novel paradigm for autonomous driving inspired by the human cognitive process. Specifically, LeapAD emulates human attention by selecting critical objects relevant to driving decisions, simplifying environmental interpretation, and mitigating decision-making complexities. Additionally, LeapAD incorporates an innovative dual-process decision-making module, which consists of an Analytic Process (System-II) for thorough analysis and reasoning, along with a Heuristic Process (System-I) for swift and empirical processing. The Analytic Process leverages its logical reasoning to accumulate linguistic driving experience, which is then transferred to the Heuristic Process by supervised fine-tuning. Through reflection mechanisms and a growing memory bank, LeapAD continuously improves itself from past mistakes in a closed-loop environment. Closed-loop testing in CARLA shows that LeapAD outperforms all methods relying solely on camera input, requiring 1-2 orders of magnitude less labeled data. Experiments also demonstrate that as the memory bank expands, the Heuristic Process with only 1.8B parameters can inherit the knowledge from a GPT-4 powered Analytic Process and achieve continuous performance improvement. Code will be released at https://github.com/PJLab-ADG/LeapAD.
Submitted 24 May, 2024;
originally announced May 2024.
-
Mamba-R: Vision Mamba ALSO Needs Registers
Authors:
Feng Wang,
Jiahao Wang,
Sucheng Ren,
Guoyizhe Wei,
Jieru Mei,
Wei Shao,
Yuyin Zhou,
Alan Yuille,
Cihang Xie
Abstract:
Similar to Vision Transformers, this paper identifies artifacts also present within the feature maps of Vision Mamba. These artifacts, corresponding to high-norm tokens emerging in low-information background areas of images, appear much more severe in Vision Mamba -- they exist prevalently even with the tiny-sized model and activate extensively across background regions. To mitigate this issue, we follow the prior solution of introducing register tokens into Vision Mamba. To better cope with Mamba blocks' uni-directional inference paradigm, two key modifications are introduced: 1) evenly inserting registers throughout the input token sequence, and 2) recycling registers for final decision predictions. We term this new architecture Mamba-R. Qualitative observations suggest, compared to vanilla Vision Mamba, Mamba-R's feature maps appear cleaner and more focused on semantically meaningful regions. Quantitatively, Mamba-R attains stronger performance and scales better. For example, on the ImageNet benchmark, our base-size Mamba-R attains 82.9% accuracy, significantly outperforming Vim-B's 81.8%; furthermore, we provide the first successful scaling to the large model size (i.e., with 341M parameters), attaining a competitive accuracy of 83.2% (84.5% if finetuned with 384x384 inputs). Additional validation on the downstream semantic segmentation task also supports Mamba-R's efficacy.
Submitted 23 May, 2024;
originally announced May 2024.
-
"Community Guidelines Make this the Best Party on the Internet": An In-Depth Study of Online Platforms' Content Moderation Policies
Authors:
Brennan Schaffner,
Arjun Nitin Bhagoji,
Siyuan Cheng,
Jacqueline Mei,
Jay L. Shen,
Grace Wang,
Marshini Chetty,
Nick Feamster,
Genevieve Lakier,
Chenhao Tan
Abstract:
Moderating user-generated content on online platforms is crucial for balancing user safety and freedom of speech. Particularly in the United States, platforms are not subject to legal constraints prescribing permissible content. Each platform has thus developed bespoke content moderation policies, but there is little work towards a comparative understanding of these policies across platforms and topics. This paper presents the first systematic study of these policies from the 43 largest online platforms hosting user-generated content, focusing on policies around copyright infringement, harmful speech, and misleading content. We build a custom web-scraper to obtain policy text and develop a unified annotation scheme to analyze the text for the presence of critical components. We find significant structural and compositional variation in policies across topics and platforms, with some variation attributable to disparate legal groundings. We lay the groundwork for future studies of ever-evolving content moderation policies and their impact on users.
Submitted 8 May, 2024;
originally announced May 2024.
-
Surprising pressure-induced magnetic transformations from Helimagnetic order to Antiferromagnetic state in NiI2
Authors:
Qiye Liu,
Wenjie Su,
Yue Gu,
Xi Zhang,
Xiuquan Xia,
Le Wang,
Ke Xiao,
Xiaodong Cui,
Xiaolong Zou,
Bin Xi,
Jia-Wei Mei,
Jun-Feng Dai
Abstract:
Interlayer magnetic interactions play a pivotal role in determining the magnetic arrangement within van der Waals (vdW) magnets, and the remarkable tunability of these interactions through applied pressure further enhances their significance. Here, we investigate NiI2 flakes, a representative vdW magnet, under hydrostatic pressures up to 11 GPa. We reveal a notable increase in magnetic transition temperatures for both helimagnetic and antiferromagnetic states, and find that a reversible transition from helimagnetic to antiferromagnetic (AFM) phases at approximately 7 GPa challenges established theoretical and experimental expectations. While the increase in transition temperature aligns with pressure-enhanced overall exchange interaction strengths, we identify the significant role of the second-nearest neighbor interlayer interaction, which competes with intra-layer frustration and favors the AFM state as demonstrated in the Monte Carlo simulations. Experimental and simulated results converge on the existence of an intermediate helimagnetic ordered state in NiI2 before transitioning to the AFM state. These findings underscore the pivotal role of interlayer interactions in shaping the magnetic ground state, providing fresh perspectives for innovative applications in nanoscale magnetic device design.
Submitted 15 April, 2024;
originally announced April 2024.
-
FlowWalker: A Memory-efficient and High-performance GPU-based Dynamic Graph Random Walk Framework
Authors:
Junyi Mei,
Shixuan Sun,
Chao Li,
Cheng Xu,
Cheng Chen,
Yibo Liu,
Jing Wang,
Cheng Zhao,
Xiaofeng Hou,
Minyi Guo,
Bingsheng He,
Xiaoliang Cong
Abstract:
Dynamic graph random walk (DGRW) emerges as a practical tool for capturing structural relations within a graph. Effectively executing DGRW on GPU presents certain challenges. First, existing sampling methods demand a pre-processing buffer, causing substantial space complexity. Moreover, the power-law distribution of graph vertex degrees introduces workload imbalance issues, making DGRW difficult to parallelize. In this paper, we propose FlowWalker, a GPU-based dynamic graph random walk framework. FlowWalker implements an efficient parallel sampling method to fully exploit the GPU parallelism and reduce space complexity. Moreover, it employs a sampler-centric paradigm alongside a dynamic scheduling strategy to handle the huge amounts of walking queries. FlowWalker stands as a memory-efficient framework that requires no auxiliary data structures in GPU global memory. We examine the performance of FlowWalker extensively on ten datasets, and experimental results show that FlowWalker achieves up to 752.2x, 72.1x, and 16.4x speedup compared with existing CPU, GPU, and FPGA random walk frameworks, respectively. A case study shows that FlowWalker reduces random walk time from 35% to 3% in a pipeline of ByteDance friend recommendation GNN training.
Submitted 26 April, 2024; v1 submitted 12 April, 2024;
originally announced April 2024.
-
Control-DAG: Constrained Decoding for Non-Autoregressive Directed Acyclic T5 using Weighted Finite State Automata
Authors:
Jinghong Chen,
Weizhe Lin,
Jingbiao Mei,
Bill Byrne
Abstract:
The Directed Acyclic Transformer is a fast non-autoregressive (NAR) model that performs well in Neural Machine Translation. Two issues prevent its application to general Natural Language Generation (NLG) tasks: frequent Out-Of-Vocabulary (OOV) errors and the inability to faithfully generate entity names. We introduce Control-DAG, a constrained decoding algorithm for our Directed Acyclic T5 (DA-T5) model which offers lexical, vocabulary and length control. We show that Control-DAG significantly enhances DA-T5 on the Schema Guided Dialogue and the DART datasets, establishing strong NAR results for Task-Oriented Dialogue and Data-to-Text NLG.
Submitted 10 April, 2024;
originally announced April 2024.
-
Quantum Natural Language Processing
Authors:
Dominic Widdows,
Willie Aboumrad,
Dohun Kim,
Sayonee Ray,
Jonathan Mei
Abstract:
Language processing is at the heart of current developments in artificial intelligence, and quantum computers are becoming available at the same time. This has led to great interest in quantum natural language processing, and several early proposals and experiments.
This paper surveys the state of this area, showing how NLP-related techniques have been used in quantum language processing. We examine the art of word embeddings and sequential models, proposing some avenues for future investigation and discussing the tradeoffs present in these directions. We also highlight some recent methods to compute attention in transformer models, and perform grammatical parsing. We also introduce a new quantum design for the basic task of text encoding (representing a string of characters in memory), which has not been addressed in detail before.
Quantum theory has contributed toward quantifying uncertainty and explaining "What is intelligence?" In this context, we argue that "hallucinations" in modern artificial intelligence systems are a misunderstanding of the way facts are conceptualized: language can express many plausible hypotheses, of which only a few become actual.
Submitted 26 April, 2024; v1 submitted 28 March, 2024;
originally announced March 2024.
-
Equivalence Checking of Quantum Circuits by Model Counting
Authors:
Jingyi Mei,
Tim Coopmans,
Marcello Bonsangue,
Alfons Laarman
Abstract:
Verifying equivalence between two quantum circuits is a hard problem that is nonetheless crucial in compiling and optimizing quantum algorithms for real-world devices. This paper gives a Turing reduction of the (universal) quantum circuits equivalence problem to weighted model counting (WMC). Our starting point is a folklore theorem showing that equivalence checking of quantum circuits can be done in the so-called Pauli basis. We combine this insight with a WMC encoding of quantum circuit simulation, which we extend with support for the Toffoli gate. Finally, we prove that the weights computed by the model counter indeed realize the reduction. With an open-source implementation, we demonstrate that this novel approach can outperform a state-of-the-art equivalence-checking tool based on ZX calculus and decision diagrams.
Submitted 27 March, 2024;
originally announced March 2024.
-
3D-TransUNet for Brain Metastases Segmentation in the BraTS2023 Challenge
Authors:
Siwei Yang,
Xianhang Li,
Jieru Mei,
Jieneng Chen,
Cihang Xie,
Yuyin Zhou
Abstract:
Segmenting brain tumors is complex due to their diverse appearances and scales. Brain metastases, the most common type of brain tumor, are a frequent complication of cancer. Therefore, an effective segmentation model for brain metastases must adeptly capture local intricacies to delineate small tumor regions while also integrating global context to understand broader scan features. The TransUNet model, which combines Transformer self-attention with U-Net's localized information, emerges as a promising solution for this task. In this report, we address brain metastases segmentation by training the 3D-TransUNet model on the Brain Tumor Segmentation (BraTS-METS) 2023 challenge dataset. Specifically, we explored two architectural configurations: the Encoder-only 3D-TransUNet, employing Transformers solely in the encoder, and the Decoder-only 3D-TransUNet, utilizing Transformers exclusively in the decoder. For Encoder-only 3D-TransUNet, we note that Masked-Autoencoder pre-training is required for a better initialization of the Transformer Encoder, which in turn accelerates the training process. We identify that the Decoder-only 3D-TransUNet model should offer enhanced efficacy in the segmentation of brain metastases, as indicated by our 5-fold cross-validation on the training set. However, our use of the Encoder-only 3D-TransUNet model already yields notable results, with an average lesion-wise Dice score of 59.8\% on the test set, securing second place in the BraTS-METS 2023 challenge.
Submitted 23 March, 2024;
originally announced March 2024.
-
Simulating Quantum Circuits by Model Counting
Authors:
Jingyi Mei,
Marcello Bonsangue,
Alfons Laarman
Abstract:
Quantum circuit compilation comprises many computationally hard reasoning tasks that nonetheless lie inside $\#\mathbf{P}$ and its decision counterpart $\mathbf{PP}$. The classical simulation of general quantum circuits is a core example. We show for the first time that a strong simulation of universal quantum circuits can be efficiently tackled through weighted model counting by providing a linear encoding of Clifford+T circuits. To achieve this, we exploit the stabilizer formalism by Knill, Gottesman, and Aaronson and the fact that stabilizer states form a basis for density operators. With an open-source simulator implementation, we demonstrate empirically that model counting often outperforms state-of-the-art simulation techniques based on the ZX calculus and decision diagrams. Our work paves the way to apply the existing array of powerful classical reasoning tools to realize efficient quantum circuit compilation, one of the obstacles on the road towards quantum supremacy.
Submitted 11 March, 2024;
originally announced March 2024.
-
Magnetic catalysis and diamagnetism from pion fluctuations
Authors:
Jie Mei,
Rui Wen,
Shijun Mao,
Mei Huang,
Kun Xu
Abstract:
In the framework of the Nambu--Jona-Lasinio model beyond the mean-field approximation, the effects of pion fluctuations on (inverse) magnetic catalysis and magnetic susceptibility are studied. A negative magnetic susceptibility at low temperature is observed when contributions from both neutral and charged pions are taken into account. In the weak-field approximation, it is observed that at finite temperature, the magnetic inhibition effect in the chiral limit, resulting from the difference between the transverse and longitudinal velocities of neutral pions, converts to weak magnetic catalysis when considering a non-zero current quark mass. Moreover, the magnetic catalysis is amplified by the charged pions.
Submitted 29 February, 2024;
originally announced February 2024.
-
Online Time-Optimal Trajectory Generation for Two Quadrotors with Multi-Waypoints Constraints
Authors:
Fangguo Zhao,
Jiahao Mei,
Jin Zhou,
Jiming Chen,
Shuo Li
Abstract:
The flying speed of autonomous quadrotors has kept increasing over the past 5 years, especially in the field of autonomous drone racing. However, the majority of the research focuses on the aggressive flight of a single quadrotor. In this letter, we propose a novel method called Pairwise Model Predictive Control (PMPC) that can guide two quadrotors online to fly through the waypoints in minimum time without collisions. The flight task is first modeled as a nonlinear optimization problem, and then an efficient two-step mass point velocity search method is used to provide initial values and references to improve the solving efficiency, so that the method can run online at a frequency of 50 Hz and can handle dynamic waypoints. Simulation and real-world experiments validate the feasibility of the proposed method. In the real-world experiments, the two quadrotors achieve a top speed of 8.1 m/s on a 6-waypoint racing track in a compact flying arena of 6 m * 4 m * 2 m.
Submitted 27 February, 2024;
originally announced February 2024.
-
New method for estimating molecular cloud distances based on Gaia, 2MASS, and the TRILEGAL galaxy model
Authors:
Juan Mei,
Zhiwei Chen,
Zhibo Jiang,
Sheng Zheng,
Haoran Feng
Abstract:
We propose a new method for estimating the distances of molecular clouds traced by CO line emission. Stars from 2MASS and Gaia EDR3 are selected as on-cloud stars when they are projected on a cloud. The background on-cloud stars have redder colors on average than the foreground stars. Instead of searching for stars projected away from the cloud, we employed the TRILEGAL galaxy model to mimic the stellar population without cloud extinction along the sightline toward the cloud. Our method does not require an exact boundary of a cloud. The boundaries are highly variable and depend on the sensitivity of the molecular line data. For each cloud, we compared the distributions of on-cloud stars to the TRILEGAL stellar populations in the diagram of $J-K_s$ color versus distance. The intrinsic $J-K_s$ colors of main-sequence and evolved stars from TRILEGAL were considered separately, and they were used as the baseline for subtracting the observed $J-K_s$ colors. The baseline-corrected $J-K_s$ color was deployed with the Bayesian analysis and Markov chain Monte Carlo sampling to determine the distance at which the $J-K_s$ color jump is largest. This method was successfully applied to measure the distances of 27 molecular clouds, which were selected from previously published cloud samples. By replacing TRILEGAL with the GALAXIA galaxy model, we were able to measure the distances for 21 of the 27 clouds. The distances of the 21 clouds based on the GALAXIA model agree well with those based on the TRILEGAL model. The distances of the 27 clouds estimated by this method are consistent with previous estimates. We will apply this new method to a larger region of the gaseous galactic plane, in particular, for the inner galactic region, where a region free of CO emission is hard to separate from the crowded field of clouds.
Submitted 5 May, 2024; v1 submitted 27 February, 2024;
originally announced February 2024.
-
Stochastic Gradient Succeeds for Bandits
Authors:
Jincheng Mei,
Zixin Zhong,
Bo Dai,
Alekh Agarwal,
Csaba Szepesvari,
Dale Schuurmans
Abstract:
We show that the \emph{stochastic gradient} bandit algorithm converges to a \emph{globally optimal} policy at an $O(1/t)$ rate, even with a \emph{constant} step size. Remarkably, global convergence of the stochastic gradient bandit algorithm has not been previously established, even though it is an old algorithm known to be applicable to bandits. The new result is achieved by establishing two novel technical findings: first, the noise of the stochastic updates in the gradient bandit algorithm satisfies a strong ``growth condition'' property, where the variance diminishes whenever progress becomes small, implying that additional noise control via diminishing step sizes is unnecessary; second, a form of ``weak exploration'' is automatically achieved through the stochastic gradient updates, since they prevent the action probabilities from decaying faster than $O(1/t)$, thus ensuring that every action is sampled infinitely often with probability $1$. These two findings can be used to show that the stochastic gradient update is already ``sufficient'' for bandits in the sense that exploration versus exploitation is automatically balanced in a manner that ensures almost sure convergence to a global optimum. These novel theoretical findings are further verified by experimental results.
Submitted 27 February, 2024;
originally announced February 2024.
-
Inversion-symmetric Electron Gases as New Platforms for Topological Planar Josephson Junctions
Authors:
Jiong Mei,
Kun Jiang,
Shengshan Qin,
Jiangping Hu
Abstract:
Intrinsic Rashba spin-orbital coupling (SOC) can exist in centrosymmetric materials with local inversion symmetry breaking. Here we show that such a SOC can induce topological superconductivity together with an in-plane Zeeman field in planar Josephson junctions formed by the centrosymmetric materials. A single Majorana mode can be created at each end of the junction. We demonstrate this result in a model based on iron-based superconductors. We derive the necessary Fermi surface condition for the topological planar junction and calculate the topological phase diagram with respect to the in-plane Zeeman field and the phase difference between the two superconductors. We provide experimental characteristics for the topological superconductivity, including the differential conductance and the Fano factor tomography which can be measured in the scanning tunneling spectroscopy. Our study reveals that the centrosymmetric systems with local-inversion-symmetry breaking can serve as new platforms for the topological planar Josephson junctions, and help to find more experimentally feasible materials for the topological superconductors.
△ Less
Submitted 25 February, 2024;
originally announced February 2024.
-
Imitation Learning-Based Online Time-Optimal Control with Multiple-Waypoint Constraints for Quadrotors
Authors:
Jin Zhou,
Jiahao Mei,
Fangguo Zhao,
Jiming Chen,
Shuo Li
Abstract:
Over the past decade, there has been a remarkable surge in the use of quadrotors for purposes such as search and rescue, delivery, and autonomous drone racing, owing to their simple structure and aggressive maneuverability. One of the key challenges preventing quadrotors from being widely used in these scenarios is online waypoint-constrained time-optimal trajectory generation and control. This letter proposes an imitation learning-based online solution to efficiently navigate the quadrotor through multiple waypoints with time-optimal performance. The neural networks (WN&CNets) are trained to learn the control law from the dataset generated by the time-consuming CPC algorithm and then deployed to generate the optimal control commands online to guide the quadrotors. To address the challenge of limited training data and the hover maneuver at the final waypoint, we propose a transition phase strategy that utilizes polynomials to help the quadrotor 'jump over' the stop-and-go maneuver when switching waypoints. Our method is demonstrated in both simulation and real-world experiments, achieving a maximum speed of 7 m/s while navigating through 7 waypoints in a confined space of 6.0 m * 4.0 m * 2.0 m. The results show that with a slight loss in optimality, the WN&CNets significantly reduce the processing time and enable online optimal control for multiple-waypoint-constrained flight tasks.
Submitted 18 February, 2024;
originally announced February 2024.
-
Few-Shot Object Detection with Sparse Context Transformers
Authors:
Jie Mei,
Mingyuan Jiu,
Hichem Sahbi,
Xiaoheng Jiang,
Mingliang Xu
Abstract:
Few-shot detection is a major task in pattern recognition which seeks to localize objects using models trained with few labeled data. One of the mainstream few-shot methods is transfer learning, which consists of pretraining a detection model in a source domain prior to fine-tuning it in a target domain. However, it is challenging for fine-tuned models to effectively identify new classes in the target domain, particularly when the underlying labeled training data are scarce. In this paper, we devise a novel sparse context transformer (SCT) that effectively leverages object knowledge in the source domain, and automatically learns a sparse context from only a few training images in the target domain. As a result, it combines different relevant clues in order to enhance the discrimination power of the learned detectors and reduce class confusion. We evaluate the proposed method on two challenging few-shot object detection benchmarks, and empirical results show that the proposed method obtains competitive performance compared to the related state-of-the-art.
Submitted 14 February, 2024;
originally announced February 2024.
-
On-shell Bootstrap for n-gluons and gravitons scattering in (A)dS, Unitarity and Soft limit
Authors:
Jiajie Mei,
Yuyu Mo
Abstract:
We propose an algorithm to recursively bootstrap $n$-point gluon and graviton Mellin-Momentum amplitudes in (A)dS spacetime using only the three-point amplitude. We discover that gluon amplitudes are simply determined by factorization for $n\geq 5$. The same principle applies to $n$-point graviton amplitudes, but additional constraints such as flat space and soft limits are needed to fix contact terms. Furthermore, we establish a mapping from $n$-point Mellin-Momentum amplitudes to $n$-point cosmological correlators. We efficiently compute explicit examples up to five points. This leads to the first five-graviton amplitude in $AdS_{d+1}$.
Submitted 16 March, 2024; v1 submitted 14 February, 2024;
originally announced February 2024.
-
PreFLMR: Scaling Up Fine-Grained Late-Interaction Multi-modal Retrievers
Authors:
Weizhe Lin,
Jingbiao Mei,
Jinghong Chen,
Bill Byrne
Abstract:
Large Multimodal Models (LMMs) excel in natural language and visual understanding but are challenged by exacting tasks such as Knowledge-based Visual Question Answering (KB-VQA) which involve the retrieval of relevant information from document collections to use in shaping answers to questions. We present an extensive training and evaluation framework, M2KR, for KB-VQA. M2KR contains a collection of vision and language tasks which we have incorporated into a single suite of benchmark tasks for training and evaluating general-purpose multi-modal retrievers. We use M2KR to develop PreFLMR, a pre-trained version of the recently developed Fine-grained Late-interaction Multi-modal Retriever (FLMR) approach to KB-VQA, and we report new state-of-the-art results across a range of tasks. We also present investigations into the scaling behaviors of PreFLMR intended to be useful in future developments in general-purpose multi-modal retrievers.
Submitted 5 June, 2024; v1 submitted 13 February, 2024;
originally announced February 2024.
-
Continuum of spin excitations in an ordered magnet
Authors:
Jieming Sheng,
Le Wang,
Wenrui Jiang,
Han Ge,
Nan Zhao,
Tiantian Li,
Maiko Kofu,
Dehong Yu,
Wei Zhu,
Jia-Wei Mei,
Zhentao Wang,
Liusuo Wu
Abstract:
A continuum of spin excitations observed in inelastic neutron scattering experiments is often considered strong evidence of quantum spin liquid formation. When a quantum spin liquid is indeed the ground state of a disorder-free magnetic compound, the elementary excitations are no longer conventional spin waves (magnons). Instead, the magnons fractionalize into spinons, leaving only a two-spinon continuum detectable in inelastic neutron scattering experiments. For a clean ordered antiferromagnet, it was unclear whether a continuous spectrum similar to the ones in a quantum spin liquid state can be observed. Here we show that the magnetically ordered state in Na$_2$BaCo(PO$_4$)$_2$ is able to host a spin excitation continuum induced by strong quantum fluctuations. Thus, caution is necessary before interpreting such a continuum as a signature of a quantum spin liquid in new material explorations.
Submitted 12 February, 2024;
originally announced February 2024.
-
Constraining the EdGB Theory with Extreme Mass-Ratio Inspirals
Authors:
Jing Tan,
Jian-dong Zhang,
Hui-Min Fan,
Jianwei Mei
Abstract:
The Einstein-dilaton-Gauss-Bonnet (EdGB) theory is a modified theory that includes a scalar field coupled to the higher-order curvature terms. It has already been constrained with various observations, including the gravitational wave (GW) observations of the LIGO, Virgo and KAGRA (LVK) Collaboration. In this work, we study the problem of using the GWs of Extreme Mass-Ratio Inspirals (EMRIs) to constrain the EdGB theory. We use the "numerical kludge (NK)" method to construct the EMRI waveform in the EdGB theory, focusing on the case when the central black hole is spinless. We then study how a future space-borne gravitational wave detector, TianQin for example, can place constraints on the EdGB theory through the detection of EMRIs. With the analysis using mismatch and the Fisher Information Matrix (FM), we find that the EdGB parameter $\sqrt{\alpha}$ is expected to be constrained to the level of $\sim\mathcal{O}(0.1)$ km.
Submitted 8 February, 2024;
originally announced February 2024.
-
Beyond Expectations: Learning with Stochastic Dominance Made Practical
Authors:
Shicong Cen,
Jincheng Mei,
Hanjun Dai,
Dale Schuurmans,
Yuejie Chi,
Bo Dai
Abstract:
Stochastic dominance models risk-averse preferences for decision making with uncertain outcomes, which naturally captures the intrinsic structure of the underlying uncertainty, in contrast to simply resorting to the expectations. Despite being theoretically appealing, the application of stochastic dominance in machine learning has been scarce, due to the following challenges: $\textbf{i)}$ the original concept of stochastic dominance only provides a $\textit{partial order}$ and is therefore not amenable to serving as an optimality criterion; and $\textbf{ii)}$ an efficient computational recipe remains lacking due to the continuum nature of evaluating stochastic dominance.
In this work, we make the first attempt towards establishing a general framework of learning with stochastic dominance. We first generalize the stochastic dominance concept to enable feasible comparisons between any arbitrary pair of random variables. We next develop a simple and computationally efficient approach for finding the optimal solution in terms of stochastic dominance, which can be seamlessly plugged into many learning tasks. Numerical experiments demonstrate that the proposed method achieves performance comparable to standard risk-neutral strategies and obtains better trade-offs against risk across a variety of applications including supervised learning, reinforcement learning, and portfolio optimization.
Submitted 4 February, 2024;
originally announced February 2024.
-
Testing space-time non-commutativity with TianQin
Authors:
Zeyu Huang,
Changfu Shi,
Xiangyu Lyu,
Jianwei Mei
Abstract:
The direct detection of gravitational waves offers a powerful tool to explore the nature of gravity and the structure of space-time. This paper focuses on the capabilities of space-based gravitational wave detectors in testing space-time non-commutativity. Our findings indicate that TianQin has the potential to impose constraints on the non-commutative scale at a sub-Planckian level using massive black hole binaries. Additionally, we have developed a pipeline tailored to this specific topic.
△ Less
Submitted 23 January, 2024;
originally announced January 2024.
-
Bayesian parameter estimation of massive black hole binaries with TianQin-LISA
Authors:
Jie Gao,
Yi-Ming Hu,
En-Kun Li,
Jian-dong Zhang,
Jianwei Mei
Abstract:
This paper analyses the impact of various parameter changes on the estimation of parameters for massive black hole binary (MBHB) systems using a Bayesian inference technique. Several designed MBHB systems were chosen for comparison with a fiducial system to explore the influence of parameters such as sky location, inclination angle, anti-spin, large mass ratio and light mass. The two reported MBHB candidates named OJ287 and Tick-Tock are also considered. The study found that the network of TianQin and LISA can break certain degeneracies among different parameters, improving the estimation of parameters, particularly for extrinsic parameters. Meanwhile, the degeneracies between different intrinsic parameters are highly sensitive to the values of the parameters. Additionally, small inclination angles and limited detection of the inspiral phase can introduce significant bias in the estimation of parameters. The presence of instrument noise will also introduce bias and worsen the precision. The paper concludes that the network of TianQin and LISA can significantly improve the estimation of extrinsic parameters by about one order of magnitude while yielding slight improvements in the intrinsic parameters. Moreover, parameter estimation can still be subject to biases even with a sufficiently high signal-to-noise ratio if the detected signal does not encompass all stages of the inspiral, merger, and ringdown.
Submitted 23 January, 2024;
originally announced January 2024.
-
Electronic and magnetic excitations in La$_3$Ni$_2$O$_7$
Authors:
Xiaoyang Chen,
Jaewon Choi,
Zhicheng Jiang,
Jiong Mei,
Kun Jiang,
Jie Li,
Stefano Agrestini,
Mirian Garcia-Fernandez,
Xing Huang,
Hualei Sun,
Dawei Shen,
Meng Wang,
Jiangping Hu,
Yi Lu,
Ke-Jin Zhou,
Donglai Feng
Abstract:
The striking discovery of high-temperature superconductivity (HTSC) at 80 K in the bilayer nickelate La$_3$Ni$_2$O$_7$ under a moderately high pressure of about 14 GPa ignited a new wave of studies of HTSC in nickelates. The properties of the parent phase at ambient pressure may contain key information on the basic interactions therein and on the bosons that may mediate the pairing that gives rise to superconductivity. Moreover, the bilayer structure of La$_3$Ni$_2$O$_7$ may suggest a distinct minimal model in comparison to cuprate superconductors. Here using X-ray absorption spectroscopy and resonant inelastic X-ray scattering, we studied La$_3$Ni$_2$O$_7$ at ambient pressure, and found that Ni 3$d_{x^2-y^2}$, Ni 3$d_{z^2}$, and ligand oxygen 2$p$ orbitals dominate the low-energy physics with a small charge-transfer energy. Remarkably, well-defined optical-like magnetic excitations were found to soften into a quasi-static spin-density-wave ordering, evidencing the strong electronic correlations and rich magnetic properties. Based on a Heisenberg spin model, we found that the inter-layer effective magnetic superexchange interaction is much larger than the intra-layer ones, and proposed two viable magnetic structures. Our results set the foundation for further exploration of the La$_3$Ni$_2$O$_7$ superconductor.
Submitted 23 January, 2024;
originally announced January 2024.
-
M2-CLIP: A Multimodal, Multi-task Adapting Framework for Video Action Recognition
Authors:
Mengmeng Wang,
Jiazheng Xing,
Boyuan Jiang,
Jun Chen,
Jianbiao Mei,
Xingxing Zuo,
Guang Dai,
Jingdong Wang,
Yong Liu
Abstract:
Recently, the rise of large-scale vision-language pretrained models like CLIP, coupled with Parameter-Efficient FineTuning (PEFT), has attracted substantial attention in video action recognition. Nevertheless, prevailing approaches tend to prioritize strong supervised performance at the expense of the models' generalization capabilities during transfer. In this paper, we introduce a novel Multimodal, Multi-task CLIP adapting framework named M2-CLIP to address these challenges, preserving both high supervised performance and robust transferability. Firstly, to enhance the individual modality architectures, we introduce multimodal adapters to both the visual and text branches. Specifically, we design a novel visual TED-Adapter that performs global Temporal Enhancement and local temporal Difference modeling to improve the temporal representation capabilities of the visual encoder. Moreover, we adopt text encoder adapters to strengthen the learning of semantic label information. Secondly, we design a multi-task decoder with a rich set of supervisory signals to satisfy the need for strong supervised performance and generalization within a multimodal framework. Experimental results validate the efficacy of our approach, demonstrating exceptional performance in supervised learning while maintaining strong generalization in zero-shot scenarios.
Submitted 21 January, 2024;
originally announced January 2024.
-
Bayesian analysis of gravitational wave memory effect with TianQin
Authors:
Shuo Sun,
Changfu Shi,
Jian-dong Zhang,
Jianwei Mei
Abstract:
The memory effect in gravitational waves is a direct prediction of general relativity. The presence of the memory effect in gravitational wave signals not only serves as a test of general relativity but also establishes connections between soft theorems and asymptotic symmetries, serving as a bridge for exploring fundamental physics. Furthermore, with the ongoing progress in space-based gravitational wave detection projects, the gravitational wave memory effect generated by the merger of massive black hole binaries is becoming increasingly significant and cannot be ignored. In this work, we perform a full Bayesian analysis of the gravitational wave memory effect with TianQin. The results indicate that the memory effect has a certain impact on parameter estimation but does not cause deviations beyond the 1$σ$ range. Additionally, the Bayes factor analysis suggests that when the signal-to-noise ratio of the memory effect in TianQin is approximately 2.36, the $\log_{10}$ Bayes factor reaches 8. This result is consistent with the findings obtained from a previous mismatch threshold.
Submitted 21 January, 2024;
originally announced January 2024.
-
Operation Scheme Optimizations to Achieve Ultra-high Endurance ($10^{10}$) in Flash Memory with Robust Reliabilities
Authors:
Yang Feng,
Zhaohui Sun,
Chengcheng Wang,
Xinyi Guo,
Junyao Mei,
Yueran Qi,
Jing Liu,
Junyu Zhang,
Jixuan Wu,
Xuepeng Zhan,
Jiezhi Chen
Abstract:
Flash memory has been widely adopted as stand-alone memory and embedded memory due to its robust reliability. However, its limited endurance hinders further applications in storage class memory (SCM) and in endurance-demanding computing-in-memory (CIM) tasks. In this work, optimization strategies are studied to tackle this concern. It is shown that by adopting channel hot electron injection (CHEI) and hot hole injection (HHI) to implement program/erase (PE) cycling, together with a balanced memory window (MW) at the high-Vth (HV) mode, the endurance can be greatly extended to $10^{10}$ PE cycles, which is a record-high value in flash memory. Moreover, by using the proposed electric-field-assisted relaxation (EAR) scheme, the degradation of flash cells can be well suppressed, with better subthreshold swings (SS) and lower leakage currents (sub-10 pA after $10^{10}$ PE cycles). Our results shed light on the optimization strategy of flash memory to serve as SCM and implement endurance-demanding CIM tasks.
Submitted 16 January, 2024;
originally announced January 2024.
-
Digital Retina for IoV Towards 6G: Architecture, Opportunities, and Challenges
Authors:
Kan Zheng,
Jie Mei,
Haojun Yang,
Lu Hou,
Siwei Ma
Abstract:
Vehicles are no longer isolated entities in traffic environments, thanks to the development of IoV powered by 5G networks and their evolution into 6G. However, it is not enough for vehicles in a highly dynamic and complex traffic environment to make reliable and efficient decisions. As a result, this paper proposes a cloud-edge-end computing system with multi-streams for IoV, referred to as Vehicular Digital Retina (VDR). Local computing and edge computing are effectively integrated in the VDR system through the aid of vehicle-to-everything (V2X) networks, resulting in a heterogeneous computing environment that improves vehicles' perception and decision-making abilities with collaborative strategies. Once the system framework is presented, various important functions in the VDR system are explained in detail, including V2X-aided collaborative perception, V2X-aided stream sharing for collaborative learning, and V2X-aided secured collaboration. All of them enable the development of efficient mechanisms of data sharing and information interaction with high security for collaborative intelligent driving. We also present a case study with simulation results to demonstrate the effectiveness of the proposed VDR system.
Submitted 9 January, 2024;
originally announced January 2024.
-
SPFormer: Enhancing Vision Transformer with Superpixel Representation
Authors:
Jieru Mei,
Liang-Chieh Chen,
Alan Yuille,
Cihang Xie
Abstract:
In this work, we introduce SPFormer, a novel Vision Transformer enhanced by superpixel representation. Addressing the limitations of traditional Vision Transformers' fixed-size, non-adaptive patch partitioning, SPFormer employs superpixels that adapt to the image's content. This approach divides the image into irregular, semantically coherent regions, effectively capturing intricate details and applicable at both initial and intermediate feature levels.
SPFormer, trainable end-to-end, exhibits superior performance across various benchmarks. Notably, it exhibits significant improvements on the challenging ImageNet benchmark, achieving a 1.4% increase over DeiT-T and 1.1% over DeiT-S respectively. A standout feature of SPFormer is its inherent explainability. The superpixel structure offers a window into the model's internal processes, providing valuable insights that enhance the model's interpretability. This level of clarity significantly improves SPFormer's robustness, particularly in challenging scenarios such as image rotations and occlusions, demonstrating its adaptability and resilience.
Submitted 5 January, 2024;
originally announced January 2024.
-
Anomalous exchange bias effect in ferromagnetic VI3 flakes
Authors:
Xi Zhang,
Xiuquan Xia,
Qiye Liu,
Yonggang He,
Le Wang,
Junhao Lin,
Jia-Wei Mei,
Yingchun Cheng,
Jun-Feng Dai
Abstract:
The exchange bias (EB) effect, pivotal in magnetic data storage and sensing devices, has been observed not only in interfacial regions but also in intrinsic ferromagnetic materials. Here, we've uncovered a robust and stable exchange bias effect within the layered van der Waals (vdW) ferromagnet VI3 employing magnetic circular dichroism microscopy. At 10 K, we observed a significant exchange field of approximately 0.1 T, accompanied by random shifts (positive or negative relative to zero magnetic field) after zero-field cooling. Notably, this effect is effectively controllable after field cooling, with shift direction opposing the applied magnetic field. The presence of strong magnetic anisotropic energy within VI3 results in larger coercivity-bound magnetic domains. These domains dictate the neighboring ferromagnetic alignment and induce shifts in the hysteresis loop. Our study not only contributes to comprehending fundamental nanoscale magnetic interactions but also sheds light on emergent phenomena within layered van der Waals magnets.
Submitted 28 December, 2023;
originally announced December 2023.
-
The Subtle Simplicity of Cosmological Correlators
Authors:
Chandramouli Chowdhury,
Arthur Lipstein,
Jiajie Mei,
Ivo Sachs,
Pierre Vanhove
Abstract:
We investigate cosmological correlators for conformally coupled $φ^4$ theory in four-dimensional de Sitter space. These \textit{in-in} correlators differ from scattering amplitudes for massless particles in flat space due to the spacelike structure of future infinity in de Sitter. They also require a regularization which preserves de Sitter-invariance, which makes the flat space limit subtle to define at loop-level. Nevertheless we find that up to two loops, the \textit{in-in} correlators are structurally simpler than the wave function and have the same transcendentality as flat space amplitudes. Moreover, we show that their loop integrands can be recast in terms of flat space integrands and can be derived from a novel recursion relation.
Submitted 29 February, 2024; v1 submitted 21 December, 2023;
originally announced December 2023.
-
A Semantic Space is Worth 256 Language Descriptions: Make Stronger Segmentation Models with Descriptive Properties
Authors:
Junfei Xiao,
Ziqi Zhou,
Wenxuan Li,
Shiyi Lan,
Jieru Mei,
Zhiding Yu,
Alan Yuille,
Yuyin Zhou,
Cihang Xie
Abstract:
This paper introduces ProLab, a novel approach using property-level label space for creating strong interpretable segmentation models. Instead of relying solely on category-specific annotations, ProLab uses descriptive properties grounded in common sense knowledge for supervising segmentation models. It is based on two core designs. First, we employ Large Language Models (LLMs) and carefully crafted prompts to generate descriptions of all involved categories that carry meaningful common sense knowledge and follow a structured format. Second, we introduce a description embedding model preserving semantic correlation across descriptions and then cluster them into a set of descriptive properties (e.g., 256) using K-Means. These properties are based on interpretable common sense knowledge consistent with theories of human recognition. We empirically show that our approach makes segmentation models perform stronger on five classic benchmarks (e.g., ADE20K, COCO-Stuff, Pascal Context, Cityscapes, and BDD). Our method also shows better scalability with extended training steps than category-level supervision. Our interpretable segmentation framework also emerges with the generalization ability to segment out-of-domain or unknown categories using only in-domain descriptive properties. Code is available at https://github.com/lambert-x/ProLab.
Submitted 21 December, 2023;
originally announced December 2023.
-
CR-SFP: Learning Consistent Representation for Soft Filter Pruning
Authors:
Jingyang Xiang,
Zhuangzhi Chen,
Jianbiao Mei,
Siqi Li,
Jun Chen,
Yong Liu
Abstract:
Soft filter pruning (SFP) has emerged as an effective pruning technique that allows pruned filters to keep updating and gives them the opportunity to regrow into the network. However, this pruning strategy alternates between training and pruning, which inevitably causes inconsistent representations between the reconstructed network (R-NN) used during training and the pruned network (P-NN) used at inference, resulting in performance degradation. In this paper, we propose to mitigate this gap by learning consistent representation for soft filter pruning, dubbed CR-SFP. Specifically, for each training step, CR-SFP optimizes the R-NN and P-NN simultaneously with different distorted versions of the same training data, while forcing them to be consistent by minimizing a bidirectional KL-divergence loss between their posterior distributions. Meanwhile, the R-NN and P-NN share backbone parameters, so only additional classifier parameters are introduced. After training, we can export the P-NN for inference. CR-SFP is a simple yet effective training framework to improve the accuracy of P-NN without introducing any additional inference cost. It can also be combined with a variety of pruning criteria and loss functions. Extensive experiments demonstrate that CR-SFP achieves consistent improvements across various CNN architectures. Notably, on ImageNet, CR-SFP reduces more than 41.8% FLOPs on ResNet18 with 69.2% top-1 accuracy, improving SFP by 2.1% under the same training settings. The code will be publicly available on GitHub.
Submitted 17 December, 2023;
originally announced December 2023.
-
Tuning LayerNorm in Attention: Towards Efficient Multi-Modal LLM Finetuning
Authors:
Bingchen Zhao,
Haoqin Tu,
Chen Wei,
Jieru Mei,
Cihang Xie
Abstract:
This paper introduces an efficient strategy to transform Large Language Models (LLMs) into Multi-Modal Large Language Models (MLLMs). By conceptualizing this transformation as a domain adaptation process, i.e., transitioning from text understanding to embracing multiple modalities, we intriguingly note that, within each attention block, tuning LayerNorm suffices to yield strong performance. Moreover, when benchmarked against other tuning approaches like full parameter finetuning or LoRA, its benefits on efficiency are substantial. For example, when compared to LoRA on a 13B model scale, performance can be enhanced by an average of over 20% across five multi-modal tasks, while trainable parameters are reduced by 41.9% and GPU memory usage by 17.6%. On top of this LayerNorm strategy, we show that selectively tuning only with conversational data can improve efficiency further. Beyond these empirical outcomes, we provide a comprehensive analysis to explore the role of LayerNorm in adapting LLMs to the multi-modal domain and improving the expressive power of the model.
Submitted 18 December, 2023;
originally announced December 2023.
-
Camera-based 3D Semantic Scene Completion with Sparse Guidance Network
Authors:
Jianbiao Mei,
Yu Yang,
Mengmeng Wang,
Junyu Zhu,
Xiangrui Zhao,
Jongwon Ra,
Laijian Li,
Yong Liu
Abstract:
Semantic scene completion (SSC) aims to predict the semantic occupancy of each voxel in the entire 3D scene from limited observations, which is an emerging and critical task for autonomous driving. Recently, many studies have turned to camera-based SSC solutions due to the richer visual cues and cost-effectiveness of cameras. However, existing methods usually rely on sophisticated and heavy 3D models to directly process the lifted 3D features that are not discriminative enough for clear segmentation boundaries. In this paper, we adopt the dense-sparse-dense design and propose an end-to-end camera-based SSC framework, termed SGN, to diffuse semantics from the semantic- and occupancy-aware seed voxels to the whole scene based on geometry prior and occupancy information. By designing hybrid guidance (sparse semantic and geometry guidance) and effective voxel aggregation for spatial occupancy and geometry priors, we enhance the feature separation between different categories and expedite the convergence of semantic diffusion. Extensive experimental results on the SemanticKITTI dataset demonstrate the superiority of our SGN over existing state-of-the-art methods.
Submitted 9 December, 2023;
originally announced December 2023.
-
Gauge symmetry of excited states in projected entangled-pair state simulations
Authors:
Yi Tan,
Ji-Yao Chen,
Didier Poilblanc,
Jia-Wei Mei
Abstract:
While gauge symmetry is a well-established requirement for representing topological orders in projected entangled-pair state (PEPS), its impact on the properties of low-lying excited states remains relatively unexplored. Here we perform PEPS simulations of low-energy dynamics in the Kitaev honeycomb model, which supports fractionalized gauge flux (vison) excitations. We identify gauge symmetry emerging upon optimizing an unbiased PEPS ground state. Using the PEPS adapted local mode approximation, we further classify the low-lying excited states by discerning different vison sectors. Our simulations of spin and spin-dimer dynamical correlations establish close connections with experimental observations. Notably, the selection rule imposed by the locally conserved visons results in nearly flat dispersions in momentum space for excited states belonging to the 2-vison or 4-vison sectors.
Submitted 7 April, 2024; v1 submitted 7 December, 2023;
originally announced December 2023.
-
SCLIP: Rethinking Self-Attention for Dense Vision-Language Inference
Authors:
Feng Wang,
Jieru Mei,
Alan Yuille
Abstract:
Recent advances in contrastive language-image pretraining (CLIP) have demonstrated strong capabilities in zero-shot classification by aligning visual representations with target text embeddings at the image level. However, in dense prediction tasks, CLIP often struggles to localize visual features within an image and fails to give accurate pixel-level predictions, which prevents it from functioning as a generalized visual foundation model. In this work, we aim to enhance CLIP's potential for semantic segmentation with minimal modifications to its pretrained models. By rethinking self-attention, we surprisingly find that CLIP can adapt to dense prediction tasks by simply introducing a novel Correlative Self-Attention (CSA) mechanism. Specifically, we replace the traditional self-attention block in the last layer of the CLIP vision encoder with our CSA module and reuse its pretrained projection matrices of query, key, and value, leading to a training-free adaptation approach for CLIP's zero-shot semantic segmentation. Extensive experiments show the advantage of CSA: we obtain a 38.2% average zero-shot mIoU across eight semantic segmentation benchmarks highlighted in this paper, significantly outperforming the existing SoTA's 33.9% and the vanilla CLIP's 14.1%.
Submitted 2 January, 2024; v1 submitted 3 December, 2023;
originally announced December 2023.
-
Identifying Majorana Zero Modes in Vortex Lattices Using Fano Factor Tomography
Authors:
Jiong Mei,
Kun Jiang,
Jiangping Hu
Abstract:
In this work, we investigate the tunneling characteristics of Majorana zero modes (MZMs) in vortex lattices based on scanning tunneling microscopy measurement. We find that zero bias conductance does not reach the quantized value owing to the coupling between the MZMs. On the contrary, the Fano factor measured in the high voltage regime reflects the local particle-hole asymmetry of the bound states and is insensitive to the energy splitting between them. We propose using spatially resolved Fano factor tomography as a tool to identify the existence of MZMs. In both cases of isolated MZM or MZMs forming bands, there is a spatially resolved Fano factor plateau at one in the vicinity of a vortex core, regardless of the tunneling parameter details, which is in stark contrast to other trivial bound states. These results reveal new tunneling properties of MZMs in vortex lattices and provide measurement tools for topological quantum devices.
Submitted 2 December, 2023;
originally announced December 2023.
-
Cascaded Interaction with Eroded Deep Supervision for Salient Object Detection
Authors:
Hewen Xiao,
Jie Mei,
Guangfu Ma,
Weiren Wu
Abstract:
Deep convolutional neural networks have been widely applied in salient object detection and have achieved remarkable results in this field. However, existing models suffer from information distortion caused by interpolation during up-sampling and down-sampling. In response to this drawback, this article starts from two directions in the network: feature and label. On the one hand, a novel cascaded interaction network with a guidance module named global-local aligned attention (GAA) is designed to reduce the negative impact of interpolation on the feature side. On the other hand, a deep supervision strategy based on edge erosion is proposed to reduce the negative guidance of label interpolation on lateral output. Extensive experiments on five popular datasets demonstrate the superiority of our method.
Submitted 30 November, 2023;
originally announced November 2023.