-
Controllable Contextualized Image Captioning: Directing the Visual Narrative through User-Defined Highlights
Authors:
Shunqi Mao,
Chaoyi Zhang,
Hang Su,
Hwanjun Song,
Igor Shalyminov,
Weidong Cai
Abstract:
Contextualized Image Captioning (CIC) evolves traditional image captioning into a more complex domain, necessitating the ability for multimodal reasoning. It aims to generate image captions given specific contextual information. This paper further introduces a novel domain of Controllable Contextualized Image Captioning (Ctrl-CIC). Unlike CIC, which solely relies on broad context, Ctrl-CIC accentuates a user-defined highlight, compelling the model to tailor captions that resonate with the highlighted aspects of the context. We present two approaches, Prompting-based Controller (P-Ctrl) and Recalibration-based Controller (R-Ctrl), to generate focused captions. P-Ctrl conditions the model generation on highlight by prepending captions with highlight-driven prefixes, whereas R-Ctrl tunes the model to selectively recalibrate the encoder embeddings for highlighted tokens. Additionally, we design a GPT-4V empowered evaluator to assess the quality of the controlled captions alongside standard assessment methods. Extensive experimental results demonstrate the efficient and effective controllability of our method, charting a new direction in achieving user-adaptive image captioning. Code is available at https://github.com/ShunqiM/Ctrl-CIC .
Submitted 16 July, 2024;
originally announced July 2024.
-
UNIT: Backdoor Mitigation via Automated Neural Distribution Tightening
Authors:
Siyuan Cheng,
Guangyu Shen,
Kaiyuan Zhang,
Guanhong Tao,
Shengwei An,
Hanxi Guo,
Shiqing Ma,
Xiangyu Zhang
Abstract:
Deep neural networks (DNNs) have demonstrated effectiveness in various fields. However, DNNs are vulnerable to backdoor attacks, which inject a unique pattern, called a trigger, into the input to cause misclassification to an attack-chosen target label. While existing works have proposed various methods to mitigate backdoor effects in poisoned models, they tend to be less effective against recent advanced attacks. In this paper, we introduce UNIT, a novel post-training defense technique that can effectively eliminate backdoor effects for a variety of attacks. Specifically, UNIT approximates a unique and tight activation distribution for each neuron in the model. It then proactively dispels substantially large activation values that exceed the approximated boundaries. Our experimental results demonstrate that UNIT outperforms 7 popular defense methods against 14 existing backdoor attacks, including 2 advanced attacks, using only 5% of clean training data. UNIT is also cost efficient. The code is accessible at https://github.com/Megum1/UNIT.
Submitted 16 July, 2024;
originally announced July 2024.
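The clipping idea in the UNIT abstract above (estimate a tight per-neuron activation boundary from a small clean set, then suppress values above it) can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the boundary rule (mean plus `c` standard deviations) and the function names `estimate_bounds` and `clip_activations` are hypothetical.

```python
from statistics import mean, pstdev


def estimate_bounds(clean_activations, c=3.0):
    """Estimate one upper bound per neuron from clean samples.

    clean_activations: list of samples, each a list of per-neuron activations.
    The rule mean + c * std is an illustrative assumption, not UNIT's rule.
    """
    per_neuron = list(zip(*clean_activations))  # transpose to neuron-major
    return [mean(a) + c * pstdev(a) for a in per_neuron]


def clip_activations(activations, bounds):
    """Cap each activation at its neuron's bound; smaller values pass through."""
    return [min(a, b) for a, b in zip(activations, bounds)]


# Two clean samples for a layer with two neurons (toy numbers).
clean = [[1.0, 2.0], [1.0, 4.0]]
bounds = estimate_bounds(clean, c=1.0)
# An unusually large activation on neuron 0 is capped; neuron 1 is unchanged.
clipped = clip_activations([5.0, 3.5], bounds)
```

The sketch only shows the shape of the computation: bounds are fitted once on clean data and then applied at inference time.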
-
Uncertainty is Fragile: Manipulating Uncertainty in Large Language Models
Authors:
Qingcheng Zeng,
Mingyu Jin,
Qinkai Yu,
Zhenting Wang,
Wenyue Hua,
Zihao Zhou,
Guangyan Sun,
Yanda Meng,
Shiqing Ma,
Qifan Wang,
Felix Juefei-Xu,
Kaize Ding,
Fan Yang,
Ruixiang Tang,
Yongfeng Zhang
Abstract:
Large Language Models (LLMs) are employed across various high-stakes domains, where the reliability of their outputs is crucial. One commonly used method to assess the reliability of LLMs' responses is uncertainty estimation, which gauges the likelihood of their answers being correct. While many studies focus on improving the accuracy of uncertainty estimations for LLMs, our research investigates the fragility of uncertainty estimation and explores potential attacks. We demonstrate that an attacker can embed a backdoor in LLMs, which, when activated by a specific trigger in the input, manipulates the model's uncertainty without affecting the final output. Specifically, the proposed backdoor attack method can alter an LLM's output probability distribution, causing the probability distribution to converge towards an attacker-predefined distribution while ensuring that the top-1 prediction remains unchanged. Our experimental results demonstrate that this attack effectively undermines the model's self-evaluation reliability in multiple-choice questions. For instance, we achieved a 100% attack success rate (ASR) across three different triggering strategies in four models. Further, we investigate whether this manipulation generalizes across different prompts and domains. This work highlights a significant threat to the reliability of LLMs and underscores the need for future defenses against such attacks. The code is available at https://github.com/qcznlp/uncertainty_attack.
Submitted 15 July, 2024;
originally announced July 2024.
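The attack goal stated in the abstract above (pull the answer distribution towards an attacker-predefined distribution while the top-1 prediction stays the same) can be made concrete with a small numerical sketch. The probability vectors, the uniform target, and the choice of KL divergence as the distance are all illustrative assumptions, not details from the paper.

```python
import math


def kl_divergence(p, q):
    """KL(p || q) for two discrete distributions over the same options."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def top1(p):
    """Index of the most probable option."""
    return max(range(len(p)), key=lambda i: p[i])


# Hypothetical probabilities over multiple-choice options A-D.
clean = [0.85, 0.05, 0.05, 0.05]        # confident, answer A
target = [0.25, 0.25, 0.25, 0.25]       # assumed attacker-chosen target
backdoored = [0.28, 0.24, 0.24, 0.24]   # assumed output when triggered

same_answer = top1(clean) == top1(backdoored)
closer_to_target = (
    kl_divergence(backdoored, target) < kl_divergence(clean, target)
)
```

In this toy case the predicted answer is unchanged, yet the triggered distribution is nearly uniform, so any confidence estimate read from it would be far lower.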
-
Q-Sparse: All Large Language Models can be Fully Sparsely-Activated
Authors:
Hongyu Wang,
Shuming Ma,
Ruiping Wang,
Furu Wei
Abstract:
We introduce Q-Sparse, a simple yet effective approach to training sparsely-activated large language models (LLMs). Q-Sparse enables full sparsity of activations in LLMs, which can bring significant efficiency gains in inference. This is achieved by applying top-K sparsification to the activations and the straight-through estimator to the training. The key results from this work are: (1) Q-Sparse can achieve results comparable to those of baseline LLMs while being much more efficient at inference time; (2) we present an inference-optimal scaling law for sparsely-activated LLMs; (3) Q-Sparse is effective in different settings, including training from scratch, continued training of off-the-shelf LLMs, and finetuning; (4) Q-Sparse works for both full-precision and 1-bit LLMs (e.g., BitNet b1.58). In particular, the synergy of BitNet b1.58 and Q-Sparse (which can be equipped with MoE) provides the cornerstone and a clear path to revolutionize the efficiency, including cost and energy consumption, of future LLMs.
Submitted 15 July, 2024;
originally announced July 2024.
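The two ingredients named in the Q-Sparse abstract above, top-K sparsification of activations and a straight-through estimator for training, can be sketched in a few lines of plain Python. This is a generic illustration of those two techniques under the assumption that top-K selects by magnitude; the function names are hypothetical and nothing here reproduces the paper's actual implementation.

```python
def topk_sparsify(x, k):
    """Forward pass: keep the k largest-magnitude entries of x, zero the rest."""
    if k <= 0:
        return [0.0] * len(x)
    # Indices of the k entries with the largest absolute value.
    order = sorted(range(len(x)), key=lambda i: abs(x[i]), reverse=True)
    keep = set(order[:k])
    return [v if i in keep else 0.0 for i, v in enumerate(x)]


def ste_backward(grad_output):
    """Backward pass with a straight-through estimator.

    The sparsification is treated as the identity, so the incoming gradient
    is passed to every input entry, including those zeroed in the forward pass.
    """
    return list(grad_output)


activations = [0.1, -2.0, 0.5, 3.0]
sparse = topk_sparsify(activations, k=2)
grad_in = ste_backward([1.0, 1.0, 1.0, 1.0])
```

Passing the gradient through unchanged is what lets entries that were zeroed in the forward pass still receive a training signal.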
-
Think-on-Graph 2.0: Deep and Interpretable Large Language Model Reasoning with Knowledge Graph-guided Retrieval
Authors:
Shengjie Ma,
Chengjin Xu,
Xuhui Jiang,
Muzhi Li,
Huaren Qu,
Jian Guo
Abstract:
Retrieval-augmented generation (RAG) has significantly advanced large language models (LLMs) by enabling dynamic information retrieval to mitigate knowledge gaps and hallucinations in generated content. However, these systems often falter with complex reasoning and consistency across diverse queries. In this work, we present Think-on-Graph 2.0, an enhanced RAG framework that aligns questions with the knowledge graph and uses it as a navigational tool, which deepens and refines the RAG paradigm for information collection and integration. The KG-guided navigation fosters deep and long-range associations to uphold logical consistency and optimize the scope of retrieval for precision and interoperability. In conjunction, factual consistency can be better ensured through semantic similarity guided by precise directives. ToG 2.0 not only improves the accuracy and reliability of LLMs' responses but also demonstrates the potential of hybrid structured knowledge systems to significantly advance LLM reasoning, aligning it closer to human-like performance. We conducted extensive experiments on four public datasets to demonstrate the advantages of our method compared to the baseline.
Submitted 15 July, 2024;
originally announced July 2024.
-
WPS-SAM: Towards Weakly-Supervised Part Segmentation with Foundation Models
Authors:
Xinjian Wu,
Ruisong Zhang,
Jie Qin,
Shijie Ma,
Cheng-Lin Liu
Abstract:
Segmenting and recognizing diverse object parts is crucial in computer vision and robotics. Despite significant progress in object segmentation, part-level segmentation remains underexplored due to complex boundaries and scarce annotated data. To address this, we propose a novel Weakly-supervised Part Segmentation (WPS) setting and an approach called WPS-SAM, built on the large-scale pre-trained vision foundation model, Segment Anything Model (SAM). WPS-SAM is an end-to-end framework designed to extract prompt tokens directly from images and perform pixel-level segmentation of part regions. During its training phase, it only uses weakly supervised labels in the form of bounding boxes or points. Extensive experiments demonstrate that, through exploiting the rich knowledge embedded in pre-trained foundation models, WPS-SAM outperforms other segmentation models trained with pixel-level strong annotations. Specifically, WPS-SAM achieves 68.93% mIOU and 79.53% mACC on the PartImageNet dataset, surpassing state-of-the-art fully supervised methods by approximately 4% in terms of mIOU.
Submitted 14 July, 2024;
originally announced July 2024.
-
Non-Hermitian dynamics of Cooper pair splitter
Authors:
E. S. Ma,
Z. Song
Abstract:
We propose a non-Hermitian model for Cooper pair splitters, in which the process of electron tunneling into electrodes is characterized by non-Hermitian terms. We find that across a broad range of parameters, the energy levels consistently remain real, and coalescing states are always present. The Coulomb repulsion between electrons in a quantum dot affects the order of the coalescing states. This gives rise to two distinct dynamic behaviors: (i) when the initial state is an empty state, the final state supports a nonzero electron-escaping rate; (ii) the electron-escaping rate is zero for a single-electron initial state. In the former case, our exact solutions reveal that the average electron-escaping rate vanishes along a set of hyperbolic curves in the plane of the chemical potentials of the two quantum dots. The stability of the results in the presence of disordered perturbation is also investigated. Our findings pave the way for investigating Cooper pair splitters within the framework of non-Hermitian quantum mechanics.
Submitted 13 July, 2024;
originally announced July 2024.
-
Redefinition of Digital Twin and its Situation Awareness Framework Designing Towards Fourth Paradigm for Energy Internet of Things
Authors:
Xing He,
Yuezhong Tang,
Shuyan Ma,
Qian Ai,
Fei Tao,
Robert Qiu
Abstract:
Traditional knowledge-based situation awareness (SA) modes struggle to adapt to the escalating complexity of today's Energy Internet of Things (EIoT), necessitating a pivotal paradigm shift. In response, this work introduces a pioneering data-driven SA framework, termed digital twin-based situation awareness (DT-SA), aiming to bridge existing gaps between data and demands, and further to enhance SA capabilities within the complex EIoT landscape. First, we redefine the concept of digital twin (DT) within the EIoT context, aligning it with data-intensive scientific discovery paradigm (the Fourth Paradigm) so as to waken EIoT's sleeping data; this contextual redefinition lays the cornerstone of our DT-SA framework for EIoT. Then, the framework is comprehensively explored through its four fundamental steps: digitalization, simulation, informatization, and intellectualization. These steps initiate a virtual ecosystem conducive to a continuously self-adaptive, self-learning, and self-evolving big model (BM), further contributing to the evolution and effectiveness of DT-SA in engineering. Our framework is characterized by the incorporation of system theory and Fourth Paradigm as guiding ideologies, DT as data engine, and BM as intelligence engine. This unique combination forms the backbone of our approach. This work extends beyond engineering, stepping into the domain of data science -- DT-SA not only enhances management practices for EIoT users/operators, but also propels advancements in pattern analysis and machine intelligence (PAMI) within the intricate fabric of a complex system. Numerous real-world cases validate our DT-SA framework.
Submitted 11 July, 2024;
originally announced July 2024.
-
Semantic Feature Division Multiple Access for Multi-user Digital Interference Networks
Authors:
Shuai Ma,
Chuanhui Zhang,
Bin Shen,
Youlong Wu,
Hang Li,
Shiyin Li,
Guangming Shi,
Naofal Al-Dhahir
Abstract:
With ever-increasing user density and quality of service (QoS) demands, 5G networks with limited spectrum resources are facing massive access challenges. To address these challenges, in this paper, we propose a novel discrete semantic feature division multiple access (SFDMA) paradigm for multi-user digital interference networks. Specifically, by utilizing deep learning technology, SFDMA extracts multi-user semantic information into discrete representations in distinguishable semantic subspaces, which enables multiple users to transmit simultaneously over the same time-frequency resources. Furthermore, based on a robust information bottleneck, we design an SFDMA-based multi-user digital semantic interference network for inference tasks, which can achieve approximately orthogonal transmission. Moreover, we propose an SFDMA-based multi-user digital semantic interference network for image reconstruction tasks, where the discrete outputs of the users' semantic encoders are approximately orthogonal, which significantly reduces multi-user interference. In addition, we propose an Alpha-Beta-Gamma (ABG) formula for semantic communications, which is the first theoretical relationship between inference accuracy and transmission power. Then, we derive adaptive power control methods with closed-form expressions for inference tasks. Extensive simulations verify the effectiveness and superiority of the proposed SFDMA.
Submitted 11 July, 2024;
originally announced July 2024.
-
$N$ -Laplacian and $N/2$-Hessian type equations with exponential reaction term and measure data
Authors:
Shiguang Ma,
Zijian Wang
Abstract:
In this article, we will prove existence results for the equations of the type $-Δ_{N}u=H_{l}(u)+μ$ and $F_{\frac{N}{2}}[-u]=H_{l}(u)+μ$ in a bounded domain $Ω$, with Dirichlet boundary condition, where the source term $H_{l}(r)$ takes the form $e^{r}-\sum_{j=0}^{l-1}\frac{r^{j}}{j!}$ and $μ$ is a nonnegative Radon measure.
Submitted 11 July, 2024;
originally announced July 2024.
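As a worked illustration of the reaction term in the abstract above (a standard series identity, not a result quoted from the paper): since $e^{r}=\sum_{j=0}^{\infty}\frac{r^{j}}{j!}$, subtracting the first $l$ terms leaves the tail of the exponential series,
$$ H_{l}(r)=e^{r}-\sum_{j=0}^{l-1}\frac{r^{j}}{j!}=\sum_{j=l}^{\infty}\frac{r^{j}}{j!}, $$
so that, for example, $H_{1}(r)=e^{r}-1$ and $H_{2}(r)=e^{r}-1-r$.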
-
Let Occ Flow: Self-Supervised 3D Occupancy Flow Prediction
Authors:
Yili Liu,
Linzhan Mou,
Xuan Yu,
Chenrui Han,
Sitong Mao,
Rong Xiong,
Yue Wang
Abstract:
Accurate perception of the dynamic environment is a fundamental task for autonomous driving and robot systems. This paper introduces Let Occ Flow, the first self-supervised work for joint 3D occupancy and occupancy flow prediction using only camera inputs, eliminating the need for 3D annotations. Utilizing TPV for unified scene representation and deformable attention layers for feature aggregation, our approach incorporates a backward-forward temporal attention module to capture dynamic object dependencies, followed by a 3D refine module for fine-grained volumetric representation. In addition, our method extends differentiable rendering to 3D volumetric flow fields, leveraging zero-shot 2D segmentation and optical flow cues for dynamic decomposition and motion optimization. Extensive experiments on the nuScenes and KITTI datasets demonstrate the competitive performance of our approach over prior state-of-the-art methods.
Submitted 10 July, 2024;
originally announced July 2024.
-
Existence of positive solutions for Kirchhoff type problems with critical exponent in exterior domains
Authors:
Liqian Jia,
Xinfu Li,
Shiwang Ma
Abstract:
In this paper, by using variational methods we study the existence of positive solutions for the following Kirchhoff type problem: $$ \left\{ \begin{array}{ll} -\left(a+b\mathlarger{\int}_Ω|\nabla u|^{2}dx\right)Δu+V(x)u=u^{5}, \ & x\inΩ,\\ \\ u=0,\ & x\in\partial Ω, \end{array}\right. $$ where $a>0$, $b\geq0$, $Ω\subset\mathbb R^3$ is an unbounded exterior domain, $\partialΩ\neq\emptyset$, $\mathbb{R}^{3}\backslashΩ$ is bounded, $u\in D_{0}^{1,2}(Ω)$, and $V\in L^{\frac{3}{2}}(Ω)$ is a non-negative continuous function. It turns out that the above Kirchhoff equation has no ground state solution. Nonetheless, by establishing a global compactness lemma and constructing a suitable minimax value $c$ at a higher energy level where the so-called Palais-Smale condition holds, we succeed in obtaining a positive solution for such a problem whenever $V$ and the hole $\mathbb{R}^{3}\setminusΩ$ are suitably small in some sense. To the best of our knowledge, there are few similar results published in the literature concerning the existence of positive solutions for Kirchhoff equations in exterior domains. Our result also holds true in the case $Ω=\mathbb R^3$; in particular, if $a=1$ and $b=0$, we improve some existing results (such as Benci, Cerami, Existence of positive solutions of the equation $-Δu+a(x)u=u^{(N+2)/(N-2)}$ in $\emph{R}^{N}$, J. Funct. Anal., 88 (1990), 90--117) for the corresponding Schrödinger equation in the whole space.
Submitted 9 July, 2024;
originally announced July 2024.
-
A Semantic-Aware and Multi-Guided Network for Infrared-Visible Image Fusion
Authors:
Xiaoli Zhang,
Liying Wang,
Libo Zhao,
Xiongfei Li,
Siwei Ma
Abstract:
Multi-modality image fusion aims at fusing specific-modality and shared-modality information from two source images. To tackle the problem of insufficient feature extraction and lack of semantic awareness for complex scenes, this paper focuses on how to model correlation-driven decomposing features and reason high-level graph representation by efficiently extracting complementary features and multi-guided feature aggregation. We propose a three-branch encoder-decoder architecture along with corresponding fusion layers as the fusion strategy. The transformer with Multi-Dconv Transposed Attention and Local-enhanced Feed Forward network is used to extract shallow features after the depthwise convolution. In the encoder with three parallel branches, the Cross Attention and Invertible Block (CAI) extracts local features and preserves high-frequency texture details. The Base Feature Extraction module (BFE) with residual connections can capture long-range dependencies and enhance shared-modality expression capabilities. The Graph Reasoning Module (GR) is introduced to reason about high-level cross-modality relations and simultaneously extract low-level detail features as CAI's specific-modality complementary information. Experiments demonstrate that our method obtains competitive results compared with state-of-the-art methods in visible/infrared image fusion and medical image fusion tasks. Moreover, we surpass other fusion methods on downstream tasks, scoring on average 9.78% higher mAP@.5 in object detection and 6.46% higher mIoU in semantic segmentation.
Submitted 11 June, 2024;
originally announced July 2024.
-
Enhancing Language Model Rationality with Bi-Directional Deliberation Reasoning
Authors:
Yadong Zhang,
Shaoguang Mao,
Wenshan Wu,
Yan Xia,
Tao Ge,
Man Lan,
Furu Wei
Abstract:
This paper introduces BI-Directional DEliberation Reasoning (BIDDER), a novel reasoning approach to enhance the decision rationality of language models. Traditional reasoning methods typically rely on historical information and employ uni-directional (left-to-right) reasoning strategy. This lack of bi-directional deliberation reasoning results in limited awareness of potential future outcomes and insufficient integration of historical context, leading to suboptimal decisions. BIDDER addresses this gap by incorporating principles of rational decision-making, specifically managing uncertainty and predicting expected utility. Our approach involves three key processes: Inferring hidden states to represent uncertain information in the decision-making process from historical data; Using these hidden states to predict future potential states and potential outcomes; Integrating historical information (past contexts) and long-term outcomes (future contexts) to inform reasoning. By leveraging bi-directional reasoning, BIDDER ensures thorough exploration of both past and future contexts, leading to more informed and rational decisions. We tested BIDDER's effectiveness in two well-defined scenarios: Poker (Limit Texas Hold'em) and Negotiation. Our experiments demonstrate that BIDDER significantly improves the decision-making capabilities of LLMs and LLM agents.
Submitted 8 July, 2024;
originally announced July 2024.
-
Quasinormal modes and greybody factor of Schwarzschild Black Hole in the Cold Dark Matter Halo
Authors:
Shi-Jie Ma,
Rui-Bo Wang,
Tian-Chi Ma,
He-Xu Zhang,
Jian-Bo Deng,
Xian-Ru Hu
Abstract:
In this article, we first study the wave function in a static spherically symmetric spacetime and obtain the effective potential of perturbed fields with spin. Then we apply the $6^{\rm{th}}$-order WKB approximation to analyze the quasinormal modes of a Schwarzschild black hole in a cold dark matter halo for perturbed fields with different spins and derive the quasinormal frequencies. Further, to study the relation between quasinormal frequencies and optics, we compare the results of the WKB method with the eikonal limit formula. Finally, we discuss the greybody factor for different perturbed fields in this spacetime.
Submitted 10 July, 2024; v1 submitted 7 July, 2024;
originally announced July 2024.
-
Joint identification of spatially variable genes via a network-assisted Bayesian regularization approach
Authors:
Mingcong Wu,
Yang Li,
Shuangge Ma,
Mengyun Wu
Abstract:
Identifying genes that display spatial patterns is critical to investigating expression interactions within a spatial context and further dissecting biological understanding of complex mechanistic functionality. Despite the increase in statistical methods designed to identify spatially variable genes, they are mostly based on marginal analysis and share the limitation that the dependence (network) structures among genes are not well accommodated, where a biological process usually involves changes in multiple genes that interact in a complex network. Moreover, the latent cellular composition within spots may introduce confounding variations, negatively affecting identification accuracy. In this study, we develop a novel Bayesian regularization approach for spatial transcriptomic data, with the confounding variations induced by varying cellular distributions effectively corrected. Significantly advancing from the existing studies, a thresholded graph Laplacian regularization is proposed to simultaneously identify spatially variable genes and accommodate the network structure among genes. The proposed method is based on a zero-inflated negative binomial distribution, effectively accommodating the count nature, zero inflation, and overdispersion of spatial transcriptomic data. Extensive simulations and the application to real data demonstrate the competitive performance of the proposed method.
Submitted 6 July, 2024;
originally announced July 2024.
-
Seed-ASR: Understanding Diverse Speech and Contexts with LLM-based Speech Recognition
Authors:
Ye Bai,
Jingping Chen,
Jitong Chen,
Wei Chen,
Zhuo Chen,
Chuang Ding,
Linhao Dong,
Qianqian Dong,
Yujiao Du,
Kepan Gao,
Lu Gao,
Yi Guo,
Minglun Han,
Ting Han,
Wenchao Hu,
Xinying Hu,
Yuxiang Hu,
Deyu Hua,
Lu Huang,
Mingkun Huang,
Youjia Huang,
Jishuo Jin,
Fanliu Kong,
Zongwei Lan,
Tianyu Li
, et al. (30 additional authors not shown)
Abstract:
Modern automatic speech recognition (ASR) models are required to accurately transcribe diverse speech signals (from different domains, languages, accents, etc.) given the specific contextual information in various application scenarios. Classic end-to-end models fused with extra language models perform well, but mainly in data-matching scenarios, and are gradually approaching a bottleneck. In this work, we introduce Seed-ASR, a large language model (LLM) based speech recognition model. Seed-ASR is developed based on the framework of audio conditioned LLM (AcLLM), leveraging the capabilities of LLMs by inputting continuous speech representations together with contextual information into the LLM. Through stage-wise large-scale training and the elicitation of context-aware capabilities in the LLM, Seed-ASR demonstrates significant improvement over end-to-end models on comprehensive evaluation sets, including multiple domains, accents/dialects and languages. Additionally, Seed-ASR can be further deployed to support specific needs in various scenarios without requiring extra language models. Compared to recently released large ASR models, Seed-ASR achieves a 10%-40% reduction in word (or character, for Chinese) error rates on Chinese and English public test sets, further demonstrating its powerful performance.
Submitted 10 July, 2024; v1 submitted 5 July, 2024;
originally announced July 2024.
-
Parallel fast random bit generation based on spectrotemporally uncorrelated Brillouin random fiber lasing oscillation
Authors:
Yuxi Pang,
Shaonian Ma,
Qiang Ji,
Xian Zhao,
Zengguang Qin,
Zhaojun Liu,
Ping Lu,
Xiaoyi Bao,
Yanping Xu
Abstract:
Correlations existing between spectral components in multi-wavelength lasers have been the key challenge that hinders these laser sources from being developed into chaotic comb entropy sources for parallel random bit generation. Herein, spectrotemporally uncorrelated multi-order Stokes/anti-Stokes emissions are achieved by cooperatively exploiting nonlinear optical processes including cascaded stimulated Brillouin scattering and quasi-phase-matched four-wave mixing in a Brillouin random fiber laser. Chaotic instabilities induced by random mode resonance are enhanced and disorderly redistributed among different lasing lines through complex nonlinear optical interactions, which comprehensively releases the inherent correlation among multiple Stokes/anti-Stokes emission lines, realizing a chaotic frequency comb with multiple spectrotemporally uncorrelated channels. Parallel fast random bit generation is fulfilled with 31 channels, a single-channel bit rate of 35 Gbps and a total bit rate of 1.085 Tbps. National Institute of Standards and Technology statistical tests verify the randomness of the generated bit streams. This work, in a simple and efficient way, breaks the correlation barrier for utilizing multi-wavelength lasers to achieve high-quality spectrotemporally uncorrelated chaotic laser sources, opening new avenues for achieving greatly accelerated random bit generation through parallelization and potentially revolutionizing the current architecture of secure communication and high-performance computation.
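The quoted total bit rate follows directly from the channel count and the per-channel rate. The short Python check below is only an arithmetic illustration; the function name and the assumption that every channel runs at the same rate are ours, not taken from the paper.

```python
# Figures quoted in the abstract above.
CHANNELS = 31
PER_CHANNEL_GBPS = 35


def total_bit_rate_tbps(channels: int, per_channel_gbps: float) -> float:
    """Aggregate rate in Tbps, assuming all channels share one bit rate."""
    return channels * per_channel_gbps / 1000


if __name__ == "__main__":
    # 31 channels at 35 Gbps each give 1085 Gbps in total.
    print(f"{total_bit_rate_tbps(CHANNELS, PER_CHANNEL_GBPS):.3f} Tbps")
```

With the abstract's figures this gives 31 × 35 Gbps = 1085 Gbps, i.e. the stated 1.085 Tbps.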
Submitted 3 July, 2024;
originally announced July 2024.
-
Observation of Co-propagating Chiral Zero Modes in Magnetic Photonic Crystals
Authors:
Zhongfu Li,
Shaojie Ma,
Shuwei Li,
Oubo You,
Yachao Liu,
Qingdong Yang,
Yuanjiang Xiang,
Peiheng Zhou,
Shuang Zhang
Abstract:
Topological singularities, such as Weyl points and Dirac points, can give rise to unidirectional propagation channels known as chiral zero modes (CZMs) when subject to a magnetic field. These CZMs are responsible for intriguing phenomena like the chiral anomaly in quantum systems. The propagation direction of each CZM is determined by both the applied magnetic field and the topological charge of the singularity point. While counter-propagating CZMs have been observed in 2D and 3D systems, the realization of co-propagating CZMs has remained elusive. Here we present the first experimental observation of co-propagating CZMs in magnetic photonic crystals hosting a single pair of ideal Weyl points (WPs). By manipulating the crystal's structural configuration, we spatially alter the locations of the WPs, creating pseudo-magnetic fields in opposite directions between them. This arrangement results in a pair of CZMs that possess the same group velocity and co-propagate. Our work opens up new possibilities for topological manipulation of wave propagation and may lead to advancements in optical waveguides, switches, and various other applications.
Submitted 3 July, 2024;
originally announced July 2024.
-
Efficient DNN-Powered Software with Fair Sparse Models
Authors:
Xuanqi Gao,
Weipeng Jiang,
Juan Zhai,
Shiqing Ma,
Xiaoyu Zhang,
Chao Shen
Abstract:
With the emergence of the Software 3.0 era, there is a growing trend of compressing and integrating large models into software systems, with significant societal implications. Regrettably, in numerous instances, model compression techniques impact the fairness performance of these models and thus the ethical behavior of DNN-powered software. One of the most notable examples is the Lottery Ticket Hypothesis (LTH), a prevailing model pruning approach. This paper demonstrates that the fairness issue of LTH-based pruning arises from both its subnetwork selection and training procedures, highlighting the inadequacy of existing remedies. To address this, we propose a novel pruning framework, Ballot, which employs a novel conflict-detection-based subnetwork selection to find accurate and fair subnetworks, coupled with a refined training process to attain a high-performance model, thereby improving the fairness of DNN-powered software. By means of this procedure, Ballot improves the fairness of pruning by 38.00%, 33.91%, 17.96%, and 35.82% compared to state-of-the-art baselines, namely Magnitude Pruning, Standard LTH, SafeCompress, and FairScratch, respectively, based on our evaluation of five popular datasets and three widely used models. Our code is available at https://anonymous.4open.science/r/Ballot-506E.
Submitted 3 July, 2024;
originally announced July 2024.
-
LogEval: A Comprehensive Benchmark Suite for Large Language Models In Log Analysis
Authors:
Tianyu Cui,
Shiyu Ma,
Ziang Chen,
Tong Xiao,
Shimin Tao,
Yilun Liu,
Shenglin Zhang,
Duoming Lin,
Changchang Liu,
Yuzhe Cai,
Weibin Meng,
Yongqian Sun,
Dan Pei
Abstract:
Log analysis is crucial for ensuring the orderly and stable operation of information systems, particularly in the field of Artificial Intelligence for IT Operations (AIOps). Large Language Models (LLMs) have demonstrated significant potential in natural language processing tasks. In the AIOps domain, they excel in tasks such as anomaly detection, root cause analysis of faults, operations and maintenance script generation, and alert information summarization. However, the performance of current LLMs in log analysis tasks remains inadequately validated. To address this gap, we introduce LogEval, a comprehensive benchmark suite designed to evaluate the capabilities of LLMs in various log analysis tasks for the first time. This benchmark covers tasks such as log parsing, log anomaly detection, log fault diagnosis, and log summarization. LogEval evaluates each task using 4,000 publicly available log data entries and employs 15 different prompts for each task to ensure a thorough and fair assessment. By rigorously evaluating leading LLMs, we demonstrate the impact of various LLM technologies on log analysis performance, focusing on aspects such as self-consistency and few-shot contextual learning. We also discuss findings related to model quantification, Chinese-English question-answering evaluation, and prompt engineering. These findings provide insights into the strengths and weaknesses of LLMs in multilingual environments and the effectiveness of different prompt strategies. Various evaluation methods are employed for different tasks to accurately measure the performance of LLMs in log analysis, ensuring a comprehensive assessment. The insights gained from LogEval's evaluation reveal the strengths and limitations of LLMs in log analysis tasks, providing valuable guidance for researchers and practitioners.
Submitted 1 July, 2024;
originally announced July 2024.
-
WaveShot: A Compact Portable Unmanned Surface Vessel for Dynamic Water Surface Videography and Media Production
Authors:
Shijian Ma,
Shicong Ma,
Weize Ma
Abstract:
This paper presents WaveShot, an innovative portable unmanned surface vessel that aims to transform water surface videography by offering a highly maneuverable, cost-effective, and safe alternative to traditional filming methods. WaveShot is specially designed for the modern demands of film production, advertising, documentaries, and visual arts, equipped with professional-grade waterproof cameras and advanced technology to capture both static and dynamic scenes on waterways. We discuss the development and advantages of WaveShot, highlighting its portability, ease of transport, and rapid deployment capabilities. Experimental validation showcases WaveShot's stability and high-quality video capture in various water conditions, as well as the integration of monocular depth estimation algorithms to enhance the operator's spatial perception. The paper concludes with an exploration of WaveShot's real-world applications, its user-friendly remote operation, and future enhancements such as gimbal integration and advanced computer vision for optimized videography on water surfaces.
Submitted 12 March, 2024;
originally announced July 2024.
-
PanopticRecon: Leverage Open-vocabulary Instance Segmentation for Zero-shot Panoptic Reconstruction
Authors:
Xuan Yu,
Yili Liu,
Chenrui Han,
Sitong Mao,
Shunbo Zhou,
Rong Xiong,
Yiyi Liao,
Yue Wang
Abstract:
Panoptic reconstruction is a challenging task in 3D scene understanding. However, most existing methods heavily rely on pre-trained semantic segmentation models and known 3D object bounding boxes for 3D panoptic segmentation, which are not available for in-the-wild scenes. In this paper, we propose a novel zero-shot panoptic reconstruction method from RGB-D images of scenes. For zero-shot segmentation, we leverage open-vocabulary instance segmentation, but it has to face partial labeling and instance association challenges. We tackle both challenges by propagating partial labels with the aid of dense generalized features and building a 3D instance graph for associating 2D instance IDs. Specifically, we exploit partial labels to learn a classifier for generalized semantic features to provide complete labels for scenes with dense distilled features. Moreover, we formulate instance association as a 3D instance graph segmentation problem, allowing us to fully utilize the scene geometry prior and all 2D instance masks to infer globally unique pseudo 3D instance IDs. Our method outperforms state-of-the-art methods on the indoor dataset ScanNet V2 and the outdoor dataset KITTI-360, demonstrating the effectiveness of our graph segmentation method and reconstruction network.
Submitted 1 July, 2024;
originally announced July 2024.
-
Multi-Functional Beamforming Design for Integrated Sensing, Communication, and Computation
Authors:
Yapeng Zhao,
Qingqing Wu,
Wen Chen,
Yong Zeng,
Ruiqi Liu,
Weidong Mei,
Fen Hou,
Shaodan Ma
Abstract:
Integrated sensing and communication (ISAC) systems may face a heavy computation burden since the sensory data needs to be further processed. This paper studies a novel system that integrates sensing, communication, and computation, aiming to provide services for different objectives efficiently. This system consists of a multi-antenna multi-functional base station (BS), an edge server, a target, and multiple single-antenna communication users. The BS needs to allocate the available resources to efficiently provide sensing, communication, and computation services. Due to the heavy service burden and limited power budget, the BS can partially offload the tasks to the nearby edge server instead of computing them locally. We consider the estimation of the target response matrix, a general problem in radar sensing, and utilize the Cramér-Rao bound (CRB) as the corresponding performance metric. To tackle the non-convex optimization problem, we propose both semidefinite relaxation (SDR)-based alternating optimization and SDR-based successive convex approximation (SCA) algorithms to minimize the CRB of radar sensing while meeting the requirement of communication users and the need for task computing. Furthermore, we demonstrate that the optimal rank-one solutions of both the alternating and SCA algorithms can be directly obtained via the solver or further constructed even when dealing with multiple functionalities. Simulation results show that the proposed algorithms can provide higher target estimation performance than state-of-the-art benchmarks while satisfying the communication and computation constraints.
Submitted 1 July, 2024;
originally announced July 2024.
-
BioKGBench: A Knowledge Graph Checking Benchmark of AI Agent for Biomedical Science
Authors:
Xinna Lin,
Siqi Ma,
Junjie Shan,
Xiaojing Zhang,
Shell Xu Hu,
Tiannan Guo,
Stan Z. Li,
Kaicheng Yu
Abstract:
Pursuing artificial intelligence for biomedical science, a.k.a. AI Scientist, draws increasing attention, where one common approach is to build a copilot agent driven by Large Language Models (LLMs). However, to evaluate such systems, people either rely on direct Question-Answering (QA) to the LLM itself, or evaluate in a biomedical experimental manner. How to precisely benchmark biomedical agents from an AI Scientist perspective remains largely unexplored. To this end, we draw inspiration from one of the most important abilities of scientists, understanding the literature, and introduce BioKGBench. In contrast to traditional evaluation benchmarks that focus only on factual QA, where the LLMs are known to have hallucination issues, we first disentangle "Understanding Literature" into two atomic abilities: i) "Understanding" the unstructured text from research papers by performing scientific claim verification, and ii) the ability to interact with structured Knowledge-Graph Question-Answering (KGQA) as a form of "Literature" grounding. We then formulate a novel agent task, dubbed KGCheck, using KGQA and domain-based Retrieval-Augmented Generation (RAG) to identify the factual errors of existing large-scale knowledge graph databases. We collect over two thousand data items for the two atomic tasks and 225 high-quality annotated data items for the agent task. Surprisingly, we discover that state-of-the-art agents, both those for daily scenarios and biomedical ones, either fail or show inferior performance on our benchmark. We then introduce a simple yet effective baseline, dubbed BKGAgent. On a widely used knowledge graph, we discover over 90 factual errors, which provide scenarios for agents to make discoveries and demonstrate the effectiveness of our approach. The code and data are available at https://github.com/westlake-autolab/BioKGBench.
Submitted 29 June, 2024;
originally announced July 2024.
-
Prompt Refinement with Image Pivot for Text-to-Image Generation
Authors:
Jingtao Zhan,
Qingyao Ai,
Yiqun Liu,
Yingwei Pan,
Ting Yao,
Jiaxin Mao,
Shaoping Ma,
Tao Mei
Abstract:
For text-to-image generation, automatically refining user-provided natural language prompts into the keyword-enriched prompts favored by systems is essential for the user experience. Such a prompt refinement process is analogous to translating the prompt from "user languages" into "system languages". However, the scarcity of such parallel corpora makes it difficult to train a prompt refinement model. Inspired by zero-shot machine translation techniques, we introduce Prompt Refinement with Image Pivot (PRIP). PRIP innovatively uses the latent representation of a user-preferred image as an intermediary "pivot" between the user and system languages. It decomposes the refinement process into two data-rich tasks: inferring representations of user-preferred images from user languages and subsequently translating image representations into system languages. Thus, it can leverage abundant data for training. Extensive experiments show that PRIP substantially outperforms a wide range of baselines and effectively transfers to unseen systems in a zero-shot manner.
Submitted 28 June, 2024;
originally announced July 2024.
-
HarmonICA: Neural non-stationarity correction and source separation for motor neuron interfaces
Authors:
Alexander Kenneth Clarke,
Agnese Grison,
Irene Mendez Guerra,
Pranav Mamidanna,
Shihan Ma,
Silvia Muceli,
Dario Farina
Abstract:
A major outstanding problem when interfacing with spinal motor neurons is how to accurately compensate for non-stationary effects in the signal during source separation routines, particularly when they cannot be estimated in advance. This forces current systems to instead use undifferentiated bulk signal, which limits the potential degrees of freedom for control. In this study we propose a potential solution, using an unsupervised learning algorithm to blindly correct for the effects of latent processes which drive the signal non-stationarities. We implement this methodology within the theoretical framework of a quasilinear version of independent component analysis (ICA). The proposed design, HarmonICA, sidesteps the identifiability problems of nonlinear ICA, allowing for equivalent predictability to linear ICA whilst retaining the ability to learn complex nonlinear relationships between non-stationary latents and their effects on the signal. We test HarmonICA on both invasive and non-invasive recordings both simulated and real, demonstrating an ability to blindly compensate for the non-stationary effects specific to each, and thus to significantly enhance the quality of a source separation routine.
Submitted 27 June, 2024;
originally announced June 2024.
-
Precise determination of the bottom-quark on-shell mass using its four-loop relation to the $\overline{\rm MS}$-scheme running mass
Authors:
Shun-Yue Ma,
Xu-Dong Huang,
Xu-Chang Zheng,
Xing-Gang Wu
Abstract:
In this paper, we explore the properties of the bottom-quark on-shell mass ($M_b$) by using its relation to the $\overline{\rm MS}$ mass (${\overline m}_b$). At present, this $\overline{\rm MS}$-on-shell relation has been known up to four-loop QCD corrections, which however still has a $\sim 2\%$ scale uncertainty by taking the renormalization scale as ${\overline m}_b({\overline m}_b)$ and varying it within the usual range of $[{\overline m}_b({\overline m}_b)/2, 2 {\overline m}_b({\overline m}_b)]$. The principle of maximum conformality (PMC) has been adopted to achieve a more precise $\overline{\rm MS}$-on-shell relation by eliminating such scale uncertainty. As a step forward, we also estimate the magnitude of the uncalculated higher-order terms by using the Padé approximation approach. Numerically, by using the $\overline{\rm MS}$ mass ${\overline m}_b({\overline m}_b)=4.18^{+0.03}_{-0.02}$ GeV as an input, our predicted value for the bottom-quark on-shell mass becomes $M_b\simeq 5.36^{+0.10}_{-0.07}$ GeV, where the uncertainty is the squared average of the ones caused by $\Delta\alpha_s(M_Z)$, $\Delta{\overline m}_b({\overline m}_b)$, and the estimated magnitude of the higher-order terms.
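As a generic illustration of how a Padé approximant can estimate an uncalculated series coefficient, the Python sketch below uses the simplest [1/1] form. The abstract does not say which Padé type the authors use, so the choice of [1/1], the function name, and the toy coefficients are all assumptions made here for illustration; none of the numbers are QCD coefficients from the paper.

```python
def pade_11_next_coefficient(c1: float, c2: float) -> float:
    """Predict c3 from a [1/1] Pade approximant matched to c0 + c1*x + c2*x**2.

    Matching (a0 + a1*x) / (1 + b1*x) to the known terms gives b1 = -c2 / c1.
    Re-expanding the rational function then yields c3 = c2**2 / c1, which
    does not depend on c0.
    """
    if c1 == 0:
        raise ValueError("c1 must be non-zero for a [1/1] approximant")
    return c2 ** 2 / c1


if __name__ == "__main__":
    # Toy check on the geometric series 1/(1 - x/2) = 1 + x/2 + x^2/4 + x^3/8 + ...
    # Its true x^3 coefficient is 1/8, which the [1/1] form reproduces exactly
    # because the series is itself a rational function of that type.
    print(pade_11_next_coefficient(0.5, 0.25))
```

For a perturbative series the same formula gives only an estimate of the next term's size, which is how such predictions are typically used as an uncertainty indicator.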
Submitted 25 June, 2024;
originally announced June 2024.
-
A hybrid FEM-NN optimization method to learn the physics-constrained constitutive relations from full-field data
Authors:
Xinxin Wu,
Kaiqiang Sun,
Shaohua Yang,
Huan Wang,
Ye Xu,
Yin Zhang,
Sheng Mao
Abstract:
Neural networks (NNs) have demonstrated strong capabilities of representing high-dimensional, complex functional relations, and hence have been widely used to characterize complex constitutive relations for various types of materials, such as polycrystals, polymers, etc. However, to construct a reliable NN-based constitutive model, a considerable amount of data, i.e., stress-strain states along different loading paths, is needed, which can be expensive to collect. To address this challenge, we develop a hybrid finite element method (FEM) - NN optimization framework to learn complex hyperelastic constitutive relations from full-field data. The key advantage of this framework is that it can make use of the non-uniform displacement field due to the geometric inhomogeneities for training NN-based constitutive models. Since such data can provide many different stress-strain states in a single test, it can greatly reduce the number of experiments needed for the training of NNs. In addition, we adopt a mechanics-informed neural network (MINN) as our architecture to ensure that our NN-based models satisfy all necessary physical constraints by construction, such as objectivity, material symmetry, polyconvexity, etc. Such an architecture is also key to the convergence of our optimization framework. We then use both synthetic and experimental data to test the performance of our proposed framework on various isotropic hyperelastic materials. Results show that our optimization framework can be used to train NN-based constitutive models for hyperelastic materials with high accuracy and efficiency using data generated from simple tests, and that it can be easily adapted to characterize complex constitutive models for a broader range of materials.
Submitted 24 June, 2024;
originally announced June 2024.
-
PoseBench: Benchmarking the Robustness of Pose Estimation Models under Corruptions
Authors:
Sihan Ma,
Jing Zhang,
Qiong Cao,
Dacheng Tao
Abstract:
Pose estimation aims to accurately identify anatomical keypoints in humans and animals using monocular images, which is crucial for various applications such as human-machine interaction, embodied AI, and autonomous driving. While current models show promising results, they are typically trained and tested on clean data, potentially overlooking the corruption during real-world deployment and thus posing safety risks in practical scenarios. To address this issue, we introduce PoseBench, a comprehensive benchmark designed to evaluate the robustness of pose estimation models against real-world corruption. We evaluated 60 representative models, including top-down, bottom-up, heatmap-based, regression-based, and classification-based methods, across three datasets for human and animal pose estimation. Our evaluation involves 10 types of corruption in four categories: 1) blur and noise, 2) compression and color loss, 3) severe lighting, and 4) masks. Our findings reveal that state-of-the-art models are vulnerable to common real-world corruptions and exhibit distinct behaviors when tackling human and animal pose estimation tasks. To improve model robustness, we delve into various design considerations, including input resolution, pre-training datasets, backbone capacity, post-processing, and data augmentations. We hope that our benchmark will serve as a foundation for advancing research in robust pose estimation. The benchmark and source code will be released at https://xymsh.github.io/PoseBench
Submitted 20 June, 2024;
originally announced June 2024.
-
LQCD constrained magnetic field dependent coupling constant in an effective model
Authors:
Shijun Mao
Abstract:
A magnetic field dependent coupling constant $G(eB)$ is investigated in the two-flavor magnetized NJL model. Based on LQCD results of the neutral (charged) pion mass spectra at vanishing temperature and finite magnetic field, we determine the $G(eB)=G^0(eB)$ ($G(eB)=G^+(eB)$) in the NJL model. $G^0(eB)$ and $G^+(eB)$ are both non-monotonic functions of magnetic fields, but they are different from each other. Furthermore, we calculate the pseudo-critical temperatures $T_{pc}(eB)$ of chiral restoration phase transition with $G^0(eB)$ and $G^+(eB)$ in the magnetized NJL model, respectively. The resulting $T_{pc}(eB)$ are non-monotonic functions of magnetic fields. In previous work, $G(eB)$ in the NJL model fitted from the chiral condensate or pseudo-critical temperature of LQCD simulations is a decreasing function of magnetic field. It cannot explain the saturation behavior of the mass spectra of the neutral pion and the decreasing behavior of the mass spectra of the charged pion under strong magnetic fields. We conclude that a magnetic field dependent coupling constant $G(eB)$ in the NJL model cannot simultaneously explain the reduction of the pseudo-critical temperature of the chiral restoration phase transition and the light meson mass spectra under an external magnetic field.
Submitted 19 June, 2024;
originally announced June 2024.
-
State-of-the-Art Review: The Use of Digital Twins to Support Artificial Intelligence-Guided Predictive Maintenance
Authors:
Sizhe Ma,
Katherine A. Flanigan,
Mario Bergés
Abstract:
In recent years, predictive maintenance (PMx) has gained prominence for its potential to enhance efficiency, automation, accuracy, and cost-effectiveness while reducing human involvement. Importantly, PMx has evolved in tandem with digital advancements, such as Big Data and the Internet of Things (IoT). These technological strides have enabled Artificial Intelligence (AI) to revolutionize PMx processes, with increasing capacities for real-time automation of monitoring, analysis, and prediction tasks. However, PMx still faces challenges such as poor explainability and sample inefficiency in data-driven methods and high complexity in physics-based models, hindering broader adoption. This paper posits that Digital Twins (DTs) can be integrated into PMx to overcome these challenges, paving the way for more automated PMx applications across various stakeholders. Despite their potential, current DTs have not fully matured to bridge existing gaps. Our paper provides a comprehensive roadmap for DT evolution, addressing current limitations to foster large-scale automated PMx progression. We structure our approach in three stages: First, we reference prior work where we identified and defined the Information Requirements (IRs) and Functional Requirements (FRs) for PMx, forming the blueprint for a unified framework. Second, we conduct a literature review to assess current DT applications integrating these IRs and FRs, revealing standardized DT models and tools that support automated PMx. Lastly, we highlight gaps in current DT implementations, particularly those IRs and FRs not fully supported, and outline the necessary components for a comprehensive, automated PMx system. Our paper concludes with research directions aimed at seamlessly integrating DTs into the PMx paradigm to achieve this ambitious vision.
Submitted 18 June, 2024;
originally announced June 2024.
-
The Aligned Orbit of a Hot Jupiter around the M Dwarf TOI-4201
Authors:
Tianjun Gan,
Sharon X. Wang,
Fei Dai,
Joshua N. Winn,
Shude Mao,
Siyi Xu,
Enric Pallé,
Jacob L. Bean,
Madison Brady,
Nina Brown,
Cicero Lu,
Rafael Luque,
Teo Mocnik,
Andreas Seifahrt,
Guðmundur K. Stefánsson
Abstract:
Measuring the obliquities of stars hosting giant planets may shed light on the dynamical history of planetary systems. Significant efforts have been made to measure the obliquities of FGK stars with hot Jupiters, mainly based on observations of the Rossiter-McLaughlin effect. In contrast, M dwarfs with hot Jupiters have hardly been explored, because such systems are rare and often not favorable for such precise observations. Here, we report the first detection of the Rossiter-McLaughlin effect for an M dwarf with a hot Jupiter, TOI-4201, using the Gemini-North/MAROON-X spectrograph. We find TOI-4201 to be well-aligned with its giant planet, with a sky-projected obliquity of $\lambda=-3.0_{-3.2}^{+3.7}\ ^{\circ}$ and a true obliquity of $\psi=21.3_{-12.8}^{+12.5}\ ^{\circ}$ with an upper limit of $40^{\circ}$ at a 95% confidence level. The result agrees with dynamically quiet formation or tidal obliquity damping that realigned the system. As the first hot Jupiter around an M dwarf with its obliquity measured, TOI-4201b joins the group of aligned giant planets around cool stars ($T_{\rm eff}<6250\ K$), as well as the small but growing sample of planets with relatively high planet-to-star mass ratio ($M_p/M_\ast\gtrsim 3\times 10^{-3}$) that also appear to be mostly aligned.
Submitted 19 June, 2024; v1 submitted 18 June, 2024;
originally announced June 2024.
-
CITADEL: Context Similarity Based Deep Learning Framework Bug Finding
Authors:
Xiaoyu Zhang,
Juan Zhai,
Shiqing Ma,
Shiwei Wang,
Chao Shen
Abstract:
With deep learning (DL) technology becoming an integral part of the new intelligent software, tools for DL framework testing and bug finding are in high demand. Existing DL framework testing tools have limited coverage of bug types. For example, they lack the capability of finding performance bugs, which are critical for DL model training and inference regarding performance, economics, and the environment. This problem is challenging due to the difficulty of getting test oracles of performance bugs. Moreover, existing tools are inefficient, generating hundreds of test cases of which few trigger bugs. In this paper, we propose CITADEL, a method that accelerates the finding of bugs in terms of efficiency and effectiveness. We observe that many DL framework bugs are similar due to the similarity of operators and algorithms belonging to the same family (e.g., Conv2D and Conv3D). Orthogonal to existing bug-finding tools, CITADEL aims to find new bugs that are similar to reported ones that have known test oracles. It works by first collecting existing bug reports and identifying problematic APIs. CITADEL defines context similarity to measure the similarity of DL framework API pairs and automatically generates test cases with oracles for APIs that are similar to the problematic APIs in existing bug reports. CITADEL covers 1,436 PyTorch and 5,380 TensorFlow APIs and detects 79 and 80 API bugs, respectively, among which 58 and 68 are new and 36 and 58 have been confirmed; many of these (e.g., the 11 performance bugs) cannot be detected by existing tools. Moreover, a remarkable 35.40% of the test cases generated by CITADEL can trigger bugs, which significantly exceeds the ratios of 0.74%, 1.23%, and 3.90% exhibited by the state-of-the-art methods, DocTer, DeepREL, and TitanFuzz.
Submitted 18 June, 2024; v1 submitted 17 June, 2024;
originally announced June 2024.
-
DeepSeek-Coder-V2: Breaking the Barrier of Closed-Source Models in Code Intelligence
Authors:
DeepSeek-AI,
Qihao Zhu,
Daya Guo,
Zhihong Shao,
Dejian Yang,
Peiyi Wang,
Runxin Xu,
Y. Wu,
Yukun Li,
Huazuo Gao,
Shirong Ma,
Wangding Zeng,
Xiao Bi,
Zihui Gu,
Hanwei Xu,
Damai Dai,
Kai Dong,
Liyue Zhang,
Yishi Piao,
Zhibin Gou,
Zhenda Xie,
Zhewen Hao,
Bingxuan Wang,
Junxiao Song,
Deli Chen, et al. (15 additional authors not shown)
Abstract:
We present DeepSeek-Coder-V2, an open-source Mixture-of-Experts (MoE) code language model that achieves performance comparable to GPT4-Turbo in code-specific tasks. Specifically, DeepSeek-Coder-V2 is further pre-trained from an intermediate checkpoint of DeepSeek-V2 with an additional 6 trillion tokens. Through this continued pre-training, DeepSeek-Coder-V2 substantially enhances the coding and mathematical reasoning capabilities of DeepSeek-V2, while maintaining comparable performance in general language tasks. Compared to DeepSeek-Coder-33B, DeepSeek-Coder-V2 demonstrates significant advancements in various aspects of code-related tasks, as well as reasoning and general capabilities. Additionally, DeepSeek-Coder-V2 expands its support for programming languages from 86 to 338, while extending the context length from 16K to 128K. In standard benchmark evaluations, DeepSeek-Coder-V2 achieves superior performance compared to closed-source models such as GPT4-Turbo, Claude 3 Opus, and Gemini 1.5 Pro in coding and math benchmarks.
Submitted 17 June, 2024;
originally announced June 2024.
-
Meta Reasoning for Large Language Models
Authors:
Peizhong Gao,
Ao Xie,
Shaoguang Mao,
Wenshan Wu,
Yan Xia,
Haipeng Mi,
Furu Wei
Abstract:
We introduce Meta-Reasoning Prompting (MRP), a novel and efficient system prompting method for large language models (LLMs) inspired by human meta-reasoning. Traditional in-context learning-based reasoning techniques, such as Tree-of-Thoughts, show promise but lack consistent state-of-the-art performance across diverse tasks due to their specialized nature. MRP addresses this limitation by guiding LLMs to dynamically select and apply different reasoning methods based on the specific requirements of each task, optimizing both performance and computational efficiency. With MRP, LLM reasoning operates in two phases. Initially, the LLM identifies the most appropriate reasoning method using task input cues and objective descriptions of available methods. Subsequently, it applies the chosen method to complete the task. This dynamic strategy mirrors human meta-reasoning, allowing the model to excel in a wide range of problem domains. We evaluate the effectiveness of MRP through comprehensive benchmarks. The results demonstrate that MRP achieves or approaches state-of-the-art performance across diverse tasks. MRP represents a significant advancement in enabling LLMs to identify cognitive challenges across problems and leverage benefits across different reasoning approaches, enhancing their ability to handle diverse and complex problem domains efficiently. Every LLM deserves a Meta-Reasoning Prompting to unlock its full potential and ensure adaptability in an ever-evolving landscape of challenges and applications.
Submitted 17 June, 2024;
originally announced June 2024.
-
DocGenome: An Open Large-scale Scientific Document Benchmark for Training and Testing Multi-modal Large Language Models
Authors:
Renqiu Xia,
Song Mao,
Xiangchao Yan,
Hongbin Zhou,
Bo Zhang,
Haoyang Peng,
Jiahao Pi,
Daocheng Fu,
Wenjie Wu,
Hancheng Ye,
Shiyang Feng,
Bin Wang,
Chao Xu,
Conghui He,
Pinlong Cai,
Min Dou,
Botian Shi,
Sheng Zhou,
Yongwei Wang,
Bin Wang,
Junchi Yan,
Fei Wu,
Yu Qiao
Abstract:
Scientific documents record research findings and valuable human knowledge, comprising a vast corpus of high-quality data. Leveraging multi-modality data extracted from these documents and assessing large models' abilities to handle scientific document-oriented tasks is therefore meaningful. Despite promising advancements, large models still perform poorly on multi-page scientific document extraction and understanding tasks, and their capacity to process within-document data formats such as charts and equations remains under-explored. To address these issues, we present DocGenome, a structured document benchmark constructed by annotating 500K scientific documents from 153 disciplines in the arXiv open-access community, using our custom auto-labeling pipeline. DocGenome features four key characteristics: 1) Completeness: It is the first dataset to structure data from all modalities including 13 layout attributes along with their LaTeX source codes. 2) Logicality: It provides 6 logical relationships between different entities within each scientific document. 3) Diversity: It covers various document-oriented tasks, including document classification, visual grounding, document layout detection, document transformation, open-ended single-page QA and multi-page QA. 4) Correctness: It undergoes rigorous quality control checks conducted by a specialized team. We conduct extensive experiments to demonstrate the advantages of DocGenome and objectively evaluate the performance of large models on our benchmark.
Submitted 17 June, 2024;
originally announced June 2024.
-
A moduli space of stable sheaves on a cubic threefold
Authors:
Shihao Ma,
Song Yang
Abstract:
In this paper, we prove that the moduli space $\overline{M}_{X}(ν)$ of $H$-Gieseker semistable sheaves on a smooth cubic threefold $X$ with Chern character $ν=(4,-H,-\frac{5}{6}H^{2},\frac{1}{6}H^{3})$ is non-empty, smooth and irreducible of dimension $8$.
Submitted 14 June, 2024;
originally announced June 2024.
-
RobustSAM: Segment Anything Robustly on Degraded Images
Authors:
Wei-Ting Chen,
Yu-Jiet Vong,
Sy-Yen Kuo,
Sizhuo Ma,
Jian Wang
Abstract:
Segment Anything Model (SAM) has emerged as a transformative approach in image segmentation, acclaimed for its robust zero-shot segmentation capabilities and flexible prompting system. Nonetheless, its performance is challenged by images with degraded quality. Addressing this limitation, we propose the Robust Segment Anything Model (RobustSAM), which enhances SAM's performance on low-quality images while preserving its promptability and zero-shot generalization. Our method leverages the pre-trained SAM model with only marginal parameter increments and computational requirements. The additional parameters of RobustSAM can be optimized within 30 hours on eight GPUs, demonstrating its feasibility and practicality for typical research laboratories. We also introduce the Robust-Seg dataset, a collection of 688K image-mask pairs with different degradations designed to train and evaluate our model optimally. Extensive experiments across various segmentation tasks and datasets confirm RobustSAM's superior performance, especially under zero-shot conditions, underscoring its potential for extensive real-world application. Additionally, our method has been shown to effectively improve the performance of SAM-based downstream tasks such as single image dehazing and deblurring.
Submitted 13 June, 2024;
originally announced June 2024.
-
DSL-FIQA: Assessing Facial Image Quality via Dual-Set Degradation Learning and Landmark-Guided Transformer
Authors:
Wei-Ting Chen,
Gurunandan Krishnan,
Qiang Gao,
Sy-Yen Kuo,
Sizhuo Ma,
Jian Wang
Abstract:
Generic Face Image Quality Assessment (GFIQA) evaluates the perceptual quality of facial images, which is crucial in improving image restoration algorithms and selecting high-quality face images for downstream tasks. We present a novel transformer-based method for GFIQA, which is aided by two unique mechanisms. First, a Dual-Set Degradation Representation Learning (DSL) mechanism uses facial images with both synthetic and real degradations to decouple degradation from content, ensuring generalizability to real-world scenarios. This self-supervised method learns degradation features on a global scale, providing a robust alternative to conventional methods that use local patch information in degradation learning. Second, our transformer leverages facial landmarks to emphasize visually salient parts of a face image in evaluating its perceptual quality. We also introduce a balanced and diverse Comprehensive Generic Face IQA (CGFIQA-40k) dataset of 40K images carefully designed to overcome the biases, in particular the imbalances in skin tone and gender representation, in existing datasets. Extensive analysis and evaluation demonstrate the robustness of our method, marking a significant improvement over prior methods.
Submitted 13 June, 2024;
originally announced June 2024.
-
Sagiri: Low Dynamic Range Image Enhancement with Generative Diffusion Prior
Authors:
Baiang Li,
Sizhuo Ma,
Yanhong Zeng,
Xiaogang Xu,
Youqing Fang,
Zhao Zhang,
Jian Wang,
Kai Chen
Abstract:
Capturing High Dynamic Range (HDR) scenery using 8-bit cameras often suffers from over-/underexposure, loss of fine details due to low bit-depth compression, skewed color distributions, and strong noise in dark areas. Traditional LDR image enhancement methods primarily focus on color mapping, which enhances the visual representation by expanding the image's color range and adjusting the brightness. However, these approaches fail to effectively restore content in dynamic range extremes, which are regions with pixel values close to 0 or 255. To address the full scope of challenges in HDR imaging and surpass the limitations of current models, we propose a novel two-stage approach. The first stage maps the color and brightness to an appropriate range while keeping the existing details, and the second stage utilizes a diffusion prior to generate content in dynamic range extremes lost during capture. This generative refinement module can also be used as a plug-and-play module to enhance and complement existing LDR enhancement models. The proposed method markedly improves the quality and details of LDR images, demonstrating superior performance through rigorous experimental validation. The project page is at https://sagiri0208.github.io
Submitted 13 June, 2024;
originally announced June 2024.
-
Low-Overhead Channel Estimation via 3D Extrapolation for TDD mmWave Massive MIMO Systems Under High-Mobility Scenarios
Authors:
Binggui Zhou,
Xi Yang,
Shaodan Ma,
Feifei Gao,
Guanghua Yang
Abstract:
In TDD mmWave massive MIMO systems, the downlink CSI can be attained through uplink channel estimation thanks to the uplink-downlink channel reciprocity. However, the channel aging issue is significant under high-mobility scenarios and thus necessitates frequent uplink channel estimation. In addition, large numbers of antennas and subcarriers lead to high-dimensional CSI matrices, aggravating the pilot training overhead. To systematically reduce the pilot overhead, a spatial, frequency, and temporal domain (3D) channel extrapolation framework is proposed in this paper. Considering the marginal effects of pilots in the spatial and frequency domains and the effectiveness of traditional knowledge-driven channel estimation methods, we first propose a knowledge-and-data driven spatial-frequency channel extrapolation network (KDD-SFCEN) for uplink channel estimation, which exploits the least square estimator for coarse channel estimation and joint spatial-frequency channel extrapolation to reduce the spatial-frequency domain pilot overhead. Then, resorting to the uplink-downlink channel reciprocity and temporal domain dependencies of downlink channels, a temporal uplink-downlink channel extrapolation network (TUDCEN) is proposed for slot-level channel extrapolation, aiming to enlarge the pilot signal period and thus reduce the temporal domain pilot overhead under high-mobility scenarios. Specifically, we propose the spatial-frequency sampling embedding module to reduce the representation dimension and consequent computational complexity, and we propose to exploit the autoregressive generative Transformer for generating downlink channels autoregressively. Numerical results demonstrate the superiority of the proposed framework in significantly reducing the pilot training overhead by more than 16 times and improving the system's spectral efficiency under high-mobility scenarios.
Submitted 13 June, 2024;
originally announced June 2024.
-
Inverse Probability of Treatment Weighting with Deep Sequence Models Enables Accurate treatment effect Estimation from Electronic Health Records
Authors:
Junghwan Lee,
Simin Ma,
Nicoleta Serban,
Shihao Yang
Abstract:
Observational data have been actively used to estimate treatment effects, driven by the growing availability of electronic health records (EHRs). However, EHRs typically consist of longitudinal records, often introducing time-dependent confounding that hinders the unbiased estimation of treatment effects. Inverse probability of treatment weighting (IPTW) is a widely used propensity score method since it provides unbiased treatment effect estimation and its derivation is straightforward. In this study, we aim to utilize IPTW to estimate treatment effects in the presence of time-dependent confounding using claims records. Previous studies have utilized propensity score methods with features derived from claims records through feature processing, which generally requires domain knowledge and additional resources to extract information to accurately estimate propensity scores. Deep sequence models, particularly recurrent neural networks and self-attention-based architectures, have demonstrated good performance in modeling EHRs for various downstream tasks. We propose that these deep sequence models can provide accurate IPTW estimation of treatment effects by directly estimating the propensity scores from claims records without the need for feature processing. We empirically demonstrate this by conducting comprehensive evaluations using synthetic and semi-synthetic datasets.
Submitted 13 June, 2024;
originally announced June 2024.
-
Infinite-dimensional Frobenius Manifolds Underlying the genus-zero Universal Whitham Hierarchy
Authors:
Shilin Ma
Abstract:
In this paper, we construct a new class of infinite-dimensional Frobenius manifolds on the spaces of pairs of meromorphic functions that are defined on specific regions of the Riemann sphere. We demonstrate that the principal hierarchy of these Frobenius manifolds serves as an extension of the genus-zero universal Whitham hierarchy.
Submitted 12 June, 2024;
originally announced June 2024.
-
VersiCode: Towards Version-controllable Code Generation
Authors:
Tongtong Wu,
Weigang Wu,
Xingyu Wang,
Kang Xu,
Suyu Ma,
Bo Jiang,
Ping Yang,
Zhenchang Xing,
Yuan-Fang Li,
Gholamreza Haffari
Abstract:
Significant research has focused on improving the performance of large language models on code-related tasks due to their practical importance. Although performance is typically evaluated using public benchmark datasets, the existing datasets do not account for the concept of \emph{version}, which is crucial in professional software development. In this paper, we introduce VersiCode, the first comprehensive dataset designed to assess the ability of large language models to generate verifiable code for specific library versions. VersiCode encompasses 300 libraries across more than 2,000 versions spanning 9 years. We design two dedicated evaluation tasks: version-specific code completion (VSCC) and version-aware code editing (VACE). Comprehensive experiments are conducted to benchmark the performance of LLMs, revealing the challenging nature of these tasks and of VersiCode: even state-of-the-art LLMs struggle to generate version-correct code. This dataset, together with the proposed tasks, sheds light on LLMs' capabilities and limitations in handling version-specific code generation, and opens up an important new area of research for further investigation. The resources can be found at https://github.com/wutong8023/VersiCode.
Submitted 11 June, 2024;
originally announced June 2024.
-
Dual-cavity controllable quantum battery
Authors:
Dayang Zhang,
Shuangquan Ma,
Yunxiu Jiang,
Youbin Yu,
Guangri Jin,
Aixi Chen
Abstract:
With the increasing development of quantum science and technology, quantum batteries are gradually emerging. However, there are still many unsolved problems in the field of quantum batteries, such as how to increase the space utilization rate of quantum batteries, how to increase and control their charging power, and how to achieve better energy storage without reducing their power. Therefore, we propose a controllable dual-cavity quantum battery. It can increase the charging power of the quantum battery by manipulating the number of atoms without consuming other resources, and it allows the power of the quantum battery to be effectively adjusted between $N^2$ and $N^{2.5}$. Moreover, the advantage of regulation is, to a certain extent, greater than the advantage of the interaction force between atoms.
Submitted 10 June, 2024;
originally announced June 2024.
-
Entanglement and steering in quantum batteries
Authors:
Dayang Zhang,
Shuangquan Ma,
Yunxiu Jiang,
Youbin Yu,
Guangri Jin,
Aixi Chen
Abstract:
The advantage of quantum batteries is that quantum resources can be used to improve charging efficiency. The quantum resources that are known to be available are quantum entanglement and quantum coherence. In this paper, we introduce quantum steering as a new quantum resource into batteries for the first time. We analyze the relationship between quantum steering, quantum entanglement, energy storage, and extractable work by considering two models: the Field-quantum battery and the Cavity-Heisenberg quantum battery. We find that in the steerable range, the quantum steering of different qubits has a maximum or minimum value, which corresponds to the energy storage of the battery, and the extractable work has a maximum value. The occurrence of the minimum value of quantum entanglement is always accompanied by the occurrence of the maximum value of parameters such as energy storage. Finally, we analyze the reasons for these results using the purity of the system and find a relatively general conclusion: when the purity is at its maximum, important parameters such as the energy storage of the battery are also at their maximum.
Submitted 10 June, 2024;
originally announced June 2024.
-
A curious symmetric decomposition of the (des, exc)-Eulerian polynomials
Authors:
Shi-Mei Ma,
Toufik Mansour,
Yeong-Nan Yeh
Abstract:
One of the most central results in combinatorics says that the descent statistic and the excedance statistic are equidistributed over the symmetric group. As a continuation of the work of Shareshian-Wachs (Adv. Math., 225(6) (2010), 2921--2966), we provide a curious $t$-symmetric decomposition for the generating polynomial of the joint distribution of the descent and excedance statistics over the symmetric group.
Submitted 10 June, 2024;
originally announced June 2024.
-
MeanSparse: Post-Training Robustness Enhancement Through Mean-Centered Feature Sparsification
Authors:
Sajjad Amini,
Mohammadreza Teymoorianfard,
Shiqing Ma,
Amir Houmansadr
Abstract:
We present a simple yet effective method to improve the robustness of Convolutional Neural Networks (CNNs) against adversarial examples by post-processing an adversarially trained model. Our technique, MeanSparse, cascades the activation functions of a trained model with novel operators that sparsify mean-centered feature vectors. This is equivalent to reducing feature variations around the mean, and we show that such reduced variations barely affect the model's utility, yet they strongly attenuate the adversarial perturbations and decrease the attacker's success rate. Our experiments show that, when applied to the top models in the RobustBench leaderboard, it achieves a new robustness record of 72.08% (from 71.07%) and 59.64% (from 59.56%) on CIFAR-10 and ImageNet, respectively, in terms of AutoAttack accuracy. Code is available at https://github.com/SPIN-UMass/MeanSparse
Submitted 9 June, 2024;
originally announced June 2024.
-
Peer Review as A Multi-Turn and Long-Context Dialogue with Role-Based Interactions
Authors:
Cheng Tan,
Dongxin Lyu,
Siyuan Li,
Zhangyang Gao,
Jingxuan Wei,
Siqi Ma,
Zicheng Liu,
Stan Z. Li
Abstract:
Large Language Models (LLMs) have demonstrated wide-ranging applications across various fields and have shown significant potential in the academic peer-review process. However, existing applications are primarily limited to static review generation based on submitted papers, which fail to capture the dynamic and iterative nature of real-world peer reviews. In this paper, we reformulate the peer-review process as a multi-turn, long-context dialogue, incorporating distinct roles for authors, reviewers, and decision makers. We construct a comprehensive dataset containing over 26,841 papers with 92,017 reviews collected from multiple sources, including the top-tier conference and prestigious journal. This dataset is meticulously designed to facilitate the applications of LLMs for multi-turn dialogues, effectively simulating the complete peer-review process. Furthermore, we propose a series of metrics to evaluate the performance of LLMs for each role under this reformulated peer-review setting, ensuring fair and comprehensive evaluations. We believe this work provides a promising perspective on enhancing the LLM-driven peer-review process by incorporating dynamic, role-based interactions. It aligns closely with the iterative and interactive nature of real-world academic peer review, offering a robust foundation for future research and development in this area. We open-source the dataset at https://github.com/chengtan9907/ReviewMT.
Submitted 9 June, 2024;
originally announced June 2024.