arXiv:2407.11921 [pdf, other]

IPA-NeRF: Illusory Poisoning Attack Against Neural Radiance Fields

Authors: Wenxiang Jiang, Hanwei Zhang, Shuo Zhao, Zhongwen Guo, Hao Wang

Abstract: Neural Radiance Field (NeRF) represents a significant advancement in computer vision, offering implicit neural network-based scene representation and novel view synthesis capabilities. Its applications span diverse fields including robotics, urban mapping, autonomous navigation, virtual reality/augmented reality, etc., some of which are considered high-risk AI applications. However, despite its widespread adoption, the robustness and security of NeRF remain largely unexplored. In this study, we contribute to this area by introducing the Illusory Poisoning Attack against Neural Radiance Fields (IPA-NeRF). This attack involves embedding a hidden backdoor view into NeRF, allowing it to produce predetermined outputs, i.e. illusory, when presented with the specified backdoor view while maintaining normal performance with standard inputs. Our attack is specifically designed to deceive users or downstream models at a particular position while ensuring that any abnormalities in NeRF remain undetectable from other viewpoints. Experimental results demonstrate the effectiveness of our Illusory Poisoning Attack, successfully presenting the desired illusory on the specified viewpoint without impacting other views. Notably, we achieve this attack by introducing small perturbations solely to the training set. The code can be found at https://github.com/jiang-wenxiang/IPA-NeRF.

Submitted 16 July, 2024; originally announced July 2024.

arXiv:2407.10892 [pdf, other]

First Measurement of Solar $^8$B Neutrino Flux through Coherent Elastic Neutrino-Nucleus Scattering in PandaX-4T

Authors: PandaX Collaboration, Zihao Bo, Wei Chen, Xun Chen, Yunhua Chen, Zhaokan Cheng, Xiangyi Cui, Yingjie Fan, Deqing Fang, Zhixing Gao, Lisheng Geng, Karl Giboni, Xunan Guo, Xuyuan Guo, Zichao Guo, Chencheng Han, Ke Han, Changda He, Jinrong He, Di Huang, Houqi Huang, Junting Huang, Ruquan Hou, Yu Hou, Xiangdong Ji , et al. (77 additional authors not shown)

Abstract: The PandaX-4T liquid xenon detector at the China Jinping Underground Laboratory is used to measure the solar $^8$B neutrino flux by detecting neutrinos through coherent scattering with xenon nuclei. Data samples requiring the coincidence of scintillation and ionization signals (paired), as well as unpaired ionization-only signals (US2), are selected with energy threshold of approximately 1.1 keV (0.33 keV) nuclear recoil energy. Combining the commissioning run and the first science run of PandaX-4T, a total exposure of 1.25 and 1.04 tonne$\cdot$year are collected for the paired and US2, respectively. After unblinding, 3 and 332 events are observed with an expectation of 2.8$\pm$0.5 and 251$\pm$32 background events, for the paired and US2 data, respectively. A combined analysis yields a best-fit $^8$B neutrino signal of 3.5 (75) events from the paired (US2) data sample, with $\sim$37\% uncertainty, and the background-only hypothesis is disfavored at 2.64$σ$ significance. This gives a solar $^8$B neutrino flux of ($8.4\pm3.1$)$\times$10$^6$ cm$^{-2}$s$^{-1}$, consistent with the standard solar model prediction. This is the first indication of solar $^8$B neutrino ``fog'' in a dark matter direct detection experiment.

Submitted 15 July, 2024; originally announced July 2024.

arXiv:2407.10759 [pdf, other]

Qwen2-Audio Technical Report

Authors: Yunfei Chu, Jin Xu, Qian Yang, Haojie Wei, Xipin Wei, Zhifang Guo, Yichong Leng, Yuanjun Lv, Jinzheng He, Junyang Lin, Chang Zhou, Jingren Zhou

Abstract: We introduce the latest progress of Qwen-Audio, a large-scale audio-language model called Qwen2-Audio, which is capable of accepting various audio signal inputs and performing audio analysis or direct textual responses with regard to speech instructions. In contrast to complex hierarchical tags, we have simplified the pre-training process by utilizing natural language prompts for different data and tasks, and have further expanded the data volume. We have boosted the instruction-following capability of Qwen2-Audio and implemented two distinct audio interaction modes for voice chat and audio analysis. In the voice chat mode, users can freely engage in voice interactions with Qwen2-Audio without text input. In the audio analysis mode, users could provide audio and text instructions for analysis during the interaction. Note that we do not use any system prompts to switch between voice chat and audio analysis modes. Qwen2-Audio is capable of intelligently comprehending the content within audio and following voice commands to respond appropriately. For instance, in an audio segment that simultaneously contains sounds, multi-speaker conversations, and a voice command, Qwen2-Audio can directly understand the command and provide an interpretation and response to the audio. Additionally, DPO has optimized the model's performance in terms of factuality and adherence to desired behavior.
According to the evaluation results from AIR-Bench, Qwen2-Audio outperformed previous SOTAs, such as Gemini-1.5-pro, in tests focused on audio-centric instruction-following capabilities. Qwen2-Audio is open-sourced with the aim of fostering the advancement of the multi-modal language community.

Submitted 15 July, 2024; originally announced July 2024.

Comments: https://github.com/QwenLM/Qwen2-Audio. Checkpoints, code, and scripts will be open-sourced soon

arXiv:2407.10438 [pdf]

Giant Anisotropic Magnetoresistance in Magnetic Monolayers CrPX3 (X = S, Se, Te) due to symmetry breaking between the in-plane and out-of-plane crystallographic axes

Authors: W. S. Hou, M. Q. Dong, Zhi-Xin Guo

Abstract: Anisotropic magnetoresistance (AMR) is a crucial feature for developing highly sensitive sensors and innovative memory devices. While extensively studied in bulk materials, AMR effects in these materials are typically weak. Recent advancements indicate that two-dimensional (2D) van der Waals magnetic materials possess unique magnetic properties, potentially including significant AMR characteristics. In this study, we utilize density functional theory and the Boltzmann transport equation to investigate AMR in magnetic monolayers CrPX3 (X = S, Se, Te). Our findings reveal a substantially large AMR in these 2D magnetic compounds. This enhancement is attributed to magnetization (M)-dependent spin-orbit coupling (SOC), arising from the broken symmetry between in-plane and out-of-plane orientations. This results in significant M-dependent band splitting and subsequent variations in electron velocity. Additionally, we find that the M-dependent SOC is significantly enhanced by increasing the atomic number of the chalcogen X in CrPX3, achieving an exceptional 150% AMR in CrPTe3. Furthermore, our study demonstrates that AMR can be effectively modulated by applying biaxial strain, resulting in a twofold increase with a 4% strain. These findings propose a novel approach to enhancing 2D-based AMR spintronic devices, making a substantial contribution to the field.

Submitted 15 July, 2024; originally announced July 2024.

Comments: 6 figures

arXiv:2407.10435 [pdf]

Nontrivial impact of interlayer coupling on thermal conductivity: opposing trends in in-plane and out-of-plane phonons

Authors: H. F. Feng, B. Liu, J. L. Bai, X. Zhang, Z. X. Song, Zhi-Xin Guo

Abstract: The study of heat transport in two-dimensional (2D) materials reveals novel behaviors due to quantum confinement effects, where in-plane and out-of-plane phonons play crucial roles. In 2D materials like graphene, it is widely recognized that the out-of-plane vibrational mode is the primary contributor to thermal conductivity owing to the mirror symmetry. Based on this perspective, the introduction of interlayer coupling, which breaks this symmetry, is expected to induce a significant reduction in thermal conductivity within 2D materials. Nevertheless, recent studies have presented unexpected findings, indicating that interlayer coupling can actually increase thermal conductivity of 2D materials. This controversial result suggests a nontrivial underlying mechanism governing the effects of interlayer coupling on thermal conductivity in 2D materials, necessitating further exploration. In our work, we investigate the modulation of thermal conductivity through interlayer coupling in a sandwich structure composed of hexagonal boron nitride (h-BN) and bilayer graphene (BG), specifically a h-BN/BG/h-BN system. Through molecular dynamics simulations, we find that the thermal conductivity from out-of-plane phonons can be significantly reduced, while that from in-plane phonons can be significantly increased, as the interlayer coupling strength increases. This results in a nontrivial, coupling-strength-dependent overall thermal conductivity.
The phonon spectrum analysis conducted using our modified package reveals that the upshift and flattening of the out-of-plane (ZA and ZO) phonon modes are mainly responsible for these variations, and the extent of the upshift and flattening is proportional to the strength of interlayer coupling. This work offers new insights into manipulating the thermal conductivity of 2D materials.

Submitted 15 July, 2024; originally announced July 2024.

Comments: 4 figures

arXiv:2407.10052 [pdf, other]

Augmented Neural Fine-Tuning for Efficient Backdoor Purification

Authors: Nazmul Karim, Abdullah Al Arafat, Umar Khalid, Zhishan Guo, Nazanin Rahnavard

Abstract: Recent studies have revealed the vulnerability of deep neural networks (DNNs) to various backdoor attacks, where the behavior of DNNs can be compromised by utilizing certain types of triggers or poisoning mechanisms. State-of-the-art (SOTA) defenses employ too-sophisticated mechanisms that require either a computationally expensive adversarial search module for reverse-engineering the trigger distribution or an over-sensitive hyper-parameter selection module. Moreover, they offer sub-par performance in challenging scenarios, e.g., limited validation data and strong attacks. In this paper, we propose Neural mask Fine-Tuning (NFT) with an aim to optimally re-organize the neuron activities in a way that the effect of the backdoor is removed. Utilizing a simple data augmentation like MixUp, NFT relaxes the trigger synthesis process and eliminates the requirement of the adversarial search module. Our study further reveals that direct weight fine-tuning under limited validation data results in poor post-purification clean test accuracy, primarily due to overfitting issue. To overcome this, we propose to fine-tune neural masks instead of model weights. In addition, a mask regularizer has been devised to further mitigate the model drift during the purification process. The distinct characteristics of NFT render it highly efficient in both runtime and sample usage, as it can remove the backdoor even when a single sample is available from each class.
We validate the effectiveness of NFT through extensive experiments covering the tasks of image classification, object detection, video action recognition, 3D point cloud, and natural language processing. We evaluate our method against 14 different attacks (LIRA, WaNet, etc.) on 11 benchmark data sets such as ImageNet, UCF101, Pascal VOC, ModelNet, OpenSubtitles2012, etc.

Submitted 13 July, 2024; originally announced July 2024.

Comments: Accepted to ECCV 2024

arXiv:2407.08739 [pdf, other]

MAVIS: Mathematical Visual Instruction Tuning

Authors: Renrui Zhang, Xinyu Wei, Dongzhi Jiang, Yichi Zhang, Ziyu Guo, Chengzhuo Tong, Jiaming Liu, Aojun Zhou, Bin Wei, Shanghang Zhang, Peng Gao, Hongsheng Li

Abstract: Multi-modal Large Language Models (MLLMs) have recently emerged as a significant focus in academia and industry. Despite their proficiency in general multi-modal scenarios, the mathematical problem-solving capabilities in visual contexts remain insufficiently explored. We identify three key areas within MLLMs that need to be improved: visual encoding of math diagrams, diagram-language alignment, and mathematical reasoning skills. This draws forth an urgent demand for large-scale, high-quality data and training pipelines in visual mathematics. In this paper, we propose MAVIS, the first MAthematical VISual instruction tuning paradigm for MLLMs, involving a series of mathematical visual datasets and specialized MLLMs. Targeting the three issues, MAVIS contains three progressive training stages from scratch. First, we curate MAVIS-Caption, consisting of 558K diagram-caption pairs, to fine-tune a math-specific vision encoder (CLIP-Math) through contrastive learning, tailored for improved diagram visual encoding. Second, we utilize MAVIS-Caption to align the CLIP-Math with a large language model (LLM) by a projection layer, enhancing vision-language alignment in mathematical domains. Third, we introduce MAVIS-Instruct, including 900K meticulously collected and annotated visual math problems, which is adopted to finally instruct-tune the MLLM for robust mathematical reasoning skills.
In MAVIS-Instruct, we incorporate complete chain-of-thought (CoT) rationales for each problem, and minimize textual redundancy, thereby concentrating the model towards the visual elements. Data and Models are released at https://github.com/ZrrSkywalker/MAVIS

Submitted 11 July, 2024; originally announced July 2024.

Comments: Work in progress. Data and Models are released at https://github.com/ZrrSkywalker/MAVIS

arXiv:2407.08680 [pdf, other]

Generalizable Implicit Motion Modeling for Video Frame Interpolation

Authors: Zujin Guo, Wei Li, Chen Change Loy

Abstract: Motion modeling is critical in flow-based Video Frame Interpolation (VFI). Existing paradigms either consider linear combinations of bidirectional flows or directly predict bilateral flows for given timestamps without exploring favorable motion priors, thus lacking the capability of effectively modeling spatiotemporal dynamics in real-world videos. To address this limitation, in this study, we introduce Generalizable Implicit Motion Modeling (GIMM), a novel and effective approach to motion modeling for VFI. Specifically, to enable GIMM as an effective motion modeling paradigm, we design a motion encoding pipeline to model spatiotemporal motion latent from bidirectional flows extracted from pre-trained flow estimators, effectively representing input-specific motion priors. Then, we implicitly predict arbitrary-timestep optical flows within two adjacent input frames via an adaptive coordinate-based neural network, with spatiotemporal coordinates and motion latent as inputs. Our GIMM can be smoothly integrated with existing flow-based VFI works without further modifications. We show that GIMM performs better than the current state of the art on the VFI benchmarks.

Submitted 11 July, 2024; originally announced July 2024.

Comments: Project Page: https://gseancdat.github.io/projects/GIMMVFI

arXiv:2407.08615 [pdf, other]

MgFNO: Multi-grid Architecture Fourier Neural Operator for Parametric Partial Differential Equations

Authors: Zi-Hao Guo, Hou-Biao Li

Abstract: In science and engineering, there is often a need to repeatedly solve large-scale and high-resolution partial differential equations (PDEs). Neural operators are a new type of models that can map between function spaces, allowing trained models to emulate the solution operators of PDEs. This paper introduces a novel Fourier neural operator with a multigrid architecture (MgFNO). The MgFNO combines the frequency principle of deep neural networks (DNNs) with the multigrid idea for solving linear systems. To speed up the training process of the FNO, a three-layer V-cycle multigrid architecture is used. This architecture involves training the model multiple times on a coarse grid and then transferring it to a fine grid to accelerate the training of the model. The DNN-based solver learns the solution from low to high frequency, while the multigrid method acquires the solution from high to low frequency. Note that the FNO is a resolution-invariant solution operator, therefore the corresponding calculations are greatly simplified. Finally, experiments are conducted on Burgers' equation, Darcy flow, and Navier-Stokes equation. The results demonstrate that the proposed MgFNO outperforms the traditional Fourier neural operator.

Submitted 11 July, 2024; originally announced July 2024.

Comments: 29 pages, 15 figures, 4 tables

MSC Class: 65M55; 65M22 ACM Class: G.1.8; G.1.3

arXiv:2407.07670 [pdf, ps, other]

Stochastic Gradient Descent for Two-layer Neural Networks

Authors: Dinghao Cao, Zheng-Chu Guo, Lei Shi

Abstract: This paper presents a comprehensive study on the convergence rates of the stochastic gradient descent (SGD) algorithm when applied to overparameterized two-layer neural networks. Our approach combines the Neural Tangent Kernel (NTK) approximation with convergence analysis in the Reproducing Kernel Hilbert Space (RKHS) generated by NTK, aiming to provide a deep understanding of the convergence behavior of SGD in overparameterized two-layer neural networks. Our research framework enables us to explore the intricate interplay between kernel methods and optimization processes, shedding light on the optimization dynamics and convergence properties of neural networks. In this study, we establish sharp convergence rates for the last iterate of the SGD algorithm in overparameterized two-layer neural networks. Additionally, we have made significant advancements in relaxing the constraints on the number of neurons, which have been reduced from exponential dependence to polynomial dependence on the sample size or number of iterations. This improvement allows for more flexibility in the design and scaling of neural networks, and will deepen our theoretical understanding of neural network models trained with SGD.

Submitted 10 July, 2024; originally announced July 2024.

arXiv:2407.07299 [pdf, ps, other]

Random Reed-Solomon Codes Achieve the Half-Singleton Bound for Insertions and Deletions over Linear-Sized Alphabets

Authors: Roni Con, Zeyu Guo, Ray Li, Zihan Zhang

Abstract: In this paper, we prove that with high probability, random Reed-Solomon codes approach the half-Singleton bound - the optimal rate versus error tradeoff for linear insdel codes - with linear-sized alphabets. More precisely, we prove that, for any $ε>0$ and positive integers $n$ and $k$, with high probability, random Reed--Solomon codes of length $n$ and dimension $k$ can correct $(1-\varepsilon)n-2k+1$ adversarial insdel errors over alphabets of size $n+2^{\mathsf{poly}(1/\varepsilon)}k$. This significantly improves upon the alphabet size demonstrated in the work of Con, Shpilka, and Tamo (IEEE TIT, 2023), who showed the existence of Reed--Solomon codes with exponential alphabet size $\widetilde O\left(\binom{n}{2k-1}^2\right)$ precisely achieving the half-Singleton bound. Our methods are inspired by recent works on list-decoding Reed-Solomon codes. Brakensiek-Gopi-Makam (STOC 2023) showed that random Reed-Solomon codes are list-decodable up to capacity with exponential-sized alphabets, and Guo-Zhang (FOCS 2023) and Alrabiah-Guruswami-Li (STOC 2024) improved the alphabet-size to linear. We achieve a similar alphabet-size reduction by similarly establishing strong bounds on the probability that certain random rectangular matrices are full rank. To accomplish this in our insdel context, our proof combines the random matrix techniques from list-decoding with structural properties of Longest Common Subsequences.

Submitted 9 July, 2024; originally announced July 2024.

arXiv:2407.06815 [pdf, other]

Searching Accretion-Enhanced Dark Matter Annihilation Signals in the Galactic Centre

Authors: Mei-Wen Yang, Zhi-Qi Guo, Xiao-Yi Luo, Zhao-Qiang Shen, Zi-Qing Xia, Chih-Ting Lu, Yue-Lin Sming Tsai, Yi-Zhong Fan

Abstract: This study reanalyzes the detection prospects of dark matter (DM) annihilation signals in the Galactic Center, focusing on velocity-dependent dynamics within a spike density near the supermassive black hole (Sgr~A$^{\star}$). We investigate three annihilation processes -- $p$-wave, resonance, and forbidden annihilation -- under semi-relativistic velocities, leveraging gamma-ray data from Fermi and DAMPE telescopes. Our analysis integrates a fermionic DM model with an electroweak axion-like particle (ALP) portal, exploring annihilation into two or four photons. Employing a comprehensive six-dimensional integration, we precisely calculate DM-induced gamma-ray fluxes near Sgr~A$^{\star}$, incorporating velocity and positional dependencies in the annihilation cross-section and photon yield spectra. Our findings highlight scenarios of resonance and forbidden annihilation, where the larger ALP-DM-DM coupling constant $C_{aχχ}$ can affect spike density, potentially yielding detectable gamma-ray line spectra within Fermi and DAMPE energy resolution. We set upper limits for $C_{aχχ}$ across these scenarios, offering insights into the detectability and spectral characteristics of DM annihilation signals from the Galactic Center.

Submitted 9 July, 2024; originally announced July 2024.

arXiv:2407.06754 [pdf, other]

Threats and Defenses in Federated Learning Life Cycle: A Comprehensive Survey and Challenges

Authors: Yanli Li, Zhongliang Guo, Nan Yang, Huaming Chen, Dong Yuan, Weiping Ding

Abstract: Federated Learning (FL) offers innovative solutions for privacy-preserving collaborative machine learning (ML). Despite its promising potential, FL is vulnerable to various attacks due to its distributed nature, affecting the entire life cycle of FL services. These threats can harm the model's utility or compromise participants' privacy, either directly or indirectly. In response, numerous defense frameworks have been proposed, demonstrating effectiveness in specific settings and scenarios. To provide a clear understanding of the current research landscape, this paper reviews the most representative and state-of-the-art threats and defense frameworks throughout the FL service life cycle. We start by identifying FL threats that harm utility and privacy, including those with potential or direct impacts. Then, we dive into the defense frameworks, analyze the relationship between threats and defenses, and compare the trade-offs among different defense strategies. Finally, we summarize current research bottlenecks and offer insights into future research directions to conclude this survey. We hope this survey sheds light on trustworthy FL research and contributes to the FL community.

Submitted 11 July, 2024; v1 submitted 9 July, 2024; originally announced July 2024.

arXiv:2407.06297 [pdf, other]

SGOR: Outlier Removal by Leveraging Semantic and Geometric Information for Robust Point Cloud Registration

Authors: Guiyu Zhao, Zhentao Guo, Hongbin Ma

Abstract: In this paper, we introduce a new outlier removal method that fully leverages geometric and semantic information, to achieve robust registration. Current semantic-based registration methods only use semantics for point-to-point or instance semantic correspondence generation, which has two problems. First, these methods are highly dependent on the correctness of semantics. They perform poorly in scenarios with incorrect semantics and sparse semantics. Second, the use of semantics is limited only to the correspondence generation, resulting in bad performance in the weak geometry scene. To solve these problems, on the one hand, we propose secondary ground segmentation and loose semantic consistency based on regional voting. It improves the robustness to semantic correctness by reducing the dependence on single-point semantics. On the other hand, we propose semantic-geometric consistency for outlier removal, which makes full use of semantic information and significantly improves the quality of correspondences. In addition, a two-stage hypothesis verification is proposed, which solves the problem of incorrect transformation selection in the weak geometry scene. In the outdoor dataset, our method demonstrates superior performance, boosting a 22.5 percentage points improvement in registration recall and achieving better robustness under various conditions. Our code is available.

Submitted 8 July, 2024; originally announced July 2024.

Comments: Accepted by IROS 2024

arXiv:2407.06115 [pdf, other]

Infer Induced Sentiment of Comment Response to Video: A New Task, Dataset and Baseline

Authors: Qi Jia, Baoyu Fan, Cong Xu, Lu Liu, Liang Jin, Guoguang Du, Zhenhua Guo, Yaqian Zhao, Xuanjing Huang, Rengang Li

Abstract: Existing video multi-modal sentiment analysis mainly focuses on the sentiment expression of people within the video, yet often neglects the induced sentiment of viewers while watching the videos. The induced sentiment of viewers is essential for inferring the public response to videos and has broad application in analyzing public societal sentiment, the effectiveness of advertising, and other areas. Micro videos and their related comments provide a rich application scenario for viewer-induced sentiment analysis. In light of this, we introduce a novel research task, Multi-modal Sentiment Analysis for Comment Response of Video Induced (MSA-CRVI), which aims to infer opinions and emotions from the comments responding to micro videos. Meanwhile, we manually annotate a dataset named Comment Sentiment toward to Micro Video (CSMV) to support this research. It is the largest video multi-modal sentiment dataset in terms of scale and video duration to our knowledge, containing 107,267 comments and 8,210 micro videos with a video duration of 68.83 hours. Inferring the induced sentiment of a comment requires leveraging the video content, so we propose the Video Content-aware Comment Sentiment Analysis (VC-CSA) method as a baseline to address the challenges inherent in this new task. Extensive experiments demonstrate that our method shows significant improvements over other established baselines.

Submitted 15 May, 2024; originally announced July 2024.

arXiv:2407.05374 [pdf, other]

Multimodal Prompt Learning with Missing Modalities for Sentiment Analysis and Emotion Recognition

Authors: Zirun Guo, Tao Jin, Zhou Zhao

Abstract: The development of multimodal models has significantly advanced multimodal sentiment analysis and emotion recognition. However, in real-world applications, the presence of various missing modality cases often leads to a degradation in the model's performance. In this work, we propose a novel multimodal Transformer framework using prompt learning to address the issue of missing modalities. Our method introduces three types of prompts: generative prompts, missing-signal prompts, and missing-type prompts. These prompts enable the generation of missing modality features and facilitate the learning of intra- and inter-modality information. Through prompt learning, we achieve a substantial reduction in the number of trainable parameters. Our proposed method outperforms other methods significantly across all evaluation metrics. Extensive experiments and ablation studies are conducted to demonstrate the effectiveness and robustness of our method, showcasing its ability to effectively handle missing modalities.

Submitted 7 July, 2024; originally announced July 2024.

Comments: Accepted to ACL 2024 Main

arXiv:2407.05283 [pdf, other]

SCIPaD: Incorporating Spatial Clues into Unsupervised Pose-Depth Joint Learning

Authors: Yi Feng, Zizhan Guo, Qijun Chen, Rui Fan

Abstract: Unsupervised monocular depth estimation frameworks have shown promising performance in autonomous driving. However, existing solutions primarily rely on a simple convolutional neural network for ego-motion recovery, which struggles to estimate precise camera poses in dynamic, complicated real-world scenarios. These inaccurately estimated camera poses can inevitably deteriorate the photometric reconstruction and mislead the depth estimation networks with wrong supervisory signals. In this article, we introduce SCIPaD, a novel approach that incorporates spatial clues for unsupervised depth-pose joint learning. Specifically, a confidence-aware feature flow estimator is proposed to acquire 2D feature positional translations and their associated confidence levels. Meanwhile, we introduce a positional clue aggregator, which integrates pseudo 3D point clouds from DepthNet and 2D feature flows into homogeneous positional representations. Finally, a hierarchical positional embedding injector is proposed to selectively inject spatial clues into semantic features for robust camera pose decoding. Extensive experiments and analyses demonstrate the superior performance of our model compared to other state-of-the-art methods. Remarkably, SCIPaD achieves a reduction of 22.2% in average translation error and 34.8% in average angular error for the camera pose estimation task on the KITTI Odometry dataset. Our source code is available at https://mias.group/SCIPaD.

Submitted 7 July, 2024; originally announced July 2024.

Comments: Accepted by IEEE Transactions on Intelligent Vehicles. Code is available at https://mias.group/SCIPaD

arXiv:2407.05198 [pdf, other]

Medfluencer: A Network Representation of Medical Influencers' Identities and Discourse on Social Media

Authors: Zhijin Guo, Edwin Simpson, Roberta Bernardi

Abstract: In our study, we first constructed a dataset from the tweets of the top 100 medical influencers with the highest Influencer Score during the COVID-19 pandemic. This dataset was then used to construct a socio-semantic network, mapping both their identities and key topics, which are crucial for understanding their impact on public health discourse. To achieve this, we developed a few-shot multi-label classifier to identify influencers and their network actors' identities, employed BERTopic for extracting thematic content, and integrated these components into a network model to analyze their impact on health discourse.

Submitted 6 July, 2024; originally announced July 2024.

Comments: ACM SIGKDD 2024 Workshop epiDAMIK 2024: The 7th International Workshop on Epidemiology meets Data Mining and Knowledge Discovery

arXiv:2407.04347 [pdf, other]

On a nonlinear nonlocal reaction-diffusion system applied to image restoration

Authors: Yuhang Li, Zhichang Guo, Jingfeng Shao, Boying Wu

Abstract: This paper deals with a novel nonlinear coupled nonlocal reaction-diffusion system proposed for image restoration, characterized by the advantages of preserving low gray level features and textures. The gray level indicator in the proposed model is regularized using a new method based on porous media type equations, which is suitable for recovering noisy blurred images. The well-posedness, regularity, and other properties of the model are investigated, addressing the lack of theoretical analysis for existing models of similar type. Numerical experiments conducted on texture and satellite images demonstrate the effectiveness of the proposed model in denoising and deblurring tasks.

Submitted 5 July, 2024; originally announced July 2024.

Comments: 28 pages, 7 figures

arXiv:2407.04284 [pdf, other]

TSC-PCAC: Voxel Transformer and Sparse Convolution Based Point Cloud Attribute Compression for 3D Broadcasting

Authors: Zixi Guo, Yun Zhang, Linwei Zhu, Hanli Wang, Gangyi Jiang

Abstract: Point cloud has been the mainstream representation for advanced 3D applications, such as virtual reality and augmented reality. However, the massive data volume of point clouds is one of the most challenging issues for transmission and storage. In this paper, we propose an end-to-end voxel Transformer and Sparse Convolution based Point Cloud Attribute Compression (TSC-PCAC) for 3D broadcasting. Firstly, we present a framework of the TSC-PCAC, which includes a Transformer and Sparse Convolutional Module (TSCM) based variational autoencoder and a channel context module. Secondly, we propose a two-stage TSCM, where the first stage focuses on modeling local dependencies and feature representations of the point clouds, and the second stage captures global features through spatial and channel pooling encompassing larger receptive fields. This module effectively extracts global and local interpoint relevance to reduce informational redundancy. Thirdly, we design a TSCM based channel context module to exploit interchannel correlations, which improves the predicted probability distribution of quantized latent representations and thus reduces the bitrate. Experimental results indicate that the proposed TSC-PCAC method achieves an average of 38.53%, 21.30%, and 11.19% Bjontegaard Delta bitrate reductions compared to the Sparse-PCAC, NF-PCAC, and G-PCC v23 methods, respectively. The encoding/decoding time costs are reduced by up to 97.68%/98.78% on average compared to the Sparse-PCAC. The source code and the trained models of the TSC-PCAC are available at https://github.com/igizuxo/TSC-PCAC.

Submitted 5 July, 2024; originally announced July 2024.

arXiv:2407.03907 [pdf, other]

The pseudospectrum and transient of Kaluza-Klein black holes in Einstein-Gauss-Bonnet gravity

Authors: Jia-Ning Chen, Liang-Bi Wu, Zong-Kuan Guo

Abstract: The spectrum and dynamical instability, as well as the transient effect of the tensor perturbation for the so-called Maeda-Dadhich black hole, a type of Kaluza-Klein black hole, in Einstein-Gauss-Bonnet gravity have been investigated in the framework of the pseudospectrum. We cast the problem of solving quasinormal modes (QNMs) in AdS-like spacetime as the linear evolution problem of the non-normal operator in null slicing by using ingoing Eddington-Finkelstein coordinates. In terms of spectrum instability, based on the generalised eigenvalue problem, the QNM spectrum and $\epsilon$-pseudospectrum have been studied, while the open structure of the $\epsilon$-pseudospectrum caused by the non-normality of the operator indicates the spectrum instability. In terms of dynamical instability, the concept of the distance to dynamical instability is introduced for the first time in a gravitational context, and it plays a crucial role in bridging the spectrum instability and the dynamical instability. We calculate the distance, named the complex stability radius, as parameters vary. Finally, we report the transient effect in the energy norm of the evolution operator, whose behaviour can be roughly reflected by the three kinds of abscissas in the context of the pseudospectrum, and explain the relationship between the waveform and the transient effect.

Submitted 4 July, 2024; originally announced July 2024.

Comments: 22 pages, 14 figures, 1 table

arXiv:2407.02873 [pdf, other]

Robot Shape and Location Retention in Video Generation Using Diffusion Models

Authors: Peng Wang, Zhihao Guo, Abdul Latheef Sait, Minh Huy Pham

Abstract: Diffusion models have marked a significant milestone in the enhancement of image and video generation technologies. However, generating videos that precisely retain the shape and location of moving objects such as robots remains a challenge. This paper presents diffusion models specifically tailored to generate videos that accurately maintain the shape and location of mobile robots. This development offers substantial benefits to those working on detecting dangerous interactions between humans and robots by facilitating the creation of training data for collision detection models, circumventing the need for collecting data from the real world, which often involves legal and ethical issues. Our models incorporate techniques such as embedding accessible robot pose information and applying semantic mask regulation within the ConvNext backbone network. These techniques are designed to refine intermediate outputs, thereby improving the retention performance of shape and location. Through extensive experimentation, our models have demonstrated notable improvements in maintaining the shape and location of different robots, as well as enhancing overall video generation quality, compared to the benchmark diffusion model. Code will be open-sourced on GitHub at https://github.com/PengPaulWang/diffusion-robots.

Submitted 3 July, 2024; originally announced July 2024.

Comments: 8 pages, 10 figures

arXiv:2407.01306 [pdf, other]

Unveiling the Unseen: Exploring Whitebox Membership Inference through the Lens of Explainability

Authors: Chenxi Li, Abhinav Kumar, Zhen Guo, Jie Hou, Reza Tourani

Abstract: The increasing prominence of deep learning applications and reliance on personalized data underscore the urgent need to address privacy vulnerabilities, particularly Membership Inference Attacks (MIAs). Despite numerous MIA studies, significant knowledge gaps persist, particularly regarding the impact of hidden features (in isolation) on attack efficacy and insufficient justification for the root causes of attacks based on raw data features. In this paper, we aim to address these knowledge gaps by first exploring statistical approaches to identify the most informative neurons and quantifying the significance of the hidden activations from the selected neurons on attack accuracy, in isolation and combination. Additionally, we propose an attack-driven explainable framework by integrating the target and attack models to identify the most influential features of raw data that lead to successful membership inference attacks. Our proposed MIA shows an improvement of up to 26% on state-of-the-art MIA.

Submitted 1 July, 2024; originally announced July 2024.

Comments: 20 pages, 10 figures, 4 tables

arXiv:2407.00468 [pdf, other]

MMEvalPro: Calibrating Multimodal Benchmarks Towards Trustworthy and Efficient Evaluation

Authors: Jinsheng Huang, Liang Chen, Taian Guo, Fu Zeng, Yusheng Zhao, Bohan Wu, Ye Yuan, Haozhe Zhao, Zhihui Guo, Yichi Zhang, Jingyang Yuan, Wei Ju, Luchen Liu, Tianyu Liu, Baobao Chang, Ming Zhang

Abstract: Large Multimodal Models (LMMs) exhibit impressive cross-modal understanding and reasoning abilities, often assessed through multiple-choice questions (MCQs) that include an image, a question, and several options. However, many benchmarks used for such evaluations suffer from systematic biases. Remarkably, Large Language Models (LLMs) without any visual perception capabilities achieve non-trivial performance, undermining the credibility of these evaluations. To address this issue while maintaining the efficiency of MCQ evaluations, we propose MMEvalPro, a benchmark designed to avoid Type-I errors through a trilogy evaluation pipeline and more rigorous metrics. For each original question from existing benchmarks, human annotators augment it by creating one perception question and one knowledge anchor question through a meticulous annotation process. MMEvalPro comprises 2,138 question triplets, totaling 6,414 distinct questions. Two-thirds of these questions are manually labeled by human experts, while the rest are sourced from existing benchmarks (MMMU, ScienceQA, and MathVista). Compared with the existing benchmarks, our experiments with the latest LLMs and LMMs demonstrate that MMEvalPro is more challenging (the best LMM lags behind human performance by 31.73%, compared to an average gap of 8.03% in previous benchmarks) and more trustworthy (the best LLM trails the best LMM by 23.09%, whereas the gap for previous benchmarks is just 14.64%). Our in-depth analysis explains the reason for the large performance gap and justifies the trustworthiness of evaluation, underscoring its significant potential for advancing future research.

Submitted 29 June, 2024; originally announced July 2024.

Comments: 21 pages, code released at https://github.com/chenllliang/MMEvalPro, Homepage at https://mmevalpro.github.io/

arXiv:2406.18082 [pdf, other]

Octo-planner: On-device Language Model for Planner-Action Agents

Authors: Wei Chen, Zhiyuan Li, Zhen Guo, Yikang Shen

Abstract: AI agents have become increasingly significant in various domains, enabling autonomous decision-making and problem-solving. To function effectively, these agents require a planning process that determines the best course of action and then executes the planned actions. In this paper, we present an efficient on-device Planner-Action framework that separates planning and action execution into two distinct components: a planner agent based on Phi-3 Mini, a 3.8 billion parameter LLM optimized for edge devices, and an action agent using the Octopus model for function execution. The planner agent first responds to user queries by decomposing tasks into a sequence of sub-steps, which are then executed by the action agent. To optimize performance on resource-constrained devices, we employ model fine-tuning instead of in-context learning, reducing computational costs and energy consumption while improving response times. Our approach involves using GPT-4 to generate diverse planning queries and responses based on available functions, with subsequent validations to ensure data quality. We fine-tune the Phi-3 Mini model on this curated dataset, achieving a 97% success rate in our in-domain test environment. To address multi-domain planning challenges, we developed a multi-LoRA training method that merges weights from LoRAs trained on distinct function subsets. This approach enables flexible handling of complex, multi-domain queries while maintaining computational efficiency on resource-constrained devices. To support further research, we have open-sourced our model weights at https://huggingface.co/NexaAIDev/octopus-planning. For the demo, please refer to https://www.nexa4ai.com/octo-planner.

Submitted 26 June, 2024; originally announced June 2024.

arXiv:2406.16646 [pdf, other]

The VISTA Variables in the Vía Láctea eXtended (VVVX) ESO public survey: Completion of the observations and legacy

Authors: R. K. Saito, M. Hempel, J. Alonso-García, P. W. Lucas, D. Minniti, S. Alonso, L. Baravalle, J. Borissova, C. Caceres, A. N. Chené, N. J. G. Cross, F. Duplancic, E. R. Garro, M. Gómez, V. D. Ivanov, R. Kurtev, A. Luna, D. Majaess, M. G. Navarro, J. B. Pullen, M. Rejkuba, J. L. Sanders, L. C. Smith, P. H. C. Albino, M. V. Alonso, et al. (121 additional authors not shown)

Abstract: The ESO public survey VISTA Variables in the Vía Láctea (VVV) surveyed the inner Galactic bulge and the adjacent southern Galactic disk from 2009 to 2015. Upon its conclusion, the complementary VVV eXtended (VVVX) survey has expanded both the temporal and spatial coverage of the original VVV area, widening it from 562 to 1700 sq. deg., as well as providing additional epochs in $JHK_{\rm s}$ filters from 2016 to 2023. With the completion of VVVX observations during the first semester of 2023, we present here the observing strategy, a description of data quality and access, and the legacy of VVVX. VVVX took $\sim 2000$ hours, covering about 4% of the sky in the bulge and southern disk. VVVX covered most of the gaps left between the VVV and the VISTA Hemisphere Survey (VHS) areas and extended the VVV time baseline in the obscured regions affected by high extinction and hence hidden from optical observations. VVVX provides a deep $JHK_{\rm s}$ catalogue of $\gtrsim 1.5\times10^9$ point sources, as well as a $K_{\rm s}$ band catalogue of $\sim 10^7$ variable sources. Within the existing VVV area, we produced a 5D map of the surveyed region by combining positions, distances, and proper motions of well-understood distance indicators such as red clump stars, RR Lyrae, and Cepheid variables. In March 2023 we successfully finished the VVVX survey observations that started in 2016, an accomplishment for ESO Paranal Observatory upon 4200 hours of observations for VVV+VVVX. The VVV+VVVX catalogues complement those from the Gaia mission at low Galactic latitudes and provide spectroscopic targets for the forthcoming ESO high-multiplex spectrographs MOONS and 4MOST.

Submitted 24 June, 2024; originally announced June 2024.

Comments: 17 pages, 11 figures (+ appendix). Accepted for publication in Astronomy and Astrophysics in section 14: Catalogs and data

arXiv:2406.16062 [pdf, other]

Towards Biologically Plausible Computing: A Comprehensive Comparison

Authors: Changze Lv, Yufei Gu, Zhengkang Guo, Zhibo Xu, Yixin Wu, Feiran Zhang, Tianyuan Shi, Zhenghua Wang, Ruicheng Yin, Yu Shang, Siqi Zhong, Xiaohua Wang, Muling Wu, Wenhao Liu, Tianlong Li, Jianhao Zhu, Cenyuan Zhang, Zixuan Ling, Xiaoqing Zheng

Abstract: Backpropagation is a cornerstone algorithm in training neural networks for supervised learning, which uses a gradient descent method to update network weights by minimizing the discrepancy between actual and desired outputs. Despite its pivotal role in propelling deep learning advancements, the biological plausibility of backpropagation is questioned due to its requirements for weight symmetry, global error computation, and dual-phase training. To address this long-standing challenge, many studies have endeavored to devise biologically plausible training algorithms. However, a fully biologically plausible algorithm for training multilayer neural networks remains elusive, and interpretations of biological plausibility vary among researchers. In this study, we establish criteria for biological plausibility that a desirable learning algorithm should meet. Using these criteria, we evaluate a range of existing algorithms considered to be biologically plausible, including Hebbian learning, spike-timing-dependent plasticity, feedback alignment, target propagation, predictive coding, forward-forward algorithm, perturbation learning, local losses, and energy-based learning. Additionally, we empirically evaluate these algorithms across diverse network architectures and datasets. We compare the feature representations learned by these algorithms with brain activity recorded by non-invasive devices under identical stimuli, aiming to identify which algorithm can most accurately replicate brain activity patterns. We are hopeful that this study could inspire the development of new biologically plausible algorithms for training multilayer networks, thereby fostering progress in both the fields of neuroscience and machine learning.

Submitted 23 June, 2024; originally announced June 2024.

arXiv:2406.15678 [pdf, other]

Oscillation Frequencies of Moderately Rotating Delta Scuti Stars: Asymmetric Mode Splittings Due to Non-spherical Distortion

Authors: Zhao Guo, Timothy R. Bedding, A. A. Pamyatnykh, Donald W. Kurtz, Gang Li, Anuj Gautam, Simon J. Murphy, Conny Aerts

Abstract: We find that the observed pressure-mode rotational splittings of slowly/moderately rotating Delta Scuti stars and Beta Cephei stars mostly have a positive asymmetry. That is, the left frequency spacing is larger than the right spacing in the dipole mode splitting triplets and the $l=2$ mode splitting multiplets (considering $m=1, 0, -1$ modes only). This is in agreement with the second-order perturbative effect of the rotational non-spherical distortion: both the prograde and retrograde modes have their frequencies shifted towards lower values relative to the $m=0$ modes. We thus study the rotational perturbation both in the first and second order, as well as the near-degeneracy mode coupling effect in MESA models representing Delta Scuti stars. For faster rotators, the near-degeneracy mode coupling between the nearest radial and quadrupole modes can significantly shift the $m=0$ modes, reduce the splitting asymmetry, and even change its sign. We find that the theoretical splitting asymmetry from the second-order non-spherical distortion is larger than the observed asymmetry. To facilitate future detections, we predict correlations between splitting asymmetry, splitting amplitude, and pulsation frequency. We also discuss additional factors that can influence splitting asymmetry, including embedded magnetic fields, resonant mode coupling, and binarity.

Submitted 21 June, 2024; originally announced June 2024.

Comments: MNRAS, submitted

arXiv:2406.14653 [pdf, other]

LLM Granularity for On-the-Fly Robot Control

Authors: Peng Wang, Mattia Robbiani, Zhihao Guo

Abstract: Assistive robots have attracted significant attention due to their potential to enhance the quality of life for vulnerable individuals like the elderly. The convergence of computer vision, large language models, and robotics has introduced the 'visuolinguomotor' mode for assistive robots, where visuals and linguistics are incorporated into assistive robots to enable proactive and interactive assistance. This raises the question: In circumstances where visuals become unreliable or unavailable, can we rely solely on language to control robots, i.e., is the 'linguomotor' mode viable for assistive robots? This work takes the initial steps to answer this question by: 1) evaluating the responses of assistive robots to language prompts of varying granularities; and 2) exploring the necessity and feasibility of controlling the robot on-the-fly. We have designed and conducted experiments on a Sawyer cobot to support our arguments. A Turtlebot robot case is designed to demonstrate the adaptation of the solution to scenarios where assistive robots need to maneuver to assist. Code will be released on GitHub soon to benefit the community.

Submitted 20 June, 2024; originally announced June 2024.

arXiv:2406.13975 [pdf, other]

MR-BEN: A Comprehensive Meta-Reasoning Benchmark for Large Language Models

Authors: Zhongshen Zeng, Yinhong Liu, Yingjia Wan, Jingyao Li, Pengguang Chen, Jianbo Dai, Yuxuan Yao, Rongwu Xu, Zehan Qi, Wanru Zhao, Linling Shen, Jianqiao Lu, Haochen Tan, Yukang Chen, Hao Zhang, Zhan Shi, Bailin Wang, Zhijiang Guo, Jiaya Jia

Abstract: Large language models (LLMs) have shown increasing capability in problem-solving and decision-making, largely based on the step-by-step chain-of-thought reasoning processes. However, it has been increasingly challenging to evaluate the reasoning capability of LLMs. Concretely, existing outcome-based benchmarks begin to saturate and become less sufficient to monitor the progress. To this end, we present a process-based benchmark MR-BEN that demands a meta reasoning skill, where LMs are asked to locate and analyse potential errors in automatically generated reasoning steps. MR-BEN is a comprehensive benchmark comprising 5,975 questions collected from human experts, covering various subjects such as physics, chemistry, logic, coding, and more. Through our designed metrics for assessing meta-reasoning on this benchmark, we identify interesting limitations and weaknesses of current LLMs (open-source and closed-source models). For example, open-source models are seemingly comparable to GPT-4 on outcome-based benchmarks, but they lag far behind on our benchmark, revealing the underlying reasoning capability gap between them. Our dataset and codes are available on https://randolph-zeng.github.io/Mr-Ben.github.io/.

Submitted 19 June, 2024; originally announced June 2024.

arXiv:2406.13958 [pdf]

Symmetry engineering in 2D bioelectronics facilitating augmented biosensing interfaces

Authors: Yizhang Wu, Yihan Liu, Yuan Li, Ziquan Wei, Sicheng Xing, Yunlang Wang, Dashuai Zhu, Ziheng Guo, Anran Zhang, Gongkai Yuan, Zhibo Zhang, Ke Huang, Yong Wang, Guorong Wu, Ke Cheng, Wubin Bai

Abstract: Symmetry lies at the heart of 2D bioelectronics, determining material properties at the fundamental level. Breaking the symmetry allows emergent functionalities and effects. However, symmetry modulation in 2D bioelectronics and the resultant applications have been largely overlooked. Here we devise an oxidized architectural MXene, referred to as OXene, that couples orbit symmetric breaking with inverse symmetric breaking to enable optimized interfacial impedance and Schottky-induced piezoelectric effects. The resulting OXene validates applications including microelectrode arrays, gait analysis, active transistor matrix, and wireless signaling transmission, enabling high-fidelity signal transmission and reconfigurable logic gates. Further, OXene interfaces are investigated in both rodent and porcine myocardium, featuring high-quality and spatiotemporally resolved physiological recordings, with accurate differentiated predictions enabled via various machine learning pipelines.

Submitted 19 June, 2024; originally announced June 2024.

arXiv:2406.12822 [pdf, other]

Is It Good Data for Multilingual Instruction Tuning or Just Bad Multilingual Evaluation for Large Language Models?

Authors: Pinzhen Chen, Simon Yu, Zhicheng Guo, Barry Haddow

Abstract: Large language models, particularly multilingual ones, are designed, claimed, and expected to cater to native speakers of varied languages. We hypothesise that the current practices of fine-tuning and evaluating these models may not perfectly align with this objective owing to a heavy reliance on translation, which can introduce translation artefacts and defects. It remains unknown whether the nature of the instruction data has an impact on the model output; conversely, it is questionable whether translated test sets can capture such nuances. Due to the often coupled practices of using translated data in both stages, such imperfections could have been overlooked. This work investigates these issues using controlled native or translated data during instruction tuning and evaluation stages. Experiments on eight base models and eight different benchmarks show that native or generation benchmarks reveal a notable difference between native and translated instruction data especially when model performance is high, whereas other types of test sets cannot. The comparison between round-trip and single-pass translations reflects the importance of knowledge from language-native resources. Finally, we demonstrate that regularization is beneficial to bridging this gap on structured but not generative tasks.

Submitted 11 July, 2024; v1 submitted 18 June, 2024; originally announced June 2024.

arXiv:2406.12335 [pdf, other]

Attention Score is not All You Need for Token Importance Indicator in KV Cache Reduction: Value Also Matters

Authors: Zhiyu Guo, Hidetaka Kamigaito, Taro Watanabe

Abstract: Scaling the context size of large language models (LLMs) enables them to perform various new tasks, e.g., book summarization. However, the memory cost of the Key and Value (KV) cache in attention significantly limits the practical applications of LLMs. Recent works have explored token pruning for KV cache reduction in LLMs, relying solely on attention scores as a token importance indicator. However, our investigation into value vector norms revealed a notably non-uniform pattern questioning their reliance only on attention scores. Inspired by this, we propose a new method: Value-Aware Token Pruning (VATP) which uses both attention scores and the $\ell_{1}$ norm of value vectors to evaluate token importance. Extensive experiments on LLaMA2-7B-chat and Vicuna-v1.5-7B across 16 LongBench tasks demonstrate VATP's superior performance.

Submitted 18 June, 2024; originally announced June 2024.
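
The idea in the abstract above, scoring cached tokens by both attention and the $\ell_1$ norm of their value vectors, can be illustrated with a minimal sketch. The abstract does not say how the two signals are combined, so the product used here is an assumption, and the function names `value_aware_scores` and `keep_top_k` are hypothetical rather than taken from the authors' code.

```python
from typing import List


def value_aware_scores(attn: List[List[float]], values: List[List[float]]) -> List[float]:
    """Score each cached token (assumed rule: accumulated attention * L1 norm of value)."""
    n_tokens = len(values)
    # Attention mass each cached token receives, summed over all query rows.
    accumulated = [sum(row[j] for row in attn) for j in range(n_tokens)]
    # L1 norm of each token's value vector.
    l1_norms = [sum(abs(x) for x in v) for v in values]
    return [a * n for a, n in zip(accumulated, l1_norms)]


def keep_top_k(scores: List[float], k: int) -> List[int]:
    """Return the indices of the k highest-scoring tokens, in positional order."""
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:k])


# Toy example: 2 query rows attending over 3 cached tokens.
attn = [[0.5, 0.4, 0.1],
        [0.2, 0.7, 0.1]]
values = [[0.1, -0.1],   # heavily attended, but tiny value vector
          [1.0, 2.0],
          [3.0, -3.0]]   # lightly attended, but large value vector

kept = keep_top_k(value_aware_scores(attn, values), k=2)
attention_only = keep_top_k([sum(row[j] for row in attn) for j in range(3)], k=2)
```

In this toy input, ranking by attention alone keeps tokens 0 and 1, while the value-aware score keeps tokens 1 and 2, because token 0 has a near-zero value vector and contributes little to the attention output.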

arXiv:2406.11933 [pdf, other]

Scaling Efficient Masked Autoencoder Learning on Large Remote Sensing Dataset

Authors: Fengxiang Wang, Hongzhen Wang, Di Wang, Zonghao Guo, Zhenyu Zhong, Long Lan, Jing Zhang, Zhiyuan Liu, Maosong Sun

Abstract: Masked Image Modeling (MIM) has emerged as a pivotal approach for developing foundational visual models in the field of remote sensing (RS). However, current RS datasets are limited in volume and diversity, which significantly constrains the capacity of MIM methods to learn generalizable representations. In this study, we introduce RS-4M, a large-scale dataset designed to enable highly efficient MIM training on RS images. RS-4M comprises 4 million optical images encompassing abundant and fine-grained RS visual tasks, including object-level detection and pixel-level segmentation. Compared to natural images, RS images often contain massive redundant background pixels, which limits the training efficiency of the conventional MIM models. To address this, we propose an efficient MIM method, termed SelectiveMAE, which dynamically encodes and reconstructs a subset of patch tokens selected based on their semantic richness. SelectiveMAE roots in a progressive semantic token selection module, which evolves from reconstructing semantically analogical tokens to encoding complementary semantic dependencies. This approach transforms conventional MIM training into a progressive feature learning process, enabling SelectiveMAE to efficiently learn robust representations of RS images. Extensive experiments show that SelectiveMAE significantly boosts training efficiency by 2.2-2.7 times and enhances the classification, detection, and segmentation performance of the baseline MIM model. The dataset, source code, and trained models will be released.

Submitted 17 June, 2024; originally announced June 2024.

arXiv:2406.11138 [pdf, other]

Diffusion Models in Low-Level Vision: A Survey

Authors: Chunming He, Yuqi Shen, Chengyu Fang, Fengyang Xiao, Longxiang Tang, Yulun Zhang, Wangmeng Zuo, Zhenhua Guo, Xiu Li

Abstract: Deep generative models have garnered significant attention in low-level vision tasks due to their generative capabilities. Among them, diffusion model-based solutions, characterized by a forward diffusion process and a reverse denoising process, have emerged as widely acclaimed for their ability to produce samples of superior quality and diversity. This ensures the generation of visually compelling results with intricate texture information. Despite their remarkable success, a noticeable gap exists in a comprehensive survey that amalgamates these pioneering diffusion model-based works and organizes the corresponding threads. This paper presents a comprehensive review of diffusion model-based techniques. We present three generic diffusion modeling frameworks and explore their correlations with other deep generative models, establishing the theoretical foundation. Following this, we introduce a multi-perspective categorization of diffusion models, considering both the underlying framework and the target task. Additionally, we summarize extended diffusion models applied in other tasks, including medical, remote sensing, and video scenarios. Moreover, we provide an overview of commonly used benchmarks and evaluation metrics. We conduct a thorough evaluation, encompassing both performance and efficiency, of diffusion model-based techniques in three prominent tasks. Finally, we elucidate the limitations of current diffusion models and propose seven intriguing directions for future research.
This comprehensive examination aims to facilitate a profound understanding of the landscape surrounding denoising diffusion models in the context of low-level vision tasks. A curated list of diffusion model-based techniques in over 20 low-level vision tasks can be found at https://github.com/ChunmingHe/awesome-diffusion-models-in-low-level-vision.

Submitted 16 June, 2024; originally announced June 2024.

Comments: 20 pages, 23 figures, 4 tables

arXiv:2406.10616 [pdf, other]

doi 10.1145/3637528.3671660

HiFGL: A Hierarchical Framework for Cross-silo Cross-device Federated Graph Learning

Authors: Zhuoning Guo, Duanyi Yao, Qiang Yang, Hao Liu

Abstract: Federated Graph Learning (FGL) has emerged as a promising way to learn high-quality representations from distributed graph data with privacy preservation. Although considerable efforts have been made for FGL under either the cross-device or the cross-silo paradigm, how to effectively capture graph knowledge in a more complicated cross-silo cross-device environment remains an under-explored problem. However, this task is challenging because of the inherent hierarchy and heterogeneity of decentralized clients, diversified privacy constraints in different clients, and the cross-client graph integrity requirement. To this end, in this paper, we propose a Hierarchical Federated Graph Learning (HiFGL) framework for cross-silo cross-device FGL. Specifically, we devise a unified hierarchical architecture to safeguard federated GNN training on heterogeneous clients while ensuring graph integrity. Moreover, we propose a Secret Message Passing (SecMP) scheme to shield against unauthorized access to subgraph-level and node-level sensitive information simultaneously. Theoretical analysis proves that HiFGL achieves multi-level privacy preservation with complexity guarantees. Extensive experiments on real-world datasets validate the superiority of the proposed framework against several baselines. Furthermore, HiFGL's versatile nature allows for its application in either solely cross-silo or cross-device settings, further broadening its utility in real-world FGL applications.

Submitted 15 June, 2024; originally announced June 2024.

Comments: Accepted by SIGKDD 2024

arXiv:2406.10593 [pdf, other]

QDA-SQL: Questions Enhanced Dialogue Augmentation for Multi-Turn Text-to-SQL

Authors: Yinggang Sun, Ziming Guo, Haining Yu, Chuanyi Liu, Xiang Li, Bingxuan Wang, Xiangzhan Yu, Tiancheng Zhao

Abstract: Fine-tuning large language models (LLMs) for specific domain tasks has achieved great success in Text-to-SQL tasks. However, these fine-tuned models often face challenges with multi-turn Text-to-SQL tasks caused by ambiguous or unanswerable questions. It is desired to enhance LLMs to handle multiple types of questions in multi-turn Text-to-SQL tasks. To address this, we propose a novel data augmentation method, called QDA-SQL, which generates multiple types of multi-turn Q&A pairs by using LLMs. In QDA-SQL, we introduce a novel data augmentation method incorporating validation and correction mechanisms to handle complex multi-turn Text-to-SQL tasks. Experimental results demonstrate that QDA-SQL enables fine-tuned models to exhibit higher performance on SQL statement accuracy and enhances their ability to handle complex, unanswerable questions in multi-turn Text-to-SQL tasks. The generation script and test set are released at https://github.com/mcxiaoxiao/QDA-SQL.

Submitted 15 June, 2024; originally announced June 2024.

Comments: 13 pages, 7 figures

arXiv:2406.10457 [pdf, other]

Noise-induced quantum synchronization and maximally entangled mixed states in superconducting circuits

Authors: Ziyu Tao, Finn Schmolke, Chang-Kang Hu, Wenhui Huang, Yuxuan Zhou, Jiawei Zhang, Ji Chu, Libo Zhang, Xuandong Sun, Zecheng Guo, Jingjing Niu, Wenle Weng, Song Liu, Youpeng Zhong, Dian Tan, Dapeng Yu, Eric Lutz

Abstract: Random fluctuations can lead to cooperative effects in complex systems. We here report the experimental observation of noise-induced quantum synchronization in a chain of superconducting transmon qubits with nearest-neighbor interactions. The application of Gaussian white noise to a single site leads to synchronous oscillations in the entire chain. We show that the two synchronized end qubits are entangled, with nonzero concurrence, and that they belong to a class of generalized Bell states known as maximally entangled mixed states, whose entanglement cannot be increased by any global unitary. We further demonstrate the stability against frequency detuning of both synchronization and entanglement by determining the corresponding generalized Arnold tongue diagrams. Our results highlight the constructive influence of noise in a quantum many-body system and uncover the potential role of synchronization for mixed-state quantum information science.

Submitted 14 June, 2024; originally announced June 2024.

arXiv:2406.09842 [pdf, other]

Charmoniumlike Channels $1^{+}$ with Isospin $1$ from Lattice and Effective Field Theory

Authors: Mitja Sadl, Sara Collins, Zhi-Hui Guo, M. Padmanath, Sasa Prelovsek, Lin-Wan Yan

Abstract: Many exotic charmoniumlike mesons have already been discovered experimentally, of which the $Z_c$ mesons with isospin 1 are prominent examples. We investigate $J^{PC}=1^{+\pm}$ states with flavor $\bar cc \bar qq$ ($q=u,d$) in isospin 1 using lattice QCD. This is the first study of these mesons employing more than one volume and involving frames with nonzero total momentum. We utilize two $N_f=2+1$ CLS ensembles with $m_π\simeq 280\,$MeV. As the simulations are performed with unphysical quark masses and at a single lattice spacing of $a=0.086\,$fm, our results provide only qualitative insights. Resulting eigenenergies are compatible or just slightly shifted down with respect to noninteracting energies, where the most significant shifts occur for certain $D\bar D^*$ states. Both channels $1^{+\pm}$ have a virtual pole slightly below the threshold if $D\bar D^*$ is assumed to be decoupled from other channels. In addition, we perform a coupled channel analysis of $J/ψπ$ and $D\bar D^*$ scattering with $J^{PC}=1^{+-}$ within an effective field theory framework. The $J/ψπ$ and $D\bar D^*$ invariant-mass distributions from BESIII and finite-volume energies from several lattice QCD simulations, including this work, are fitted simultaneously. All fits yield two poles relatively close to the $D\bar D^*$ threshold and reasonably reproduce the experimental $Z_c$ peaks. They also reproduce lattice energies up to slightly above the $D\bar{D}^*$ threshold, while reproduction at even higher energies is better for fits that put more weight on the lattice data.
Our findings suggest that the employed effective field theory can reasonably reconcile the peaks in the experimental line shapes and the lattice energies, although those lie close to noninteracting energies. We also study $J/ψπ$ scattering in s-wave and place upper bounds on the phase shift.

Submitted 14 June, 2024; originally announced June 2024.

Comments: 23 pages plus appendices, 24 figures

arXiv:2406.08968 [pdf, other]

Covariate Selection for Optimizing Balance with Covariate-Adjusted Response-Adaptive Randomization

Authors: Ziqing Guo, Yang Liu, Lucy Xia

Abstract: Balancing influential covariates is crucial for valid treatment comparisons in clinical studies. While covariate-adaptive randomization is commonly used to achieve balance, its performance can be inadequate when the number of baseline covariates is large. It is therefore essential to identify the influential factors associated with the outcome and ensure balance among these critical covariates. In this article, we propose a novel covariate-adjusted response-adaptive randomization that integrates the patients' responses and covariates information to select sequentially significant covariates and maintain their balance. We establish theoretically the consistency of our covariate selection method and demonstrate that the improved covariate balancing, as evidenced by a faster convergence rate of the imbalance measure, leads to higher efficiency in estimating treatment effects. Furthermore, we provide extensive numerical and empirical studies to illustrate the benefits of our proposed method across various settings.

Submitted 13 June, 2024; originally announced June 2024.

Comments: 54 pages, 4 figures

arXiv:2406.08845 [pdf, other]

Rethinking Human Evaluation Protocol for Text-to-Video Models: Enhancing Reliability,Reproducibility, and Practicality

Authors: Tianle Zhang, Langtian Ma, Yuchen Yan, Yuchen Zhang, Kai Wang, Yue Yang, Ziyao Guo, Wenqi Shao, Yang You, Yu Qiao, Ping Luo, Kaipeng Zhang

Abstract: Recent text-to-video (T2V) technology advancements, as demonstrated by models such as Gen2, Pika, and Sora, have significantly broadened its applicability and popularity. Despite these strides, evaluating these models poses substantial challenges. Primarily, due to the limitations inherent in automatic metrics, manual evaluation is often considered a superior method for assessing T2V generation. However, existing manual evaluation protocols face reproducibility, reliability, and practicality issues. To address these challenges, this paper introduces the Text-to-Video Human Evaluation (T2VHE) protocol, a comprehensive and standardized protocol for T2V models. The T2VHE protocol includes well-defined metrics, thorough annotator training, and an effective dynamic evaluation module. Experimental results demonstrate that this protocol not only ensures high-quality annotations but can also reduce evaluation costs by nearly 50%. We will open-source the entire setup of the T2VHE protocol, including the complete protocol workflow, the dynamic evaluation component details, and the annotation interface code. This will help communities establish more sophisticated human assessment protocols.

Submitted 13 June, 2024; originally announced June 2024.

arXiv:2406.08571 [pdf, other]

The verification of periodicity with the use of recurrent neural networks

Authors: Niall Miller, Philip Lucas, Yi Sun, Zhen Guo, Calum Morris, William Cooper

Abstract: The ability to automatically and robustly self-verify periodicity present in time-series astronomical data is becoming more important as data sets rapidly increase in size. The age of large astronomical surveys has rendered manual inspection of time-series data less practical. Previous efforts in generating a false alarm probability to verify the periodicity of stars have been aimed towards the analysis of a constructed periodogram. However, these methods feature correlations with features that do not pertain to periodicity, such as light curve shape, slow trends and stochastic variability. The common assumption that photometric errors are Gaussian and well determined is also a limitation of analytic methods. We present a novel machine learning based technique which directly analyses the phase folded light curve for its false alarm probability. We show that the results of this method are largely insensitive to the shape of the light curve, and we establish minimum values for the number of data points and the amplitude to noise ratio.

Submitted 12 June, 2024; originally announced June 2024.

arXiv:2406.08314 [pdf, other]

GPU-accelerated Auxiliary-field quantum Monte Carlo with multi-Slater determinant trial states

Authors: Yifei Huang, Zhen Guo, Hung Q. Pham, Dingshun Lv

Abstract: The accuracy of phaseless auxiliary-field quantum Monte Carlo (ph-AFQMC) can be systematically improved with better trial states. Using multi-Slater determinant trial states, ph-AFQMC has the potential to faithfully treat strongly correlated systems, while balancing the static and dynamical correlations on an equal footing. This preprint presents an implementation and application of graphics processing unit-accelerated ph-AFQMC, for multi-Slater determinant trial wavefunctions (GPU-accelerated MSD-AFQMC), to enable efficient simulation of large-scale, strongly correlated systems. This approach allows for nearly-exact computation of ground state energies in multi-reference systems. Our GPU-accelerated MSD-AFQMC is implemented in the open-source code ipie, a Python-based AFQMC package [J. Chem. Theory Comput., 2022, 19(1): 109-121]. We benchmark the performance of the GPU code on transition-metal clusters like [Cu$_2$O$_2$]$^{2+}$ and [Fe$_2$S$_2$(SCH$_3$)]$^{2-}$. The GPU code achieves at least sixfold speedup in both cases, comparing the timings of a single A100 GPU to that of a 32-CPU node. For [Fe$_2$S$_2$(SCH$_3$)]$^{2-}$, we demonstrate that our GPU MSD-AFQMC can recover the dynamical correlation necessary for chemical accuracy with an MSD trial, despite the large number of determinants required ($>10^5$). Our work significantly enhances the efficiency of MSD-AFQMC calculations for large, strongly correlated molecules by utilizing GPUs, offering a promising path for exploring the electronic structure of transition metal complexes.

Submitted 12 June, 2024; originally announced June 2024.

Comments: 6 pages, 2 figures

arXiv:2406.07806 [pdf, other]

Probing the Shock Breakout Signal of SN 2024ggi from the Transformation of Early Flash Spectroscopy

Authors: Jujia Zhang, Luc Dessart, Xiaofeng Wang, Qian Zhai, Yi Yang, Liping Li, Han Lin, Giorgio Valerin, Yongzhi Cai, Zhen Guo, Lingzhi Wang, Zeyi Zhao, Zhenyu Wang, Shengyu Yan

Abstract: We present early-time, hour-to-day cadence spectroscopy of the nearby type II supernova (SN II) 2024ggi, which was discovered at a phase when the SN shock just emerged from the red-supergiant (RSG) progenitor star. Over the first few days after the first light, SN 2024ggi exhibited prominent narrow emission lines formed through intense and persistent photoionization of the nearby circumstellar material (CSM). In the first 63 hours, spectral lines of He, C, N, and O revealed a rapid rise in ionization, as a result of the progressive sweeping-up of the CSM by the shock. The duration of the IIn-like spectra indicates a dense and relatively confined CSM distribution extending up to $\sim 4 \times 10^{14}$ cm. Spectral modeling reveals a CSM mass loss rate at this region exceeding $5 \times 10^{-3}\,{\rm M}_{\odot}$ yr$^{-1}$ is required to reproduce low-ionization emissions, which dramatically exceeds that of an RSG. Analyzing H$α$ emission shift implies the velocity of the unshocked outer CSM to be between 20 and 40 km s$^{-1}$, matching the typical wind velocity of an RSG. The differences between the inner and outer layers of the CSM and an RSG progenitor highlight a complex mass loss history before the explosion of SN 2024ggi.

Submitted 29 June, 2024; v1 submitted 11 June, 2024; originally announced June 2024.

Comments: 10 pages and 5 figures in the main text (16 pages and 9 figures in total). Accepted for publication in ApJL

arXiv:2406.06852 [pdf, other]

A Survey of Backdoor Attacks and Defenses on Large Language Models: Implications for Security Measures

Authors: Shuai Zhao, Meihuizi Jia, Zhongliang Guo, Leilei Gan, Jie Fu, Yichao Feng, Fengjun Pan, Luu Anh Tuan

Abstract: The large language models (LLMs), which bridge the gap between human language understanding and complex problem-solving, achieve state-of-the-art performance on several NLP tasks, particularly in few-shot and zero-shot settings. Despite the demonstrable efficacy of LLMs, due to constraints on computational resources, users have to engage with open-source language models or outsource the entire training process to third-party platforms. However, research has demonstrated that language models are susceptible to potential security vulnerabilities, particularly in backdoor attacks. Backdoor attacks are designed to introduce targeted vulnerabilities into language models by poisoning training samples or model weights, allowing attackers to manipulate model responses through malicious triggers. While existing surveys on backdoor attacks provide a comprehensive overview, they lack an in-depth examination of backdoor attacks specifically targeting LLMs. To bridge this gap and grasp the latest trends in the field, this paper presents a novel perspective on backdoor attacks for LLMs by focusing on fine-tuning methods. Specifically, we systematically classify backdoor attacks into three categories: full-parameter fine-tuning, parameter-efficient fine-tuning, and attacks without fine-tuning. Based on insights from a substantial review, we also discuss crucial issues for future research on backdoor attacks, such as further exploring attack algorithms that do not require fine-tuning, or developing more covert attack algorithms.

Submitted 13 June, 2024; v1 submitted 10 June, 2024; originally announced June 2024.

arXiv:2406.06777 [pdf, other]

MolX: Enhancing Large Language Models for Molecular Learning with A Multi-Modal Extension

Authors: Khiem Le, Zhichun Guo, Kaiwen Dong, Xiaobao Huang, Bozhao Nan, Roshni Iyer, Xiangliang Zhang, Olaf Wiest, Wei Wang, Nitesh V. Chawla

Abstract: Recently, Large Language Models (LLMs) with their strong task-handling capabilities have shown remarkable advancements across a spectrum of fields, moving beyond natural language understanding. However, their proficiency within the chemistry domain remains restricted, especially in solving professional molecule-related tasks. This challenge is attributed to their inherent limitations in comprehending molecules using only common textual representations, i.e., SMILES strings. In this study, we seek to enhance the ability of LLMs to comprehend molecules by designing and equipping them with a multi-modal external module, namely MolX. In particular, instead of directly using a SMILES string to represent a molecule, we utilize specific encoders to extract fine-grained features from both SMILES string and 2D molecular graph representations for feeding into an LLM. Moreover, a human-defined molecular fingerprint is incorporated to leverage its embedded domain knowledge. Then, to establish an alignment between MolX and the LLM's textual input space, the whole model, in which the LLM is frozen, is pre-trained with a versatile strategy including a diverse set of tasks. Extensive experimental evaluations demonstrate that our proposed method only introduces a small number of trainable parameters while outperforming baselines on various downstream molecule-related tasks ranging from molecule-to-text translation to retrosynthesis, with and without fine-tuning the LLM.

Submitted 27 June, 2024; v1 submitted 10 June, 2024; originally announced June 2024.

arXiv:2406.06478 [pdf]

High-precision surgical navigation using speckle structured light-based thoracoabdominal puncture robot

Authors: Zezhao Guo, Yanzhong Guo, Zhanfang Zhao

Abstract: Background: During percutaneous puncture robotic surgical navigation, the needle insertion point is positioned on the patient's chest and abdomen body surface. By locating any point on the soft skin tissue, it is difficult to apply the traditional reflective ball tracking method. The patient's chest and abdomen body surface has fluctuations in breathing and appears irregular. The chest and abdomen are regular and smooth, lacking obvious features, and it is challenging to locate the needle insertion point on the body surface. Methods: This paper designs and experiments a method that is different from previous reflective ball optical markers or magnetic positioning surgical navigation and tracking methods. It is based on a speckle structured light camera to identify the patient's body surface and fit it into a hollow ring with a diameter of 24 mm. Results: Experimental results show that this method of the system can be small, flexible, and high-precision positioning of any body surface point at multiple angles, achieving a positioning accuracy of 0.033-0.563 mm and an image of 7-30 frames/s. Conclusions: The positioning recognition ring material used in this method can be well imaged under CT, so the optical positioning of the body surface and the in vivo imaging positioning under CT can be combined to form a unified patient's internal and external positioning world coordinates to achieve internal and external registration and positioning integration.
The system senses motion with six degrees of freedom, up and down, front and back, left and right, and all rotations, with sub-millimeter accuracy, and has broad application prospects in future puncture surgeries.

Submitted 6 May, 2024; originally announced June 2024.

Comments: 17 pages, 7 figures

arXiv:2406.05535 [pdf, other]

Perturbation Towards Easy Samples Improves Targeted Adversarial Transferability

Authors: Junqi Gao, Biqing Qi, Yao Li, Zhichang Guo, Dong Li, Yuming Xing, Dazhi Zhang

Abstract: The transferability of adversarial perturbations provides an effective shortcut for black-box attacks. Targeted perturbations have greater practicality but are more difficult to transfer between models. In this paper, we experimentally and theoretically demonstrate that neural networks trained on the same dataset have more consistent performance in High-Sample-Density-Regions (HSDR) of each class than in low sample density regions. Therefore, in the targeted setting, adding perturbations towards the HSDR of the target class is more effective in improving transferability. However, density estimation is challenging in high-dimensional scenarios. Further theoretical and experimental verification demonstrates that easy samples with low loss are more likely to be located in HSDR. Perturbations towards such easy samples in the target class can avoid density estimation for HSDR location. Based on the above facts, we verify that adding perturbations towards easy samples in the target class improves the targeted adversarial transferability of existing attack methods. A generative targeted attack strategy named Easy Sample Matching Attack (ESMA) is proposed, which has a higher success rate for targeted attacks and outperforms the SOTA generative method. Moreover, ESMA requires only 5% of the storage space and much less computation time compared to the current SOTA, as ESMA attacks all classes with only one model instead of separate models for each class. Our code is available at https://github.com/gjq100/ESMA.

Submitted 8 June, 2024; originally announced June 2024.

Journal ref: Advances in Neural Information Processing Systems 36, 2023

arXiv:2406.03702 [pdf, other]

DSNet: A Novel Way to Use Atrous Convolutions in Semantic Segmentation

Authors: Zilu Guo, Liuyang Bian, Xuan Huang, Hu Wei, Jingyu Li, Huasheng Ni

Abstract: Atrous convolutions are employed as a method to increase the receptive field in semantic segmentation tasks. However, in previous works on semantic segmentation, they were rarely employed in the shallow layers of the model. We revisit the design of atrous convolutions in modern convolutional neural networks (CNNs), and demonstrate that the concept of using large kernels to apply atrous convolutions could be a more powerful paradigm. We propose three guidelines to apply atrous convolutions more efficiently. Following these guidelines, we propose DSNet, a Dual-Branch CNN architecture, which incorporates atrous convolutions in the shallow layers of the model architecture and pretrains nearly the entire encoder on ImageNet to achieve better performance. To demonstrate the effectiveness of our approach, our models achieve a new state-of-the-art trade-off between accuracy and speed on the ADE20K, Cityscapes and BDD datasets. Specifically, DSNet achieves 40.0% mIOU with an inference speed of 179.2 FPS on ADE20K, and 80.4% mIOU with a speed of 81.9 FPS on Cityscapes. Source code and models are available on GitHub: https://github.com/takaniwa/DSNet.

Submitted 5 June, 2024; originally announced June 2024.
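
Note on the DSNet abstract above: the claim that atrous (dilated) convolutions enlarge the receptive field can be made concrete with the standard formula for the effective kernel size, k_eff = k + (k - 1)(d - 1), for kernel size k and dilation d. The sketch below is only an illustration of that general formula; the function names and the example layer stack are assumptions for illustration and are not taken from DSNet.

```python
def effective_kernel_size(kernel_size: int, dilation: int) -> int:
    """Span (in input pixels) covered by one dilated convolution kernel."""
    return kernel_size + (kernel_size - 1) * (dilation - 1)


def receptive_field(layers: list[tuple[int, int]]) -> int:
    """Receptive field of a stack of stride-1 convolutions.

    `layers` is a list of (kernel_size, dilation) pairs; each stride-1
    layer widens the receptive field by (effective kernel size - 1).
    """
    field = 1
    for kernel_size, dilation in layers:
        field += effective_kernel_size(kernel_size, dilation) - 1
    return field


# Hypothetical stack: three 3x3 convolutions with dilations 1, 2 and 4.
dilated_stack = [(3, 1), (3, 2), (3, 4)]
plain_stack = [(3, 1), (3, 1), (3, 1)]
print(receptive_field(dilated_stack), receptive_field(plain_stack))
```

With the same number of parameters, the dilated stack covers a wider input window than the undilated one, which is the property the abstract relies on.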

arXiv:2406.03222 [pdf, other]

Solving Sharp Bounded-error Quantum Polynomial Time Problem by Evolution methods

Authors: Zhen Guo, Li You

Abstract: Counting the ground state degeneracy of a $k$-local Hamiltonian is important in many fields of physics. The problem belongs to the sharp bounded-error quantum polynomial time (#BQP) complexity class, and few methods are known for its solution. Finding ground states of a $k$-local Hamiltonian, on the other hand, is an easier problem in the Quantum Merlin Arthur (QMA) class, for which many efficient methods exist. In this work, we propose an algorithm that maps a #BQP problem into one of finding a special ground state of a $k$-local Hamiltonian. We prove that all traditional methods that solve the QMA problem by evolution under a function of a Hamiltonian can be used to find the special ground state from a well-designed initial state, and thus can solve the #BQP problem. We combine our algorithm with the power method, the Lanczos method, and the quantum imaginary time evolution method for different systems to illustrate the detection of phase boundaries, competition between frustration and quantum fluctuation, and potential implementations with quantum circuits.

Submitted 5 June, 2024; originally announced June 2024.

Comments: 11 pages, 7 figures
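
Note on the abstract above: the power method it mentions can be sketched for a small real symmetric matrix standing in for a Hamiltonian H. Repeatedly applying (c I - H), with c at least as large as the largest eigenvalue, projects the initial state onto the ground space, so with a degenerate ground space the result depends on the initial state. This is a generic textbook sketch, not the authors' mapping or counting algorithm; the function names, the Gershgorin-based shift, and the toy matrix are assumptions for illustration.

```python
import math


def matvec(matrix, vector):
    """Multiply a dense matrix (list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, vector)) for row in matrix]


def normalize(vector):
    """Scale a nonzero vector to unit Euclidean norm."""
    norm = math.sqrt(sum(x * x for x in vector))
    return [x / norm for x in vector]


def ground_state_power_method(h, v0, steps=200):
    """Power iteration on (shift * I - H) for a real symmetric H.

    The shift is a Gershgorin upper bound on the largest eigenvalue of H,
    so the dominant eigenvectors of (shift * I - H) are ground states of H.
    Assumes v0 has nonzero overlap with the ground space.
    """
    n = len(h)
    shift = max(sum(abs(x) for x in row) for row in h)
    shifted = [[(shift if i == j else 0.0) - h[i][j] for j in range(n)]
               for i in range(n)]
    v = normalize(v0)
    for _ in range(steps):
        v = normalize(matvec(shifted, v))
    energy = sum(a * b for a, b in zip(v, matvec(h, v)))  # Rayleigh quotient
    return energy, v


# Toy example with a doubly degenerate ground space (energies 0, 0, 1).
h_toy = [[0.0, 0.0, 0.0], [0.0, 0.0, 0.0], [0.0, 0.0, 1.0]]
energy, state = ground_state_power_method(h_toy, [1.0, 2.0, 3.0])
print(energy, state)
```

In the toy example the iteration returns the projection of the initial state onto the two-dimensional ground space, which illustrates why the choice of initial state matters when the ground state is degenerate.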

Showing 1–50 of 1,444 results for author: Guo, Z