Supernova Pointing Capabilities of DUNE

Authors: DUNE Collaboration, A. Abed Abud, B. Abi, R. Acciarri, M. A. Acero, M. R. Adames, G. Adamov, M. Adamowski, D. Adams, M. Adinolfi, C. Adriano, A. Aduszkiewicz, J. Aguilar, B. Aimard, F. Akbar, K. Allison, S. Alonso Monsalve, M. Alrashed, A. Alton, R. Alvarez, T. Alves, H. Amar, P. Amedo, J. Anderson, D. A. Andrade, et al. (1340 additional authors not shown)

Abstract: The determination of the direction of a stellar core collapse via its neutrino emission is crucial for the identification of the progenitor for a multimessenger follow-up. A highly effective method of reconstructing supernova directions within the Deep Underground Neutrino Experiment (DUNE) is introduced. The supernova neutrino pointing resolution is studied by simulating and reconstructing electron-neutrino charged-current absorption on $^{40}$Ar and elastic scattering of neutrinos on electrons. Procedures to reconstruct individual interactions, including a newly developed technique called ``brems flipping'', as well as the burst direction from an ensemble of interactions are described. Performance of the burst direction reconstruction is evaluated for supernovae happening at a distance of 10 kpc for a specific supernova burst flux model. The pointing resolution is found to be 3.4 degrees at 68% coverage for a perfect interaction-channel classification and a fiducial mass of 40 kton, and 6.6 degrees for a 10 kton fiducial mass respectively. Assuming a 4% rate of charged-current interactions being misidentified as elastic scattering, DUNE's burst pointing resolution is found to be 4.3 degrees (8.7 degrees) at 68% coverage.

Submitted 14 July, 2024; originally announced July 2024.

Comments: 25 pages, 16 figures

Report number: FERMILAB-PUB-24-0319-LBNF

arXiv:2407.08150 [pdf, other]

Hypergraph Multi-modal Large Language Model: Exploiting EEG and Eye-tracking Modalities to Evaluate Heterogeneous Responses for Video Understanding

Authors: Minghui Wu, Chenxu Zhao, Anyang Su, Donglin Di, Tianyu Fu, Da An, Min He, Ya Gao, Meng Ma, Kun Yan, Ping Wang

Abstract: Understanding of video creativity and content often varies among individuals, with differences in focal points and cognitive levels across different ages, experiences, and genders. There is currently a lack of research in this area, and most existing benchmarks suffer from several drawbacks: 1) a limited number of modalities and answers with restrictive length; 2) the content and scenarios within the videos are excessively monotonous, transmitting allegories and emotions that are overly simplistic. To bridge the gap to real-world applications, we introduce a large-scale Subjective Response Indicators for Advertisement Videos dataset, namely SRI-ADV. Specifically, we collected real changes in Electroencephalographic (EEG) and eye-tracking regions from different demographics while they viewed identical video content. Utilizing this multi-modal dataset, we developed tasks and protocols to analyze and evaluate the extent of cognitive understanding of video content among different users. Along with the dataset, we designed a Hypergraph Multi-modal Large Language Model (HMLLM) to explore the associations among different demographics, video elements, EEG, and eye-tracking indicators. HMLLM could bridge semantic gaps across rich modalities and integrate information beyond different modalities to perform logical reasoning. Extensive experimental evaluations on SRI-ADV and other additional video-based generative performance benchmarks demonstrate the effectiveness of our method. The codes and dataset will be released at https://github.com/suay1113/HMLLM.

Submitted 16 July, 2024; v1 submitted 10 July, 2024; originally announced July 2024.

Comments: Accepted by ACM MULTIMEDIA 2024

arXiv:2407.07053 [pdf, other]

Multimodal Self-Instruct: Synthetic Abstract Image and Visual Reasoning Instruction Using Language Model

Authors: Wenqi Zhang, Zhenglin Cheng, Yuanyu He, Mengna Wang, Yongliang Shen, Zeqi Tan, Guiyang Hou, Mingqian He, Yanna Ma, Weiming Lu, Yueting Zhuang

Abstract: Although most current large multimodal models (LMMs) can already understand photos of natural scenes and portraits, their understanding of abstract images, e.g., charts, maps, or layouts, and visual reasoning capabilities remain quite rudimentary. They often struggle with simple daily tasks, such as reading time from a clock, understanding a flowchart, or planning a route using a road map. In light of this, we design a multi-modal self-instruct, utilizing large language models and their code capabilities to synthesize massive abstract images and visual reasoning instructions across daily scenarios. Our strategy effortlessly creates a multimodal benchmark with 11,193 instructions for eight visual scenarios: charts, tables, simulated maps, dashboards, flowcharts, relation graphs, floor plans, and visual puzzles. This benchmark, constructed with simple lines and geometric elements, exposes the shortcomings of most advanced LMMs like Claude-3.5-Sonnet and GPT-4o in abstract image understanding, spatial relations reasoning, and visual element induction. Besides, to verify the quality of our synthetic data, we fine-tune an LMM using 62,476 synthetic chart, table and road map instructions. The results demonstrate improved chart understanding and map navigation performance, and also demonstrate potential benefits for other visual reasoning tasks. Our code is available at: https://github.com/zwq2018/Multi-modal-Self-instruct.

Submitted 10 July, 2024; v1 submitted 9 July, 2024; originally announced July 2024.

Comments: code: https://github.com/zwq2018/Multi-modal-Self-instruct dataset: https://huggingface.co/datasets/zwq2018/Multi-modal-Self-instruct Leaderboard: https://multi-modal-self-instruct.github.io/

arXiv:2407.05234 [pdf, ps, other]

Statistical Production of $B_c$ Mesons in Heavy-Ion Collisions at the LHC Energy

Authors: Shouxing Zhao, Min He

Abstract: The recombination production of $B_c$ mesons in heavy-ion collisions at the LHC energy is facilitated by the abundant and highly thermalized charm ($c$) quarks transported in the deconfined medium created. We study the production of $B_c$ mesons via $c$ and bottom ($b$) quark recombination in a statistical fashion by placing $B_c$ in the position of a member of the family of open $b$ hadrons, which allows us to make quantitative predictions for the modifications of the production fraction ($f_c$) of $B_c$ mesons and its relative production to $B$ mesons in $\sqrt{s_{\rm NN}}=5.02$ TeV Pb-Pb collisions with respect to proton-proton ($pp$) collisions at the same energy. The statistical production yield of $B_c$ mesons is converted into the transverse momentum ($p_T$) distribution with the shape computed from resonance recombination using the $c$- and $b$-quark phase space distributions that have been simulated via Langevin diffusion and constrained by open $c$- and $b$-hadron observables. Supplemented with the component fragmented from $b$-quark spectrum that dominates at high $p_T$, the total $p_T$ spectrum of $B_c$ mesons is obtained and converted into the $p_T$ dependent nuclear modification factor ($R_{\rm AA}$). Both $f_c$ and the integrated $R_{\rm AA}$ exhibit a $\sim5$-fold enhancement in central Pb-Pb collisions relative to the $pp$ reference. Comparison with data measured by the CMS experiment shows decent agreement within theoretical and experimental uncertainties.

Submitted 6 July, 2024; originally announced July 2024.

Comments: 10 pages, 4 figures

arXiv:2407.03913 [pdf, other]

MobileExperts: A Dynamic Tool-Enabled Agent Team in Mobile Devices

Authors: Jiayi Zhang, Chuang Zhao, Yihan Zhao, Zhaoyang Yu, Ming He, Jianping Fan

Abstract: The attainment of autonomous operations in mobile computing devices has consistently been a goal of human pursuit. With the development of Large Language Models (LLMs) and Visual Language Models (VLMs), this aspiration is progressively turning into reality. While contemporary research has explored automation of simple tasks on mobile devices via VLMs, there remains significant room for improvement in handling complex tasks and reducing high reasoning costs. In this paper, we introduce MobileExperts, which for the first time introduces tool formulation and multi-agent collaboration to address the aforementioned challenges. More specifically, MobileExperts dynamically assembles teams based on the alignment of agent portraits with the human requirements. Following this, each agent embarks on an independent exploration phase, formulating its tools to evolve into an expert. Lastly, we develop a dual-layer planning mechanism to establish coordinated collaboration among experts. To validate our effectiveness, we design a new benchmark of hierarchical intelligence levels, offering insights into an algorithm's capability to address tasks across a spectrum of complexity. Experimental results demonstrate that MobileExperts performs better on all intelligence levels and achieves a ~22% reduction in reasoning costs, thus verifying the superiority of our design.

Submitted 4 July, 2024; originally announced July 2024.

arXiv:2407.01903 [pdf, other]

Text-Aware Diffusion for Policy Learning

Authors: Calvin Luo, Mandy He, Zilai Zeng, Chen Sun

Abstract: Training an agent to achieve particular goals or perform desired behaviors is often accomplished through reinforcement learning, especially in the absence of expert demonstrations. However, supporting novel goals or behaviors through reinforcement learning requires the ad-hoc design of appropriate reward functions, which quickly becomes intractable. To address this challenge, we propose Text-Aware Diffusion for Policy Learning (TADPoLe), which uses a pretrained, frozen text-conditioned diffusion model to compute dense zero-shot reward signals for text-aligned policy learning. We hypothesize that large-scale pretrained generative models encode rich priors that can supervise a policy to behave not only in a text-aligned manner, but also in alignment with a notion of naturalness summarized from internet-scale training data. In our experiments, we demonstrate that TADPoLe is able to learn policies for novel goal-achievement and continuous locomotion behaviors specified by natural language, in both Humanoid and Dog environments. The behaviors are learned zero-shot without ground-truth rewards or expert demonstrations, and are qualitatively more natural according to human evaluation. We further show that TADPoLe performs competitively when applied to robotic manipulation tasks in the Meta-World environment.

Submitted 1 July, 2024; originally announced July 2024.

arXiv:2407.00390 [pdf, other]

Advancing Process Verification for Large Language Models via Tree-Based Preference Learning

Authors: Mingqian He, Yongliang Shen, Wenqi Zhang, Zeqi Tan, Weiming Lu

Abstract: Large Language Models (LLMs) have demonstrated remarkable potential in handling complex reasoning tasks by generating step-by-step rationales. Some methods have proven effective in boosting accuracy by introducing extra verifiers to assess these paths. However, existing verifiers, typically trained on binary-labeled reasoning paths, fail to fully utilize the relative merits of intermediate steps, thereby limiting the effectiveness of the feedback provided. To overcome this limitation, we propose Tree-based Preference Learning Verifier (Tree-PLV), a novel approach that constructs reasoning trees via a best-first search algorithm and collects step-level paired data for preference training. Compared to traditional binary classification, step-level preferences more finely capture the nuances between reasoning steps, allowing for a more precise evaluation of the complete reasoning path. We empirically evaluate Tree-PLV across a range of arithmetic and commonsense reasoning tasks, where it significantly outperforms existing benchmarks. For instance, Tree-PLV achieved substantial performance gains over the Mistral-7B self-consistency baseline on GSM8K (67.55% to 82.79%), MATH (17.00% to 26.80%), CSQA (68.14% to 72.97%), and StrategyQA (82.86% to 83.25%). Additionally, our study explores the appropriate granularity for applying preference learning, revealing that step-level guidance provides feedback that better aligns with the evaluation of the reasoning process.

Submitted 29 June, 2024; originally announced July 2024.

arXiv:2406.17555 [pdf, ps, other]

A response to commenter Ke Lan's comment on our paper published in Nature Communications (2023)14:5782 by J. Yan et al

Authors: Ji Yan, Jiwei Li, X. T. He, Lifeng Wang, Yaohua Chen, Feng Wang, Xiaoying Han, Kaiqiang Pan, Juxi Liang, Yulong Li, Zanyang Guan, Xiangming Liu, Xingsen Che, Zhongjing Chen, Xing Zhang, Yan Xu, Bin Li, Minging He, Hongbo Cai, Liang Hao, Zhanjun Liu, Chunyang Zheng, Zhensheng Dai, Zhengfeng Fan, Bin Qiao, et al. (4 additional authors not shown)

Abstract: A response to commenter Ke Lan's comment on our paper published in Nature Communications (2023)14:5782 by J. Yan et al

Submitted 25 June, 2024; originally announced June 2024.

arXiv:2406.17475 [pdf, other]

doi 10.1145/3637528.3671786

Performative Debias with Fair-exposure Optimization Driven by Strategic Agents in Recommender Systems

Authors: Zhichen Xiang, Hongke Zhao, Chuang Zhao, Ming He, Jianping Fan

Abstract: Data bias, e.g., popularity bias, impairs the dynamics of two-sided markets within recommender systems. This overshadows the less visible but potentially intriguing long-tail items that could capture user interest. Despite the abundance of research surrounding this issue, it still poses challenges and remains a hot topic in academic circles. Along this line, in this paper, we developed a re-ranking approach in dynamic settings with fair-exposure optimization driven by strategic agents. Designed for the producer side, the execution of agents assumes content creators can modify item features based on strategic incentives to maximize their exposure. This iterative process entails an end-to-end optimization, employing differentiable ranking operators that simultaneously target accuracy and fairness. Joint objectives ensure the performance of recommendations while enhancing the visibility of tail items. We also leveraged the performative nature of predictions to illustrate how strategic learning influences content creators to shift towards fairness efficiently, thereby incentivizing features of tail items. Through comprehensive experiments on both public and industrial datasets, we have substantiated the effectiveness and dominance of the proposed method, especially in unveiling the potential of tail items.

Submitted 25 June, 2024; originally announced June 2024.

Comments: SIGKDD 2024 accepted paper

arXiv:2406.16494 [pdf, other]

Cross-domain Transfer of Valence Preferences via a Meta-optimization Approach

Authors: Chuang Zhao, Hongke Zhao, Ming He, Xiaomeng Li, Jianping Fan

Abstract: Cross-domain recommendation offers a potential avenue for alleviating data sparsity and cold-start problems. Embedding and mapping, as a classic cross-domain research genre, aims to identify a common mapping function to perform representation transformation between two domains. Nevertheless, previous coarse-grained preference representations, non-personalized mapping functions, and excessive reliance on overlapping users limit their performance, especially in scenarios where overlapping users are sparse. To address the aforementioned challenges, we propose a novel cross-domain approach, namely CVPM. CVPM formalizes cross-domain interest transfer as a hybrid architecture of parametric meta-learning and self-supervised learning, which not only transfers user preferences at a finer level, but also enables signal enhancement with the knowledge of non-overlapping users. Specifically, with deep insights into user preferences and valence preference theory, we believe that there exists a significant difference between users' positive preferences and negative behaviors, and thus employ differentiated encoders to learn their distributions. In particular, we further utilize the pre-trained model and item popularity to sample pseudo-interaction items to ensure the integrity of both distributions. To guarantee the personalization of preference transfer, we treat each user's mapping as two parts, the common transformation and the personalized bias, where the network used to generate the personalized bias is output by a meta-learner. Furthermore, in addition to the supervised loss for overlapping users, we design contrastive tasks for non-overlapping users from both group and individual-levels to avoid model skew and enhance the semantics of representations. Exhaustive data analysis and extensive experimental results demonstrate the effectiveness and advancement of our proposed framework.

Submitted 24 June, 2024; originally announced June 2024.

arXiv:2406.16251 [pdf, other]

Probing critical spin fluctuations with a composite magnetoelectric method: A case study on a Kitaev spin liquid candidate Na$_3$Co$_2$SbO$_6$

Authors: Xinrun Mi, Xintong Li, Long Zhang, Aifeng Wang, Yuan Li, Yisheng Chai, Mingquan He

Abstract: In correlated quantum materials, divergent critical fluctuations near the quantum critical point are often closely associated with exotic quantum phases of matter, such as unconventional superconductivity and quantum spin liquids. Here we present a simple yet highly sensitive composite magnetoelectric (ME) method for detecting the critical spin fluctuations in quantum magnets. The ME signal is proportional to the magnetostriction coefficient, which directly probes the product of magnetization and spin-spin correlation. As a demonstration, the composite ME method is applied to a Kitaev quantum spin liquid candidate Na$_3$Co$_2$SbO$_6$, which shows signs of magnetic field-induced quantum criticality. Notably, the ME signal prominently diverges at the magnetic field-induced tricritical points, particularly at a tricritical point that lies in close proximity to a zero-temperature quantum critical point. A crucial aspect of these tricritical points is their tunability through the modification of the in-plane magnetic field's direction. The direction of the magnetic field can thus serve as a handy yet important tuning parameter, alongside pressure and chemical doping, for searching quantum critical points in quantum magnets with pronounced magnetic anisotropy.

Submitted 23 June, 2024; originally announced June 2024.

Comments: 6 pages, 4 figures

arXiv:2406.15504 [pdf, other]

Dr.E Bridges Graphs with Large Language Models through Words

Authors: Zipeng Liu, Likang Wu, Ming He, Zhong Guan, Hongke Zhao, Nan Feng

Abstract: Significant efforts have been directed toward integrating powerful Large Language Models (LLMs) with diverse modalities, particularly focusing on the fusion of vision, language, and audio data. However, the graph-structured data, inherently rich in structural and domain-specific knowledge, have not yet been gracefully adapted to LLMs. Existing methods either describe the graph with raw text, suffering the loss of graph structural information, or feed Graph Neural Network (GNN) embeddings directly into LLM at the cost of losing semantic representation. To bridge this gap, we introduce an innovative, end-to-end modality-aligning framework, equipped with a pretrained Dual-Residual Vector Quantized-Variational AutoEncoder (Dr.E). This framework is specifically designed to facilitate token-level alignment with LLMs, enabling an effective translation of the intrinsic `language' of graphs into comprehensible natural language. Our experimental evaluations on standard GNN node classification tasks demonstrate competitive performance against other state-of-the-art approaches. Additionally, our framework ensures interpretability, efficiency, and robustness, with its effectiveness further validated under both fine-tuning and few-shot settings. This study marks the first successful endeavor to achieve token-level alignment between GNNs and LLMs.

Submitted 19 June, 2024; originally announced June 2024.

arXiv:2406.13250 [pdf, other]

LangTopo: Aligning Language Descriptions of Graphs with Tokenized Topological Modeling

Authors: Zhong Guan, Hongke Zhao, Likang Wu, Ming He, Jianpin Fan

Abstract: Recently, large language models (LLMs) have been widely researched in the field of graph machine learning due to their outstanding abilities in language comprehension and learning. However, the significant gap between natural language tasks and topological structure modeling poses a nonnegligible challenge. Specifically, since natural language descriptions are not sufficient for LLMs to understand and process graph-structured data, fine-tuned LLMs perform even worse than some traditional GNN models on graph tasks, lacking inherent modeling capabilities for graph structures. Existing research overly emphasizes LLMs' understanding of semantic information captured by external models, while inadequately exploring graph topological structure modeling, thereby overlooking the genuine capabilities that LLMs lack. Consequently, in this paper, we introduce a new framework, LangTopo, which aligns graph structure modeling with natural language understanding at the token level. LangTopo quantifies the graph structure modeling capabilities of GNNs and LLMs by constructing a codebook for the graph modality and performs consistency maximization. This process aligns the text description of LLM with the topological modeling of GNN, allowing LLM to learn the ability of GNN to capture graph structures, enabling LLM to handle graph-structured data independently. We demonstrate the effectiveness of our proposed method on multiple datasets.

Submitted 19 June, 2024; originally announced June 2024.

arXiv:2406.13235 [pdf, other]

Enhancing Collaborative Semantics of Language Model-Driven Recommendations via Graph-Aware Learning

Authors: Zhong Guan, Likang Wu, Hongke Zhao, Ming He, Jianpin Fan

Abstract: Large Language Models (LLMs) are increasingly prominent in the recommendation systems domain. Existing studies usually utilize in-context learning or supervised fine-tuning on task-specific data to align LLMs with recommendations. However, the substantial bias in semantic spaces between language processing tasks and recommendation tasks poses a nonnegligible challenge. Specifically, without an adequate ability to capture collaborative information, existing modeling paradigms struggle to capture behavior patterns within community groups, leading to LLMs' ineffectiveness in discerning implicit interaction semantics in recommendation scenarios. To address this, we consider enhancing the learning capability of language model-driven recommendation models for structured data, specifically by utilizing interaction graphs rich in collaborative semantics. We propose Graph-Aware Learning for Language Model-Driven Recommendations (GAL-Rec). GAL-Rec enhances the understanding of user-item collaborative semantics by imitating the intent of Graph Neural Networks (GNNs) to aggregate multi-hop information, thereby fully exploiting the substantial learning capacity of LLMs to independently address the complex graphs in the recommendation system. Sufficient experimental results on three real-world datasets demonstrate that GAL-Rec significantly enhances the comprehension of collaborative semantics, and improves recommendation performance.

Submitted 19 June, 2024; originally announced June 2024.

Comments: 10 pages

arXiv:2406.10988 [pdf, other]

Quantum coupon collector with mixed-state encoding

Authors: Jing-Peng Zhang, Min-Quan He, Dan-Bo Zhang

Abstract: The coupon collector is a prototypical model for evaluating the number of samples for identifying a set. By superposing all elements in the set as a pure quantum state, a quantum version of the coupon collector aims to learn the state, which is shown to reduce the sample complexity. Here we propose a quantum coupon collector by encoding the set into a mixed state, where the information of missing elements is labelled with Pauli strings. Remarkably, the encoded mixed state has no quantum entanglement and is easy to prepare. With such mixed-state encoding, it can be efficient to learn the set by performing Bell measurements on two copies and then extracting the missing element by solving a series of equations obtained from the measurements. Our protocol further reduces the sample complexity from $O(n)$ in the case of pure-state encoding to $O(\log n)$ when there is one missing element, where $n$ is the number of elements in the set. The mixed-state encoding scheme provides a new avenue for quantum learning and enlarges the realm for exploring quantum advantages.

Submitted 16 June, 2024; originally announced June 2024.
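
For orientation on the entry above, the following is a minimal sketch of the classical textbook coupon collector only, not the quantum protocol described in the abstract. It assumes uniform draws with replacement from n distinct items; the function names `expected_draws` and `simulate_draws` are hypothetical and chosen for illustration. The exact expectation is n times the n-th harmonic number, which grows like n log n.

```python
import random
from fractions import Fraction


def expected_draws(n: int) -> Fraction:
    """Exact expected number of uniform draws needed to see all n items: n * H_n."""
    return n * sum(Fraction(1, k) for k in range(1, n + 1))


def simulate_draws(n: int, rng: random.Random) -> int:
    """Draw uniformly with replacement until every one of the n items has appeared."""
    seen = set()
    draws = 0
    while len(seen) < n:
        seen.add(rng.randrange(n))  # one uniform sample from {0, ..., n-1}
        draws += 1
    return draws


if __name__ == "__main__":
    n = 20
    rng = random.Random(0)  # fixed seed so repeated runs give the same average
    trials = 2000
    average = sum(simulate_draws(n, rng) for _ in range(trials)) / trials
    print(f"exact expectation: {float(expected_draws(n)):.2f}")
    print(f"simulated average: {average:.2f}")
```

The simulated average should sit close to the exact value for a reasonable number of trials; this classical count is the reference point against which sample-complexity reductions are usually stated.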

arXiv:2406.10638 [pdf, other]

Seeing Clearly, Answering Incorrectly: A Multimodal Robustness Benchmark for Evaluating MLLMs on Leading Questions

Authors: Yexin Liu, Zhengyang Liang, Yueze Wang, Muyang He, Jian Li, Bo Zhao

Abstract: Multimodal Large Language Models (MLLMs) have exhibited impressive capabilities in visual understanding and reasoning, providing seemingly reasonable answers, such as image descriptions. This has spurred extensive research on the evaluation of MLLMs. Most evaluation benchmarks assume that incorrect answers indicate a lack of understanding of the visual content. However, our findings reveal that, in many cases, MLLMs answer questions incorrectly despite correctly understanding the visual content. This suggests that incorrect answers do not necessarily imply a lack of comprehension but may instead result from lacking robustness to leading questions. To comprehensively measure MLLMs' understanding capability and robustness to leading questions, we introduce a MultiModal Robustness benchmark (MMR). MMR contains paired positive and negative questions across 12 categories, meticulously annotated by humans. We evaluate 18 leading MLLMs on the MMR benchmark, revealing that MLLMs suffer from fragility to leading questions despite understanding the visual content. To enhance MLLMs' understanding capability and robustness, we further present a training set with paired positive and negative visual question-answer samples. Experiments verify that MLLMs' robustness can be significantly enhanced by tuning on this new training set. The benchmark, training set, and code can be found at https://github.com/BAAI-DCAI/Multimodal-Robustness-Benchmark.

Submitted 15 June, 2024; originally announced June 2024.

arXiv:2406.09755 [pdf, other]

Mix Q-learning for Lane Changing: A Collaborative Decision-Making Method in Multi-Agent Deep Reinforcement Learning

Authors: Xiaojun Bi, Mingjie He, Yiwen Sun

Abstract: Lane-changing decisions, which are crucial for autonomous vehicle path planning, face practical challenges due to rule-based constraints and limited data. Deep reinforcement learning has become a major research focus due to its advantages in data acquisition and interpretability. However, current models often overlook collaboration, which not only impacts overall traffic efficiency but also hinders the vehicle's own normal driving in the long run. To address this issue, this paper proposes a method named Mix Q-learning for Lane Changing (MQLC) that integrates a hybrid value Q network, taking into account both collective and individual benefits for the greater good. At the collective level, our method coordinates the individual Q and global Q networks by utilizing global information. This enables agents to effectively balance their individual interests with the collective benefit. At the individual level, we integrated a deep learning-based intent recognition module into our observation and enhanced the decision network. These changes provide agents with richer decision information and more accurate feature extraction for improved lane-changing decisions. This strategy enables the multi-agent system to learn and formulate optimal decision-making strategies effectively. In extensive experiments, our MQLC model outperforms other state-of-the-art multi-agent decision-making methods, achieving significantly safer and faster lane-changing decisions.

Submitted 14 June, 2024; originally announced June 2024.

arXiv:2406.07546 [pdf, other]

Commonsense-T2I Challenge: Can Text-to-Image Generation Models Understand Commonsense?

Authors: Xingyu Fu, Muyu He, Yujie Lu, William Yang Wang, Dan Roth

Abstract: We present a novel task and benchmark for evaluating the ability of text-to-image (T2I) generation models to produce images that fit commonsense in real life, which we call Commonsense-T2I. Given two adversarial text prompts containing an identical set of action words with minor differences, such as "a lightbulb without electricity" vs. "a lightbulb with electricity", we evaluate whether T2I models can conduct visual-commonsense reasoning, e.g. produce images that fit "the lightbulb is unlit" vs. "the lightbulb is lit" correspondingly. Commonsense-T2I presents an adversarial challenge, providing pairwise text prompts along with expected outputs. The dataset is carefully hand-curated by experts and annotated with fine-grained labels, such as commonsense type and likelihood of the expected outputs, to assist in analyzing model behavior. We benchmark a variety of state-of-the-art (SOTA) T2I models and surprisingly find that there is still a large gap between image synthesis and real-life photos: even the DALL-E 3 model achieves only 48.92% on Commonsense-T2I, and the Stable Diffusion XL model achieves only 24.92% accuracy. Our experiments show that GPT-enriched prompts cannot solve this challenge, and we include a detailed analysis of possible reasons for this deficiency. We aim for Commonsense-T2I to serve as a high-quality evaluation benchmark for T2I commonsense checking, fostering advancements in real-life image generation.

Submitted 11 June, 2024; originally announced June 2024.

Comments: Text-to-Image Generation, Commonsense, Project Url: https://zeyofu.github.io/CommonsenseT2I/

arXiv:2406.05848 [pdf, other]

Nonlinear Interactions of Planetary-Scale Waves in Mesospheric Winds Observed at 52°N Latitude and Two Longitudes

Authors: Maosheng He, Jeffrey M. Forbes, Gunter Stober, Christoph Jacobi, Guozhu Li, Libo Liu, Jiyao Xu

Abstract: Nine years of mesospheric wind data from two meteor radars at 52°N latitude were analyzed to investigate planetary waves (PWs) and tides by estimating their zonal wavenumber through longitudinal phase differences. Our results reveal that PW normal modes (NMs) primarily drive multi-day oscillations, showing seasonal variability and statistical associations with Sudden Stratospheric Warming (SSW) events. Specifically, a significant 6-day NM emerges in April, followed by predominant 4- and 2-day NMs until June, with peaks of 2-, 4-, and 6-day NMs spanning July to October. Furthermore, our study provides the first observational verification of frequency and zonal wavenumber of over ten secondary waves from nonlinear interactions among planetary-scale waves. One notable finding is the prevalence of non-migrating components in winter 24-hour and summer 8-hour tides, attributed to these nonlinear interactions. Our findings underscore the diverse nonlinear dynamics of planetary-scale waves, triggering a variety of periodic oscillations.

Submitted 11 June, 2024; v1 submitted 9 June, 2024; originally announced June 2024.

arXiv:2406.04180 [pdf, other]

Cogenesis by a sliding pNGB with symmetry non-restoration

Authors: Eung Jin Chun, Suruj Jyoti Das, Minxi He, Tae Hyun Jung, Jin Sun

Abstract: We show that a pseudo-Nambu-Goldstone boson (pNGB) with an initial misalignment angle can drive successful spontaneous baryogenesis, and become a good dark matter candidate if the corresponding global symmetry is non-restored at high temperatures. Considering a dimension-five explicit breaking operator, we find that the pNGB starts its motion with a sliding across rapidly decreasing potential barriers during which the baryon asymmetry is generated and frozen, and later it oscillates as dark matter. It is predicted that the pNGB mass and decay constant are around $5\,{\rm eV}$ and $3\times10^6\,{\rm GeV}$, respectively, while the radial mode has a light mass $O(10)\,{\rm MeV}$ and a small mixing $O(10^{-4})$ with the Higgs boson. Applied to the Majoron in the type-I seesaw model, the heaviest right-handed neutrino is required to be as light as $100\,{\rm GeV}$. These predictions can be tested at kaon experiments, heavy neutral lepton searches, LHC, and future colliders.

Submitted 21 June, 2024; v1 submitted 6 June, 2024; originally announced June 2024.

Comments: 5 pages, 2 figures with supplemental material, v2: discussion on the isocurvature perturbation constraint and references added

Report number: CTPU-PTC-24-16

arXiv:2406.01993 [pdf]

Choroidal Vessel Segmentation on Indocyanine Green Angiography Images via Human-in-the-Loop Labeling

Authors: Ruoyu Chen, Ziwei Zhao, Mayinuer Yusufu, Xianwen Shang, Danli Shi, Mingguang He

Abstract: The human-in-the-loop (HITL) strategy has recently been introduced into the field of medical image processing. Indocyanine green angiography (ICGA) stands as a well-established examination for visualizing choroidal vasculature and detecting chorioretinal diseases. However, the intricate nature of choroidal vascular networks makes large-scale manual segmentation of ICGA images challenging. Thus, the study aims to develop a high-precision choroidal vessel segmentation model with limited labor using an HITL framework. We utilized a multi-source ICGA dataset, including 55-degree view and ultra-widefield ICGA (UWF-ICGA) images, for model development. The choroidal vessel network was pre-segmented by a pre-trained vessel segmentation model, and then manually modified by two ophthalmologists. Choroidal vascular diameter, density, complexity, tortuosity, and branching angle were automatically quantified based on the segmentation. We finally conducted four cycles of HITL. One hundred and fifty 55-degree view ICGA images were used for the first three cycles (50 images per cycle), and twenty UWF-ICGA images for the last cycle. The average time needed to manually correct a pre-segmented ICGA image per cycle was reduced from 20 minutes to 1 minute. High segmentation accuracy was achieved on both 55-degree view ICGA and UWF-ICGA images. Additionally, the multi-dimensional choroidal vascular parameters were significantly associated with various chorioretinal diseases. Our study not only demonstrated the feasibility of the HITL strategy in improving segmentation performance with reduced manual labeling, but also introduced several risk predictors for choroidal abnormalities.

Submitted 4 June, 2024; originally announced June 2024.

Comments: 25 pages, 4 figures

arXiv:2406.01435 [pdf, other]

Learning Analysis of Kernel Ridgeless Regression with Asymmetric Kernel Learning

Authors: Fan He, Mingzhen He, Lei Shi, Xiaolin Huang, Johan A. K. Suykens

Abstract: Ridgeless regression has garnered attention among researchers, particularly in light of the ``Benign Overfitting'' phenomenon, where models interpolating noisy samples demonstrate robust generalization. However, kernel ridgeless regression does not always perform well due to the lack of flexibility. This paper enhances kernel ridgeless regression with Locally-Adaptive-Bandwidths (LAB) RBF kernels, incorporating kernel learning techniques to improve performance in both experiments and theory. For the first time, we demonstrate that functions learned from LAB RBF kernels belong to an integral space of Reproducing Kernel Hilbert Spaces (RKHSs). Despite the absence of explicit regularization in the proposed model, its optimization is equivalent to solving an $\ell_0$-regularized problem in the integral space of RKHSs, elucidating the origin of its generalization ability. Taking an approximation analysis viewpoint, we introduce an $l_q$-norm analysis technique (with $0<q<1$) to derive the learning rate for the proposed model under mild conditions. This result deepens our theoretical understanding, explaining that our algorithm's robust approximation ability arises from the large capacity of the integral space of RKHSs, while its generalization ability is ensured by sparsity, controlled by the number of support vectors. Experimental results on both synthetic and real datasets validate our theoretical conclusions.

Submitted 3 June, 2024; originally announced June 2024.

Comments: arXiv admin note: text overlap with arXiv:2310.05236

arXiv:2406.01007 [pdf, other]

Measurement of Electron Antineutrino Oscillation Amplitude and Frequency via Neutron Capture on Hydrogen at Daya Bay

Authors: Daya Bay collaboration, F. P. An, W. D. Bai, A. B. Balantekin, M. Bishai, S. Blyth, G. F. Cao, J. Cao, J. F. Chang, Y. Chang, H. S. Chen, H. Y. Chen, S. M. Chen, Y. Chen, Y. X. Chen, Z. Y. Chen, J. Cheng, J. Cheng, Y. -C. Cheng, Z. K. Cheng, J. J. Cherwinka, M. C. Chu, J. P. Cummings, O. Dalager, F. S. Deng , et al. (177 additional authors not shown)

Abstract: This Letter reports the first measurement of the oscillation amplitude and frequency of reactor antineutrinos at Daya Bay via neutron capture on hydrogen using 1958 days of data. With over 3.6 million signal candidates, an optimized candidate selection, improved treatment of backgrounds and efficiencies, refined energy calibration, and an energy response model for the capture-on-hydrogen sensitive region, the relative $\overline{\nu}_{e}$ rates and energy spectra variation among the near and far detectors gives $\sin^2 2\theta_{13} = 0.0759_{-0.0049}^{+0.0050}$ and $\Delta m^2_{32} = (2.72^{+0.14}_{-0.15})\times10^{-3}$ eV$^2$ assuming the normal neutrino mass ordering, and $\Delta m^2_{32} = (-2.83^{+0.15}_{-0.14})\times10^{-3}$ eV$^2$ for the inverted neutrino mass ordering. This estimate of $\sin^2 2\theta_{13}$ is consistent with and essentially independent from the one obtained using the capture-on-gadolinium sample at Daya Bay. The combination of these two results yields $\sin^2 2\theta_{13} = 0.0833\pm0.0022$, which represents an 8% relative improvement in precision over the full 3158-day Daya Bay capture-on-gadolinium result.

Submitted 3 June, 2024; originally announced June 2024.

arXiv:2405.20659 [pdf]

Realization of a cold atom gyroscope in space

Authors: Jinting Li, Xi Chen, Danfang Zhang, Wenzhang Wang, Yang Zhou, Meng He, Jie Fang, Lin Zhou, Chuan He, Junjie Jiang, Huanyao Sun, Qunfeng Chen, Lei Qin, Xiao Li, Yibo Wang, Xiaowei Zhang, Jiaqi Zhong, Runbing Li, Meizhen An, Long Zhang, Shuquan Wang, Zongfeng Li, Jin Wang, Mingsheng Zhan

Abstract: High-precision gyroscopes in space are important for sophisticated scientific experiments and deep space navigation. Microgravity in space provides an ideal condition for the operation of a cold atom gyroscope. To demonstrate this advantage, an atom interferometer (AI) was launched and installed in the China Space Station in 2022. Reported here is a realization of the cold atom gyroscope with this AI. By applying point source interferometry, spatial fringes are obtained and acceleration and rotation are extracted. The angles of the Raman lasers are precisely calibrated to avoid measurement error, and other systematic errors are also considered for the rotation measurement. The evaluated rotation measurement is $(-115.64 \pm 1.71)\times10^{-5}$ rad/s in space, and an acceleration measurement resolution of $1.03\times10^{-6}$ m/s$^2$ is also obtained for a single image. This study demonstrates the first AI-based gyroscope in space and paves the way for future space-based AI experiments.

Submitted 31 May, 2024; originally announced May 2024.

Comments: 12 pages, 5 figures

arXiv:2405.20621 [pdf, other]

A critical comparison of the implementation of granular pressure gradient term in Euler-Euler simulation of gas-solid flows

Authors: Yige Liu, Mingming He, Jianhua Chen, Wen Li, Bidan Zhao, Ji Xu, Junwu Wang

Abstract: Numerical solution of the Euler-Euler model using different in-house, open-source and commercial software can generate significantly different results, even when the governing equations and the initial and boundary conditions are exactly the same. Unfortunately, the underlying reasons have not yet been identified. In this article, three methods for calculating the granular pressure gradient term are presented for the two-fluid model of gas-solid flows and implemented implicitly or explicitly into the solver in OpenFOAM: Method I assumes that the granular pressure gradient is equal to the elastic modulus plus the solid concentration gradient; Method II directly calculates the gradient using a difference scheme; Method III, which is proposed in this work, calculates the gradient as the sum of two partial derivatives: one related to the solid volume fraction and the other related to the granular energy. Only Methods II and III are consistent with the kinetic theory of granular flow. It was found that the difference between all methods is small for bubbling fluidization, while for circulating fluidization, both Methods II and III are capable of capturing non-uniform structures and producing superior results over Method I. The contradictory conclusions drawn from the simulation of different fluidization regimes are due to the different contribution of the term related to the granular energy gradient. The present study concludes that the implementation method of the granular pressure gradient may have a significant impact on hydrodynamics and is probably a key factor contributing to the observed differences between different simulation software.

Submitted 31 May, 2024; originally announced May 2024.

arXiv:2405.17792 [pdf, other]

JUNO Sensitivity to Invisible Decay Modes of Neutrons

Authors: JUNO Collaboration, Angel Abusleme, Thomas Adam, Kai Adamowicz, Shakeel Ahmad, Rizwan Ahmed, Sebastiano Aiello, Fengpeng An, Qi An, Giuseppe Andronico, Nikolay Anfimov, Vito Antonelli, Tatiana Antoshkina, João Pedro Athayde Marcondes de André, Didier Auguste, Weidong Bai, Nikita Balashov, Wander Baldini, Andrea Barresi, Davide Basilico, Eric Baussan, Marco Bellato, Marco Beretta, Antonio Bergnoli, Daniel Bick , et al. (635 additional authors not shown)

Abstract: We explore the decay of bound neutrons into invisible particles (e.g., $n\rightarrow 3 \nu$ or $nn \rightarrow 2 \nu$) in the JUNO liquid scintillator detector. The invisible decay includes two decay modes: $ n \rightarrow { inv} $ and $ nn \rightarrow { inv} $. The invisible decays of $s$-shell neutrons in $^{12}{\rm C}$ will leave a highly excited residual nucleus. Subsequently, some de-excitation modes of the excited residual nuclei can produce a time- and space-correlated triple coincidence signal in the JUNO detector. Based on a full Monte Carlo simulation informed with the latest available data, we estimate all backgrounds, including inverse beta decay events of the reactor antineutrino $\bar{\nu}_e$, natural radioactivity, cosmogenic isotopes and neutral current interactions of atmospheric neutrinos. Pulse shape discrimination and multivariate analysis techniques are employed to further suppress backgrounds. With two years of exposure, JUNO is expected to give an order of magnitude improvement compared to the current best limits. After 10 years of data taking, the JUNO expected sensitivities at a 90% confidence level are $\tau/B( n \rightarrow { inv} ) > 5.0 \times 10^{31} \, {\rm yr}$ and $\tau/B( nn \rightarrow { inv} ) > 1.4 \times 10^{32} \, {\rm yr}$.

Submitted 27 May, 2024; originally announced May 2024.

Comments: 28 pages, 7 figures, 4 tables

arXiv:2405.11338 [pdf]

EyeFound: A Multimodal Generalist Foundation Model for Ophthalmic Imaging

Authors: Danli Shi, Weiyi Zhang, Xiaolan Chen, Yexin Liu, Jiancheng Yang, Siyu Huang, Yih Chung Tham, Yingfeng Zheng, Mingguang He

Abstract: Artificial intelligence (AI) is vital in ophthalmology, tackling tasks like diagnosis, classification, and visual question answering (VQA). However, existing AI models in this domain often require extensive annotation and are task-specific, limiting their clinical utility. While recent developments have brought about foundation models for ophthalmology, they are limited by the need to train separate weights for each imaging modality, preventing a comprehensive representation of multi-modal features. This highlights the need for versatile foundation models capable of handling various tasks and modalities in ophthalmology. To address this gap, we present EyeFound, a multimodal foundation model for ophthalmic images. Unlike existing models, EyeFound learns generalizable representations from unlabeled multimodal retinal images, enabling efficient model adaptation across multiple applications. Trained on 2.78 million images from 227 hospitals across 11 ophthalmic modalities, EyeFound facilitates generalist representations and diverse multimodal downstream tasks, even for detecting challenging rare diseases. It outperforms previous work RETFound in diagnosing eye diseases, predicting systemic disease incidents, and zero-shot multimodal VQA. EyeFound provides a generalizable solution to improve model performance and lessen the annotation burden on experts, facilitating widespread clinical AI applications for retinal imaging.

Submitted 21 May, 2024; v1 submitted 18 May, 2024; originally announced May 2024.

Comments: 21 pages, 2 figures, 4 tables

arXiv:2405.11236 [pdf, other]

TriLoRA: Integrating SVD for Advanced Style Personalization in Text-to-Image Generation

Authors: Chengcheng Feng, Mu He, Qiuyu Tian, Haojie Yin, Xiaofang Zhao, Hongwei Tang, Xingqiang Wei

Abstract: As deep learning technology continues to advance, image generation models, especially models like Stable Diffusion, are finding increasingly widespread application in visual arts creation. However, these models often face challenges such as overfitting, lack of stability in generated results, and difficulties in accurately capturing the features desired by creators during the fine-tuning process. In response to these challenges, we propose an innovative method that integrates Singular Value Decomposition (SVD) into the Low-Rank Adaptation (LoRA) parameter update strategy, aimed at enhancing the fine-tuning efficiency and output quality of image generation models. By incorporating SVD within the LoRA framework, our method not only effectively reduces the risk of overfitting but also enhances the stability of model outputs, and captures subtle, creator-desired feature adjustments more accurately. We evaluated our method on multiple datasets, and the results show that, compared to traditional fine-tuning methods, our approach significantly improves the model's generalization ability and creative flexibility while maintaining the quality of generation. Moreover, this method maintains LoRA's excellent performance under resource-constrained conditions, allowing for significant improvements in image generation quality without sacrificing the original efficiency and resource advantages.

Submitted 13 June, 2024; v1 submitted 18 May, 2024; originally announced May 2024.

arXiv:2405.10739 [pdf, other]

Efficient Multimodal Large Language Models: A Survey

Authors: Yizhang Jin, Jian Li, Yexin Liu, Tianjun Gu, Kai Wu, Zhengkai Jiang, Muyang He, Bo Zhao, Xin Tan, Zhenye Gan, Yabiao Wang, Chengjie Wang, Lizhuang Ma

Abstract: In the past year, Multimodal Large Language Models (MLLMs) have demonstrated remarkable performance in tasks such as visual question answering, visual understanding and reasoning. However, the extensive model size and high training and inference costs have hindered the widespread application of MLLMs in academia and industry. Thus, studying efficient and lightweight MLLMs has enormous potential, especially in edge computing scenarios. In this survey, we provide a comprehensive and systematic review of the current state of efficient MLLMs. Specifically, we summarize the timeline of representative efficient MLLMs, research state of efficient structures and strategies, and the applications. Finally, we discuss the limitations of current efficient MLLM research and promising future directions. Please refer to our GitHub repository for more details: https://github.com/lijiannuist/Efficient-Multimodal-LLMs-Survey.

Submitted 17 May, 2024; originally announced May 2024.

arXiv:2405.10676 [pdf, other]

Identifying L-H transition in HL-2A through deep learning

Authors: Meihuizi He, Songfen Liu, Fan Xia, Zongyu Yang, Wulyu Zhong

Abstract: During the operation of tokamak devices, addressing the thermal load issues caused by the eruption of Edge Localized Modes (ELMs) is crucial. Ideally, mitigation and suppression measures for ELMs should be initiated as soon as the first low-to-high confinement (L-H) transition occurs, which necessitates real-time monitoring and accurate identification of the L-H transition process. Motivated by this, and by the recent deep learning boom, we propose a deep learning-based L-H transition identification algorithm for the HL-2A tokamak. In this work, we have constructed a neural network comprising layers of Residual Long Short-Term Memory (LSTM) and Temporal Convolutional Network (TCN). Unlike previous work based on slice-by-slice recognition of ELMs, this method recognizes the L-H transition process before the first ELM crash. Therefore, the mitigation techniques can be triggered in time to suppress the initial ELM bursts. To further demonstrate the effectiveness of the algorithm, we developed a series of shot-level evaluation indicators, and the results show that this algorithm can provide the necessary reference for the mitigation and suppression system.

Submitted 17 May, 2024; originally announced May 2024.

arXiv:2405.09059 [pdf, other]

Task-adaptive Q-Face

Authors: Haomiao Sun, Mingjie He, Shiguang Shan, Hu Han, Xilin Chen

Abstract: Although face analysis has achieved remarkable improvements in the past few years, designing a multi-task face analysis model is still challenging. Most face analysis tasks are studied as separate problems and do not benefit from the synergy among related tasks. In this work, we propose a novel task-adaptive multi-task face analysis method named Q-Face, which simultaneously performs multiple face analysis tasks with a unified model. We fuse the features from multiple layers of a large-scale pre-trained model so that the whole model can use both local and global facial information to support multiple tasks. Furthermore, we design a task-adaptive module that performs cross-attention between a set of query vectors and the fused multi-stage features and adaptively extracts the desired features for each face analysis task. Extensive experiments show that our method can perform multiple tasks simultaneously and achieves state-of-the-art performance on face expression recognition, action unit detection, face attribute analysis, age estimation, and face pose estimation. Compared to conventional methods, our method opens up new possibilities for multi-task face analysis and shows potential for both accuracy and efficiency.

Submitted 14 May, 2024; originally announced May 2024.

Comments: Previously submitted to ECCV 2024

arXiv:2405.07800 [pdf, other]

Data Imputation by Pursuing Better Classification: A Supervised Kernel-Based Method

Authors: Ruikai Yang, Fan He, Mingzhen He, Kaijie Wang, Xiaolin Huang

Abstract: Data imputation, the process of filling in missing feature elements for incomplete data sets, plays a crucial role in data-driven learning. A fundamental belief is that data imputation is helpful for learning performance, and it follows that the pursuit of better classification can guide the data imputation process. While some works consider using label information to assist in this task, their simplistic utilization of labels lacks flexibility and may rely on strict assumptions. In this paper, we propose a new framework that effectively leverages supervision information to complete missing data in a manner conducive to classification. Specifically, this framework operates in two stages. Firstly, it leverages labels to supervise the optimization of similarity relationships among data, represented by the kernel matrix, with the goal of enhancing classification accuracy. To mitigate overfitting that may occur during this process, a perturbation variable is introduced to improve the robustness of the framework. Secondly, the learned kernel matrix serves as additional supervision information to guide data imputation through regression, utilizing the block coordinate descent method. The superiority of the proposed method is evaluated on four real-world data sets by comparing it with state-of-the-art imputation methods. Remarkably, our algorithm significantly outperforms other methods when the data is missing more than 60% of the features.

Submitted 9 July, 2024; v1 submitted 13 May, 2024; originally announced May 2024.

arXiv:2405.07791 [pdf, ps, other]

doi 10.1109/TNNLS.2024.3414325

Decentralized Kernel Ridge Regression Based on Data-Dependent Random Feature

Authors: Ruikai Yang, Fan He, Mingzhen He, Jie Yang, Xiaolin Huang

Abstract: Random feature (RF) has been widely used for node consistency in decentralized kernel ridge regression (KRR). Currently, the consistency is guaranteed by imposing constraints on coefficients of features, necessitating that the random features on different nodes are identical. However, in many applications, data on different nodes varies significantly in number or distribution, which calls for adaptive and data-dependent methods that generate different RFs. To tackle the essential difficulty, we propose a new decentralized KRR algorithm that pursues consensus on decision functions, which allows great flexibility and adapts well to the data on each node. The convergence is rigorously given and the effectiveness is numerically verified: by capturing the characteristics of the data on each node, while maintaining the same communication costs as other methods, we achieved an average regression accuracy improvement of 25.5\% across six real-world data sets.

Submitted 5 July, 2024; v1 submitted 13 May, 2024; originally announced May 2024.

arXiv:2405.07488 [pdf, other]

Predictive Modeling of Flexible EHD Pumps using Kolmogorov-Arnold Networks

Authors: Yanhong Peng, Miao He, Fangchao Hu, Zebing Mao, Xia Huang, Jun Ding

Abstract: We present a novel approach to predicting the pressure and flow rate of flexible electrohydrodynamic pumps using the Kolmogorov-Arnold Network. Inspired by the Kolmogorov-Arnold representation theorem, KAN replaces fixed activation functions with learnable spline-based activation functions, enabling it to approximate complex nonlinear functions more effectively than traditional models like Multi-Layer Perceptron and Random Forest. We evaluated KAN on a dataset of flexible EHD pump parameters and compared its performance against RF and MLP models. KAN achieved superior predictive accuracy, with Mean Squared Errors of 12.186 and 0.001 for pressure and flow rate predictions, respectively. The symbolic formulas extracted from KAN provided insights into the nonlinear relationships between input parameters and pump performance. These findings demonstrate that KAN offers exceptional accuracy and interpretability, making it a promising alternative for predictive modeling in electrohydrodynamic pumping.

Submitted 13 May, 2024; originally announced May 2024.

arXiv:2405.07468 [pdf]

Evaluating large language models in medical applications: a survey

Authors: Xiaolan Chen, Jiayang Xiang, Shanfu Lu, Yexin Liu, Mingguang He, Danli Shi

Abstract: Large language models (LLMs) have emerged as powerful tools with transformative potential across numerous domains, including healthcare and medicine. In the medical domain, LLMs hold promise for tasks ranging from clinical decision support to patient education. However, evaluating the performance of LLMs in medical contexts presents unique challenges due to the complex and critical nature of medical information. This paper provides a comprehensive overview of the landscape of medical LLM evaluation, synthesizing insights from existing studies and highlighting evaluation data sources, task scenarios, and evaluation methods. Additionally, it identifies key challenges and opportunities in medical LLM evaluation, emphasizing the need for continued research and innovation to ensure the responsible integration of LLMs into clinical practice.

Submitted 13 May, 2024; originally announced May 2024.

Comments: 4 figures, 1 table

arXiv:2405.07025 [pdf, ps, other]

Second-Order Dissociation and Transition of Heavy Quarkonia in the Quark-Gluon Plasma

Authors: Shouxing Zhao, Min He

Abstract: We revisit the dissociation of heavy quarkonia by thermal partons at the next-to-leading order (NLO, also known as inelastic parton scattering dissociation) in the Quark-Gluon Plasma (QGP). Utilizing the chromo-electric dipole coupling from QCD multipole expansion as an effective Hamiltonian, this has been conducted in the approach of second-order quantum mechanical perturbation theory, which allows us to systematically incorporate the bound state wave functions. Employing the quarkonium wave functions and binding energies obtained from an in-medium potential model, we then numerically evaluate the dissociation cross sections and rates for various charmonia and bottomonia, where the infrared and collinear divergences are regularized by the thermal masses of medium partons. We demonstrate that distinct from the leading order (LO, also known as gluo-dissociation) counterparts peaking at relatively low gluon energy and falling off thereafter, the NLO cross sections first grow and then nearly saturate as the incident parton energy increases, as a result of the outgoing parton carrying away the excess energy. The resulting NLO dissociation rates increase with temperature and take over from the LO counterparts toward high temperatures, similar to pertinent findings from previous studies. We also evaluate the in-medium second-order transition between different bound states, which may contribute to the total thermal decay widths of heavy quarkonia in the QGP.

Submitted 11 May, 2024; originally announced May 2024.

Comments: 17 pages, 13 figures

arXiv:2405.05913 [pdf, other]

Topological flat bands in a family of multilayer graphene moiré lattices

Authors: Dacen Waters, Ruiheng Su, Ellis Thompson, Anna Okounkova, Esmeralda Arreguin-Martinez, Minhao He, Katherine Hinds, Kenji Watanabe, Takashi Taniguchi, Xiaodong Xu, Ya-Hui Zhang, Joshua Folk, Matthew Yankowitz

Abstract: Moiré materials host a wealth of intertwined correlated and topological states of matter, all arising from flat electronic bands with nontrivial quantum geometry. A prominent example is the family of alternating-twist magic-angle graphene stacks, which exhibit symmetry-broken states at rational fillings of the moiré band and superconductivity close to half filling. Here, we introduce a second family of twisted graphene multilayers made up of twisted sheets of $M$- and $N$-layer Bernal-stacked graphene flakes. Calculations indicate that applying an electric displacement field isolates a flat and topological moiré conduction band that is primarily localized to a single graphene sheet below the moiré interface. Phenomenologically, the result is a striking similarity in the hierarchies of symmetry-broken phases across this family of twisted graphene multilayers. Our results show that this family of structures offers promising new opportunities for the discovery of exotic new correlated and topological phenomena, enabled by using the layer number to fine tune the flat moiré band and its screening environment.

Submitted 9 May, 2024; originally announced May 2024.

Comments: 7 pages, 4 figures, extended data, 10 extended data figures, 19 supplementary information figures

arXiv:2405.03135 [pdf, other]

CURLING - I. The Influence of Point-like Image Approximation on the Outcomes of Cluster Strong Lens Modeling

Authors: Yushan Xie, Huanyuan Shan, Nan Li, Ran Li, Eric Jullo, Chen Su, Xiaoyue Cao, Jean-Paul Kneib, Ana Acebron, Mengfan He, Ji Yao, Chunxiang Wang, Jiadong Li, Yin Li

Abstract: Cluster-scale strong lensing is a powerful tool for exploring the properties of dark matter and constraining cosmological models. However, due to the complex parameter space, pixelized strong lens modeling in galaxy clusters is computationally expensive, leading to the point-source approximation of strongly lensed extended images, potentially introducing systematic biases. Herein, as the first paper of the ClUsteR strong Lens modelIng for the Next-Generation observations (CURLING) program, we use lensing ray-tracing simulations to quantify the biases and uncertainties arising from the point-like image approximation for JWST-like observations. Our results indicate that the approximation works well for reconstructing the total cluster mass distribution, but can bias the magnification measurements near critical curves and the constraints on the cosmological parameters, the total matter density of the Universe $Ω_{\rm m}$, and dark energy equation of state parameter $w$. To mitigate the biases, we propose incorporating the extended surface brightness distribution of lensed sources into the modeling. This approach reduces the bias in magnification from 46.2 per cent to 0.09 per cent for $μ\sim 1000$. Furthermore, the median values of cosmological parameters align more closely with the fiducial model. In addition to the improved accuracy, we also demonstrate that the constraining power can be substantially enhanced. In conclusion, it is necessary to model cluster-scale strong lenses with pixelized multiple images, especially for estimating the intrinsic luminosity of highly magnified sources and accurate cosmography in the era of high-precision observations.

Submitted 5 May, 2024; originally announced May 2024.

Comments: 12 pages, 8 figures

arXiv:2404.18408 [pdf, other]

doi 10.1088/1674-4527/ad3954

Low surface brightness galaxies from BASS+MzLS with Machine Learning

Authors: Peng-Liang Du, Wei Du, Bing-Qing Zhang, Zhen-Ping Yi, Min He, Hong Wu

Abstract: From $\sim$ 5000 deg$^{2}$ of the combination of the Beijing-Arizona Sky Survey (BASS) and Mayall $z$-band Legacy Survey (MzLS), which is also the northern sky region of the Dark Energy Spectroscopic Instrument (DESI) Legacy Imaging Surveys, we selected a sample of 31,825 candidates of low surface brightness galaxies (LSBGs) with the mean effective surface brightness 24.2 $< \barμ_{\rm eff,g} <$ 28.8 mag arcsec$^{\rm -2}$ and the half-light radius 2.5$^{\prime\prime}$ $< r_{\rm eff} <$ 20$^{\prime\prime}$ based on the released photometric catalogue and the machine learning model. The distribution of the LSBGs is bimodal in the $g$ - $r$ color, indicating two distinct populations of blue ($g$ - $r <$ 0.60) and red ($g$ - $r >$ 0.60) LSBGs. The blue LSBGs appear spiral, disk or irregular while the red LSBGs are spheroidal or elliptical and spatially clustered. This trend shows that the color has a strong correlation with galaxy morphology for LSBGs. In the spatial distribution, the blue LSBGs are more uniformly distributed while the red ones are highly clustered, indicating that red LSBGs preferentially populate denser environments than the blue LSBGs. Besides, both populations have consistent distributions of ellipticity (median $ε\sim$ 0.3), half-light radius (median $r_{\rm eff} \sim$ 4$^{\prime\prime}$), and Sersic index (median $n$ = 1), implying the dominance of the full sample by the round and disk galaxies. This sample has definitely extended the studies of LSBGs to a regime of lower surface brightness, fainter magnitude, and broader other properties than the previous SDSS-based samples.

Submitted 29 April, 2024; v1 submitted 28 April, 2024; originally announced April 2024.

Comments: 20 pages, 11 figures, 1 table, accepted by Research in Astronomy and Astrophysics

arXiv:2404.17280 [pdf, other]

Device Feature based on Graph Fourier Transformation with Logarithmic Processing For Detection of Replay Speech Attacks

Authors: Mingrui He, Longting Xu, Han Wang, Mingjun Zhang, Rohan Kumar Das

Abstract: The most common spoofing attacks on automatic speaker verification systems are replay speech attacks. Detection of replay speech heavily relies on replay configuration information. Previous studies have shown that graph Fourier transform-derived features can effectively detect replay speech but ignore device and environmental noise effects. In this work, we propose a new feature, the graph frequency device cepstral coefficient, derived from the graph frequency domain using a device-related linear transformation. We also introduce two novel representations: graph frequency logarithmic coefficient and graph frequency logarithmic device coefficient. We evaluate our methods using traditional Gaussian mixture model and light convolutional neural network systems as classifiers. On the ASVspoof 2017 V2, ASVspoof 2019 physical access, and ASVspoof 2021 physical access datasets, our proposed features outperform known front-ends, demonstrating their effectiveness for replay speech detection.

Submitted 26 April, 2024; originally announced April 2024.

arXiv:2404.13158 [pdf, other]

Resource Slicing with Cross-Cell Coordination in Satellite-Terrestrial Integrated Networks

Authors: Mingcheng He, Huaqing Wu, Conghao Zhou, Xuemin Shen

Abstract: Satellite-terrestrial integrated networks (STIN) are envisioned as a promising architecture for ubiquitous network connections to support diversified services. In this paper, we propose a novel resource slicing scheme with cross-cell coordination in STIN to satisfy distinct service delay requirements and efficient resource usage. To address the challenges posed by spatiotemporal dynamics in service demands and satellite mobility, we formulate the resource slicing problem into a long-term optimization problem and propose a distributed resource slicing (DRS) scheme for scalable and flexible resource management across different cells. Specifically, a hybrid data-model co-driven approach is developed, including an asynchronous multi-agent reinforcement learning-based algorithm to determine the optimal satellite set serving each cell and a distributed optimization-based algorithm to make the resource reservation decisions for each slice. Simulation results demonstrate that the proposed scheme outperforms benchmark methods in terms of resource usage and delay performance.

Submitted 19 April, 2024; originally announced April 2024.

Comments: Accepted by IEEE ICC 2024

arXiv:2404.07149 [pdf, other]

Tianyu: search for the second solar system and explore the dynamic universe

Authors: Fabo Feng, Yicheng Rui, Zhimao Du, Qing Lin, Congcong Zhang, Dan Zhou, Kaiming Cui, Masahiro Ogihara, Ming Yang, Jie Lin, Yongzhi Cai, Taozhi Yang, Xiaoying Pang, Mingjie Jian, Wenxiong Li, Hengxiao Guo, Xian Shi, Jianchun Shi, Jianyang Li, Kangrou Guo, Song Yao, Aming Chen, Peng Jia, Xianyu Tan, James S. Jenkins , et al. (10 additional authors not shown)

Abstract: Giant planets like Jupiter and Saturn play important roles in the formation and habitability of Earth-like planets. The detection of solar system analogs that have multiple cold giant planets is essential for our understanding of planet habitability and planet formation. Although transit surveys such as Kepler and TESS have discovered thousands of exoplanets, these missions are not sensitive to long period planets due to their limited observation baseline. The Tianyu project, comprising two 1-meter telescopes (Tianyu-I and II), is designed to detect transiting cold giant planets in order to find solar system analogs. Featuring a large field of view and equipped with a high-speed CMOS camera, Tianyu-I will perform a high-precision photometric survey of about 100 million stars, measuring light curves at hour-long cadence. The candidates found by Tianyu-I will be confirmed by Tianyu-II and other surveys and follow-up facilities through multi-band photometry, spectroscopy, and high resolution imaging. Tianyu telescopes will be situated at an elevation of about 4000 meters in Lenghu, China. With a photometric precision of 1% for stars with V < 18 mag, Tianyu is expected to find more than 300 transiting exoplanets, including about 12 cold giant planets, over five years. A five-year survey of Tianyu would discover 1-2 solar system analogs. Moreover, Tianyu is also designed for non-exoplanetary exploration, incorporating multiple survey modes covering timescales from sub-seconds to months, with a particular emphasis on events occurring within the sub-second to hour range. It excels in observing areas such as infant supernovae, rare variable stars and binaries, tidal disruption events, Be stars, cometary activities, and interstellar objects. These discoveries not only enhance our comprehension of the universe but also offer compelling opportunities for public engagement in scientific exploration.

Submitted 10 April, 2024; v1 submitted 10 April, 2024; originally announced April 2024.

Comments: 48 pages, 16 figures, accepted by Acta Astronomica Sinica

arXiv:2404.04708 [pdf, other]

Efficient Sparse Processing-in-Memory Architecture (ESPIM) for Machine Learning Inference

Authors: Mingxuan He, Mithuna Thottethodi, T. N. Vijaykumar

Abstract: Emerging machine learning (ML) models (e.g., transformers) involve memory pin bandwidth-bound matrix-vector (MV) computation in inference. By avoiding pin crossings, processing in memory (PIM) can improve performance and energy for pin-bound workloads, as evidenced by recent commercial efforts in (digital) PIM. Sparse models can improve performance and energy of inference without losing much accuracy. However, unstructured sparse inference injects the key challenges of uncertainty, irregularity, and load imbalance into a dense PIM's operation across all the banks. The dense PIM reads the matrix cells from each bank and broadcasts the vector elements to all the banks exploiting DRAM organization. To address these challenges efficiently, we propose ESPIM which makes four contributions: (1) Because matrix sparsity increases the vector broadcast bandwidth demand per matrix column-read, ESPIM employs a fine-grained interleaving of the matrix cells so that each vector broadcast is shared among multiple rows in each bank, cutting the bandwidth demand. (2) ESPIM mostly avoids on-chip control's area and energy despite sparsity's uncertainties by exploiting the observation that the sparsity is data-dependent but static and known before inference. Accordingly, ESPIM employs static data-dependent scheduling (SDDS). (3) ESPIM decouples the matrix cell values and their indices, placing the indices ahead of the values to enable prefetching of the vector elements. We extend SDDS for performance and correctness with the decoupled prefetching. (4) Finally, we simplify the switch required to select the vector elements that match the matrix cells. We extend SDDS to improve performance by reducing conflicts in the simplified switch. In our simulations, ESPIM achieves 2x average (up to 4.2x) speedup over and 34% average (up to 63%) lower energy than Newton while incurring under 5% area.

Submitted 6 April, 2024; originally announced April 2024.

arXiv:2404.04306 [pdf, other]

AuditGPT: Auditing Smart Contracts with ChatGPT

Authors: Shihao Xia, Shuai Shao, Mengting He, Tingting Yu, Linhai Song, Yiying Zhang

Abstract: To govern smart contracts running on Ethereum, multiple Ethereum Request for Comment (ERC) standards have been developed, each containing a set of rules to guide the behaviors of smart contracts. Violating the ERC rules could cause serious security issues and financial loss, signifying the importance of verifying smart contracts follow ERCs. Today's practices of such verification are to either manually audit each single contract or use expert-developed, limited-scope program-analysis tools, both of which are far from being effective in identifying ERC rule violations. This paper presents a tool named AuditGPT that leverages large language models (LLMs) to automatically and comprehensively verify ERC rules against smart contracts. To build AuditGPT, we first conduct an empirical study on 222 ERC rules specified in four popular ERCs to understand their content, their security impacts, their specification in natural language, and their implementation in Solidity. Guided by the study, we construct AuditGPT by separating the large, complex auditing process into small, manageable tasks and design prompts specialized for each ERC rule type to enhance LLMs' auditing performance. In the evaluation, AuditGPT successfully pinpoints 418 ERC rule violations and only reports 18 false positives, showcasing its effectiveness and accuracy. Moreover, AuditGPT beats an auditing service provided by security experts in effectiveness, accuracy, and cost, demonstrating its advancement over state-of-the-art smart-contract auditing practices.

Submitted 5 April, 2024; originally announced April 2024.

arXiv:2404.03025 [pdf, other]

When Digital Twin Meets Generative AI: Intelligent Closed-Loop Network Management

Authors: Xinyu Huang, Haojun Yang, Conghao Zhou, Mingcheng He, Xuemin Shen, Weihua Zhuang

Abstract: Generative artificial intelligence (GAI) and digital twin (DT) are advanced data processing and virtualization technologies to revolutionize communication networks. Thanks to the powerful data processing capabilities of GAI, integrating it into DT is a potential approach to construct an intelligent holistic virtualized network for better network management performance. To this end, we propose a GAI-driven DT (GDT) network architecture to enable intelligent closed-loop network management. In the architecture, various GAI models can empower DT status emulation, feature abstraction, and network decision-making. The interaction between GAI-based and model-based data processing can facilitate intelligent external and internal closed-loop network management. To further enhance network management performance, three potential approaches are proposed, i.e., model light-weighting, adaptive model selection, and data-model-driven network management. We present a case study pertaining to data-model-driven network management for the GDT network, followed by some open research issues.

Submitted 8 April, 2024; v1 submitted 3 April, 2024; originally announced April 2024.

Comments: 8 pages, 5 figures

arXiv:2404.01687 [pdf, other]

Search for a sub-eV sterile neutrino using Daya Bay's full dataset

Authors: F. P. An, W. D. Bai, A. B. Balantekin, M. Bishai, S. Blyth, G. F. Cao, J. Cao, J. F. Chang, Y. Chang, H. S. Chen, H. Y. Chen, S. M. Chen, Y. Chen, Y. X. Chen, Z. Y. Chen, J. Cheng, Y. C. Cheng, Z. K. Cheng, J. J. Cherwinka, M. C. Chu, J. P. Cummings, O. Dalager, F. S. Deng, X. Y. Ding, Y. Y. Ding , et al. (176 additional authors not shown)

Abstract: This Letter presents results of a search for the mixing of a sub-eV sterile neutrino with three active neutrinos based on the full data sample of the Daya Bay Reactor Neutrino Experiment, collected during 3158 days of detector operation, which contains $5.55 \times 10^{6}$ reactor $\bar{ν}_e$ candidates identified as inverse beta-decay interactions followed by neutron-capture on gadolinium. The analysis benefits from a doubling of the statistics of our previous result and from improvements of several important systematic uncertainties. No significant oscillation due to mixing of a sub-eV sterile neutrino with active neutrinos was found. Exclusion limits are set by both Feldman-Cousins and CLs methods. Light sterile neutrino mixing with $\sin^2 2θ_{14} \gtrsim 0.01$ can be excluded at 95\% confidence level in the region of $0.01$ eV$^2 \lesssim |Δm^{2}_{41}| \lesssim 0.1 $ eV$^2$. This result represents the world-leading constraints in the region of $2 \times 10^{-4}$ eV$^2 \lesssim |Δm^{2}_{41}| \lesssim 0.2 $ eV$^2$.

Submitted 15 April, 2024; v1 submitted 2 April, 2024; originally announced April 2024.

Comments: 7 pages, 4 figures, 1 table

arXiv:2404.00762 [pdf, other]

Enchanting Program Specification Synthesis by Large Language Models using Static Analysis and Program Verification

Authors: Cheng Wen, Jialun Cao, Jie Su, Zhiwu Xu, Shengchao Qin, Mengda He, Haokun Li, Shing-Chi Cheung, Cong Tian

Abstract: Formal verification provides a rigorous and systematic approach to ensure the correctness and reliability of software systems. Yet, constructing specifications for the full proof relies on domain expertise and non-trivial manpower. In view of such needs, an automated approach for specification synthesis is desired. Existing automated approaches, however, are limited in their versatility, i.e., they either focus only on synthesizing loop invariants for numerical programs, or are tailored for specific types of programs or invariants. Programs involving multiple complicated data types (e.g., arrays, pointers) and code structures (e.g., nested loops, function calls) are often beyond their capabilities. To help bridge this gap, we present AutoSpec, an automated approach to synthesize specifications for automated program verification. It overcomes the shortcomings of existing work in specification versatility, synthesizing satisfiable and adequate specifications for full proof. It is driven by static analysis and program verification, and is empowered by large language models (LLMs). AutoSpec addresses the practical challenges in three ways: (1) driving AutoSpec by static analysis and program verification, LLMs serve as generators to generate candidate specifications, (2) programs are decomposed to direct the attention of LLMs, and (3) candidate specifications are validated in each round to avoid error accumulation during the interaction with LLMs. In this way, AutoSpec can incrementally and iteratively generate satisfiable and adequate specifications. The evaluation shows its effectiveness and usefulness, as it outperforms existing works by successfully verifying 79% of programs through automatic specification synthesis, a significant improvement of 1.592x. It can also be successfully applied to verify the programs in a real-world X509-parser project.

Submitted 2 April, 2024; v1 submitted 31 March, 2024; originally announced April 2024.

arXiv:2404.00555 [pdf, other]

Gas-rich Ultra-diffuse Galaxies Are Originated from High Specific Angular Momentum

Authors: Yu Rong, Huijie Hu, Min He, Wei Du, Qi Guo, Hui-Yuan Wang, Hong-Xin Zhang, Houjun Mo

Abstract: Ultra-diffuse galaxies, characterized by comparable effective radii to the Milky Way but possessing 100-1,000 times fewer stars, offer a unique opportunity to garner novel insights into the mechanisms governing galaxy formation. Nevertheless, the existing corpus of observational and simulation studies has not yet yielded a definitive constraint or comprehensive consensus on the formation mechanisms underlying ultra-diffuse galaxies. In this study, we delve into the properties of ultra-diffuse galaxies enriched with neutral hydrogen using a semi-analytic method, with the explicit aim of constraining existing ultra-diffuse galaxy formation models. We find that the gas-rich ultra-diffuse galaxies are statistically not failed $L^{\star}$ galaxies nor dark matter deficient galaxies. In statistical terms, these ultra-diffuse galaxies exhibit comparable halo concentration, but higher baryonic mass fraction, as well as higher stellar and gas specific angular momentum, in comparison to typical dwarf galaxy counterparts. Our analysis unveils that higher gas specific angular momentum serves as the underlying factor elucidating the observed heightened baryonic mass fractions, diminished star formation efficiency, expanded stellar disk sizes, and reduced stellar densities in ultra-diffuse galaxies. Our findings make significant contributions to advancing our knowledge of ultra-diffuse galaxy formation and shed light on the intricate interplay between gas dynamics and the evolution of galaxies.

Submitted 31 March, 2024; originally announced April 2024.

Comments: comments welcome

arXiv:2404.00357 [pdf, other]

Revisiting Random Weight Perturbation for Efficiently Improving Generalization

Authors: Tao Li, Qinghua Tao, Weihao Yan, Zehao Lei, Yingwen Wu, Kun Fang, Mingzhen He, Xiaolin Huang

Abstract: Improving the generalization ability of modern deep neural networks (DNNs) is a fundamental challenge in machine learning. Two branches of methods have been proposed to seek flat minima and improve generalization: one led by sharpness-aware minimization (SAM) minimizes the worst-case neighborhood loss through adversarial weight perturbation (AWP), and the other minimizes the expected Bayes objective with random weight perturbation (RWP). While RWP offers advantages in computation and is closely linked to AWP on a mathematical basis, its empirical performance has consistently lagged behind that of AWP. In this paper, we revisit the use of RWP for improving generalization and propose improvements from two perspectives: i) the trade-off between generalization and convergence and ii) the random perturbation generation. Through extensive experimental evaluations, we demonstrate that our enhanced RWP methods achieve greater efficiency in enhancing generalization, particularly in large-scale problems, while also offering comparable or even superior performance to SAM. The code is released at https://github.com/nblt/mARWP.

Submitted 30 March, 2024; originally announced April 2024.

Comments: Accepted to TMLR 2024

arXiv:2404.00309 [pdf, other]

Model-Driven Deep Learning for Distributed Detection with Binary Quantization

Authors: Wei Guo, Meng He, Chuan Huang, Hengtao He, Shenghui Song, Jun Zhang, Khaled B. Letaief

Abstract: Within the realm of rapidly advancing wireless sensor networks (WSNs), distributed detection assumes a significant role in various practical applications. However, a critical challenge lies in maintaining robust detection performance while operating within the constraints of limited bandwidth and energy resources. This paper introduces a novel approach that combines model-driven deep learning (DL) with binary quantization to strike a balance between communication overhead and detection performance in WSNs. We begin by establishing the lower bound of detection error probability for distributed detection using the maximum a posteriori (MAP) criterion. Furthermore, we prove the global optimality of employing identical local quantizers across sensors, thereby maximizing the corresponding Chernoff information. Subsequently, the paper derives the minimum MAP detection error probability (MAPDEP) by implementing identical binary probabilistic quantizers across the sensors. Moreover, the paper establishes the equivalence between utilizing all quantized data and their average as input to the detector at the fusion center (FC). In particular, we derive the Kullback-Leibler (KL) divergence, which measures the difference between the true posterior probability and the output of the proposed detector. Leveraging the MAPDEP and KL divergence as loss functions, the paper proposes a model-driven DL method to separately train the probability controller module in the quantizer and the detector module at the FC. Numerical results validate the convergence and effectiveness of the proposed method, which achieves near-optimal performance with reduced complexity for Gaussian hypothesis testing.

Submitted 30 March, 2024; originally announced April 2024.

Showing 1–50 of 695 results for author: He, M