arXiv:2407.08150 [pdf, other]

Hypergraph Multi-modal Large Language Model: Exploiting EEG and Eye-tracking Modalities to Evaluate Heterogeneous Responses for Video Understanding

Authors: Minghui Wu, Chenxu Zhao, Anyang Su, Donglin Di, Tianyu Fu, Da An, Min He, Ya Gao, Meng Ma, Kun Yan, Ping Wang

Abstract: Understanding of video creativity and content often varies among individuals, with differences in focal points and cognitive levels across different ages, experiences, and genders. There is currently a lack of research in this area, and most existing benchmarks suffer from several drawbacks: 1) a limited number of modalities and answers with restrictive length; 2) the content and scenarios within the videos are excessively monotonous, transmitting allegories and emotions that are overly simplistic. To bridge the gap to real-world applications, we introduce a large-scale Subjective Response Indicators for Advertisement Videos dataset, namely SRI-ADV. Specifically, we collected real changes in Electroencephalographic (EEG) and eye-tracking regions from different demographics while they viewed identical video content. Utilizing this multi-modal dataset, we developed tasks and protocols to analyze and evaluate the extent of cognitive understanding of video content among different users. Along with the dataset, we designed a Hypergraph Multi-modal Large Language Model (HMLLM) to explore the associations among different demographics, video elements, EEG, and eye-tracking indicators. HMLLM could bridge semantic gaps across rich modalities and integrate information beyond different modalities to perform logical reasoning. Extensive experimental evaluations on SRI-ADV and other additional video-based generative performance benchmarks demonstrate the effectiveness of our method. The codes and dataset will be released at https://github.com/suay1113/HMLLM.

Submitted 16 July, 2024; v1 submitted 10 July, 2024; originally announced July 2024.

Comments: Accepted by ACM MULTIMEDIA 2024

arXiv:2407.06890 [pdf, ps, other]

$(ω, α, n)$-sensitivity and limit sets of zero entropy homeomorphisms on the square

Authors: Jiehua Mai, Enhui Shi, Kesong Yan, Fanping Zeng

Abstract: For a homeomorphism $f$ of a compact metric space $X$ and a positive integer $n\geq 2$, we introduce the notion of $(ω, α, n)$-sensitivity of $f$, which describes such a kind of chaos: there is some $c>0$ such that for any $x\in X$ and any open neighborhood $U$ of $x$, there are points $\{x_i\}_{i=1}^n$ and $\{y_i\}_{i=1}^n$ in $U$ such that both the collection of $ω$-limit sets $ω(x_i, f)$ and that of the $α$-limit sets $α(y_i, f)$ are pairwise $c$-separated. Then we construct a class of homeomorphisms of the square $[-1, 1]^2$ which are $(ω, α, n)$-sensitive for any $n\geq 2$ and have zero topological entropies. To investigate further the complexity of zero entropy homeomorphisms by using limit sets, we analyze in depth the limit sets of square homeomorphisms by the boundary permeating technique. Specially, we prove that for any given set of points $Y\equiv\{y_{n1}, y_{n2}:n\in\mathbb N\}$ in $(-1, 1)^2$ which satisfies some loosely technical conditions, and for any given family of pairwise disjoint countable dense subsets $\{W_n:n\in\mathbb N\}$ of $(-1, 1)^2-Y$, there is a zero entropy homeomorphism $f$ on the square $[-1, 1]^2$ such that $ω(x, f)=\{y_{n1}\}$ and $α(x, f)=\{y_{n2}\}$ for any $n$ and any $x\in W_n$.

Submitted 9 July, 2024; originally announced July 2024.

arXiv:2407.03314 [pdf, other]

BACON: Supercharge Your VLM with Bag-of-Concept Graph to Mitigate Hallucinations

Authors: Zhantao Yang, Ruili Feng, Keyu Yan, Huangji Wang, Zhicai Wang, Shangwen Zhu, Han Zhang, Jie Xiao, Pingyu Wu, Kai Zhu, Jixuan Chen, Chen-Wei Xie, Chaojie Mao, Yue Yang, Hongyang Zhang, Yu Liu, Fan Cheng

Abstract: This paper presents Bag-of-Concept Graph (BACON) to gift models with limited linguistic abilities to taste the privilege of Vision Language Models (VLMs) and boost downstream tasks such as detection, visual question answering (VQA), and image generation. Since the visual scenes in physical worlds are structured with complex relations between objects, BACON breaks down annotations into basic minimum elements and presents them in a graph structure. Element-wise style enables easy understanding, and structural composition liberates difficult locating. Careful prompt design births the BACON captions with the help of public-available VLMs and segmentation methods. In this way, we gather a dataset with 100K annotated images, which endow VLMs with remarkable capabilities, such as accurately generating BACON, transforming prompts into BACON format, envisioning scenarios in the style of BACON, and dynamically modifying elements within BACON through interactive dialogue and more. Wide representative experiments, including detection, VQA, and image generation tasks, tell BACON as a lifeline to achieve previous out-of-reach tasks or excel in their current cutting-edge solutions.

Submitted 3 July, 2024; originally announced July 2024.

arXiv:2406.17243 [pdf, ps, other]

A new construction of counterexamples to the bounded orbit conjecture

Authors: Jiehua Mai, Enhui Shi, Kesong Yan, Fanping Zeng

Abstract: The bounded orbit conjecture says that every homeomorphism on the plane with each of its orbits being bounded must have a fixed point. Brouwer's translation theorem asserts that the conjecture is true for orientation preserving homeomorphisms, but Boyles' counterexample shows that it is false for the orientation reversing case. In this paper, we give a more comprehensible construction of counterexamples to the conjecture. Roughly speaking, we construct an orientation reversing homeomorphism $f$ on the square $J^2=[-1, 1]^2$ with $ω(x, f)=\{(-1, 1), (1, 1)\}$ and $α(x, f)=\{(-1, -1), (1, -1)\}$ for each $x\in (-1, 1)^2$. Then by a semi-conjugacy defined by pushing an appropriate part of $\partial J^2$ into $(-1, 1)^2$, $f$ induces a homeomorphism on the plane, which is a counterexample.

Submitted 24 June, 2024; originally announced June 2024.

arXiv:2406.14604 [pdf, other]

Two-Loop Spacelike Splitting Amplitude for N=4 Super-Yang-Mills Theory

Authors: Johannes Henn, Rourou Ma, Yongqun Xu, Kai Yan, Yang Zhang, Hua Xing Zhu

Abstract: The study of collinear behavior for gauge theories in the spacelike region is of great phenomenological and theoretical importance. We analytically calculate the two-loop spacelike splitting amplitude for the full color N=4 Super-Yang-Mills theory. The result is derived by two complementary methods starting from the known amplitude: one is based on a discontinuity analysis, while the other one is based on analytic continuation. Our result explicitly shows terms that violate naive factorization. However we show that factorization is restored at the level of color-summed unpolarized squared amplitudes at next-to-next-to-next-to leading order. We conjecture that the two-loop tripole terms in the generalized splitting amplitudes in QCD are identical to what we obtain in N=4 super Yang-Mills theory.

Submitted 20 June, 2024; originally announced June 2024.

Comments: 6 packages, 3 figures

Report number: USTC-ICTS/PCFT-24-18

arXiv:2406.12888 [pdf, other]

A Space Group Symmetry Informed Network for O(3) Equivariant Crystal Tensor Prediction

Authors: Keqiang Yan, Alexandra Saxton, Xiaofeng Qian, Xiaoning Qian, Shuiwang Ji

Abstract: We consider the prediction of general tensor properties of crystalline materials, including dielectric, piezoelectric, and elastic tensors. A key challenge here is how to make the predictions satisfy the unique tensor equivariance to O(3) group and invariance to crystal space groups. To this end, we propose a General Materials Tensor Network (GMTNet), which is carefully designed to satisfy the required symmetries. To evaluate our method, we curate a dataset and establish evaluation metrics that are tailored to the intricacies of crystal tensor predictions. Experimental results show that our GMTNet not only achieves promising performance on crystal tensors of various orders but also generates predictions fully consistent with the intrinsic crystal symmetries. Our code is publicly available as part of the AIRS library (https://github.com/divelab/AIRS).

Submitted 3 June, 2024; originally announced June 2024.

Comments: This paper has been accepted to ICML 24 as a poster. You are encouraged to cite the conference version of this paper

arXiv:2406.03933 [pdf, other]

Beyond Similarity: Personalized Federated Recommendation with Composite Aggregation

Authors: Honglei Zhang, Haoxuan Li, Jundong Chen, Sen Cui, Kunda Yan, Abudukelimu Wuerkaixi, Xin Zhou, Zhiqi Shen, Yidong Li

Abstract: Federated recommendation aims to collect global knowledge by aggregating local models from massive devices, to provide recommendations while ensuring privacy. Current methods mainly leverage aggregation functions invented by federated vision community to aggregate parameters from similar clients, e.g., clustering aggregation. Despite considerable performance, we argue that it is suboptimal to apply them to federated recommendation directly. This is mainly reflected in the disparate model architectures. Different from structured parameters like convolutional neural networks in federated vision, federated recommender models usually distinguish itself by employing one-to-one item embedding table. Such a discrepancy induces the challenging embedding skew issue, which continually updates the trained embeddings but ignores the non-trained ones during aggregation, thus failing to predict future items accurately. To this end, we propose a personalized Federated recommendation model with Composite Aggregation (FedCA), which not only aggregates similar clients to enhance trained embeddings, but also aggregates complementary clients to update non-trained embeddings. Besides, we formulate the overall learning process into a unified optimization algorithm to jointly learn the similarity and complementarity. Extensive experiments on several real-world datasets substantiate the effectiveness of our proposed model. The source codes are available at https://github.com/hongleizhang/FedCA.

Submitted 6 June, 2024; originally announced June 2024.

arXiv:2406.01918 [pdf, other]

Image steganography based on generative implicit neural representation

Authors: Zhong Yangjie, Liu Jia, Ke Yan, Liu Meiqi

Abstract: In the realm of advanced steganography, the scale of the model typically correlates directly with the resolution of the fundamental grid, necessitating the training of a distinct neural network for message extraction. This paper proposes an image steganography based on generative implicit neural representation. This approach transcends the constraints of image resolution by portraying data as continuous functional expressions. Notably, this method permits the utilization of a diverse array of multimedia data as cover images, thereby broadening the spectrum of potential carriers. Additionally, by fixing a neural network as the message extractor, we effectively redirect the training burden to the image itself, resulting in both a reduction in computational overhead and an enhancement in steganographic speed. This approach also circumvents potential transmission challenges associated with the message extractor. Experimental findings reveal that this methodology achieves a commendable optimization efficiency, achieving a completion time of just 3 seconds for 64x64 dimensional images, while concealing only 1 bpp of information. Furthermore, the accuracy of message extraction attains an impressive mark of 100%.

Submitted 3 June, 2024; originally announced June 2024.

Comments: 33 pages, 15 figures and 5 tables

MSC Class: 68T07 ACM Class: E.3

arXiv:2406.00523 [pdf, other]

Stealing Trust: Unveiling Vulnerabilities in Web3 Authentication

Authors: Kailun Yan, Xiaokuan Zhang, Wenrui Diao

Abstract: As the field of Web3 continues its rapid expansion, the security of Web3 authentication, often the gateway to various Web3 applications, becomes increasingly crucial. Despite its widespread use as a login method by numerous Web3 applications, the security risks of Web3 authentication have not received much attention. This paper investigates the vulnerabilities in the Web3 authentication process and proposes a new type of attack. In attacks, attackers trick users into blindly signing messages from target applications by exploiting users' inability to verify the source of messages, thereby achieving unauthorized access to the target application. We have developed Web3AuthChecker, a dynamic detection tool that interacts with Web3 authentication-related APIs to identify vulnerabilities. Our evaluation of real-world Web3 applications shows that a staggering 75.8\% (22/29) of Web3 authentication deployments are at risk of attacks. In response to this alarming situation, we implemented Web3AuthGuard on the open-source wallet MetaMask to alert users of potential attacks. Our evaluation results show that Web3AuthGuard can successfully raise alerts in 80\% of the tested Web3 authentications. We have responsibly reported our findings to vulnerable websites and have been assigned two CVE IDs.

Submitted 6 June, 2024; v1 submitted 1 June, 2024; originally announced June 2024.

arXiv:2405.10561 [pdf, other]

Infrared Image Super-Resolution via Lightweight Information Split Network

Authors: Shijie Liu, Kang Yan, Feiwei Qin, Changmiao Wang, Ruiquan Ge, Kai Zhang, Jie Huang, Yong Peng, Jin Cao

Abstract: Single image super-resolution (SR) is an established pixel-level vision task aimed at reconstructing a high-resolution image from its degraded low-resolution counterpart. Despite the notable advancements achieved by leveraging deep neural networks for SR, most existing deep learning architectures feature an extensive number of layers, leading to high computational complexity and substantial memory demands. These issues become particularly pronounced in the context of infrared image SR, where infrared devices often have stringent storage and computational constraints. To mitigate these challenges, we introduce a novel, efficient, and precise single infrared image SR model, termed the Lightweight Information Split Network (LISN). The LISN comprises four main components: shallow feature extraction, deep feature extraction, dense feature fusion, and high-resolution infrared image reconstruction. A key innovation within this model is the introduction of the Lightweight Information Split Block (LISB) for deep feature extraction. The LISB employs a sequential process to extract hierarchical features, which are then aggregated based on the relevance of the features under consideration. By integrating channel splitting and shift operations, the LISB successfully strikes an optimal balance between enhanced SR performance and a lightweight framework. Comprehensive experimental evaluations reveal that the proposed LISN achieves superior performance over contemporary state-of-the-art methods in terms of both SR quality and model complexity, affirming its efficacy for practical deployment in resource-constrained infrared imaging applications.

Submitted 27 May, 2024; v1 submitted 17 May, 2024; originally announced May 2024.

arXiv:2405.09892 [pdf, other]

Balancing Similarity and Complementarity for Federated Learning

Authors: Kunda Yan, Sen Cui, Abudukelimu Wuerkaixi, Jingfeng Zhang, Bo Han, Gang Niu, Masashi Sugiyama, Changshui Zhang

Abstract: In mobile and IoT systems, Federated Learning (FL) is increasingly important for effectively using data while maintaining user privacy. One key challenge in FL is managing statistical heterogeneity, such as non-i.i.d. data, arising from numerous clients and diverse data sources. This requires strategic cooperation, often with clients having similar characteristics. However, we are interested in a fundamental question: does achieving optimal cooperation necessarily entail cooperating with the most similar clients? Typically, significant model performance improvements are often realized not by partnering with the most similar models, but through leveraging complementary data. Our theoretical and empirical analyses suggest that optimal cooperation is achieved by enhancing complementarity in feature distribution while restricting the disparity in the correlation between features and targets. Accordingly, we introduce a novel framework, \texttt{FedSaC}, which balances similarity and complementarity in FL cooperation. Our framework aims to approximate an optimal cooperation network for each client by optimizing a weighted sum of model similarity and feature complementarity. The strength of \texttt{FedSaC} lies in its adaptability to various levels of data heterogeneity and multimodal scenarios. Our comprehensive unimodal and multimodal experiments demonstrate that \texttt{FedSaC} markedly surpasses other state-of-the-art FL methods.

Submitted 16 May, 2024; originally announced May 2024.

arXiv:2405.09164 [pdf]

Rapidly Achieving Chemical Accuracy with Quantum Computing Enforced Language Model

Authors: Honghui Shang, Xiongzhi Zeng, Ming Gong, Yangju Wu, Shaojun Guo, Haoran Qian, Chen Zha, Zhijie Fan, Kai Yan, Xiaobo Zhu, Zhenyu Li, Yi Luo, Jian-Wei Pan, Jinlong Yang

Abstract: Finding accurate ground state energy of a many-body system has been a major challenge in quantum chemistry. The integration of classic and quantum computers has shed new light on resolving this outstanding problem. Here we propose QiankunNet-VQE, a transformer based language models enforced with quantum computing to learn and generate quantum states. It has been implemented using up to 12 qubits and attaining an accuracy level competitive with state-of-the-art classical methods. By leveraging both quantum and classical resources, this scheme overcomes the limitations of variational quantum eigensolver (VQE) without the need for cumbersome error mitigation. Moreover, QiankunNet-VQE provides a different route to achieve a practical quantum advantage for solving many-electron Schrödinger equation without requiring extremely precise preparation and measurement of the ground-state wavefunction on quantum computer.

Submitted 15 May, 2024; originally announced May 2024.

arXiv:2405.07652 [pdf, other]

G-VOILA: Gaze-Facilitated Information Querying in Daily Scenarios

Authors: Zeyu Wang, Yuanchun Shi, Yuntao Wang, Yuchen Yao, Kun Yan, Yuhan Wang, Lei Ji, Xuhai Xu, Chun Yu

Abstract: Modern information querying systems are progressively incorporating multimodal inputs like vision and audio. However, the integration of gaze -- a modality deeply linked to user intent and increasingly accessible via gaze-tracking wearables -- remains underexplored. This paper introduces a novel gaze-facilitated information querying paradigm, named G-VOILA, which synergizes users' gaze, visual field, and voice-based natural language queries to facilitate a more intuitive querying process. In a user-enactment study involving 21 participants in 3 daily scenarios (p = 21, scene = 3), we revealed the ambiguity in users' query language and a gaze-voice coordination pattern in users' natural query behaviors with G-VOILA. Based on the quantitative and qualitative findings, we developed a design framework for the G-VOILA paradigm, which effectively integrates the gaze data with the in-situ querying context. Then we implemented a G-VOILA proof-of-concept using cutting-edge deep learning techniques. A follow-up user study (p = 16, scene = 2) demonstrates its effectiveness by achieving both higher objective score and subjective score, compared to a baseline without gaze data. We further conducted interviews and provided insights for future gaze-facilitated information querying systems.

Submitted 13 May, 2024; originally announced May 2024.

Comments: 25 pages, 12 figures

arXiv:2405.03613 [pdf, other]

Dual Relation Mining Network for Zero-Shot Learning

Authors: Jinwei Han, Yingguo Gao, Zhiwen Lin, Ke Yan, Shouhong Ding, Yuan Gao, Gui-Song Xia

Abstract: Zero-shot learning (ZSL) aims to recognize novel classes through transferring shared semantic knowledge (e.g., attributes) from seen classes to unseen classes. Recently, attention-based methods have exhibited significant progress which align visual features and attributes via a spatial attention mechanism. However, these methods only explore visual-semantic relationship in the spatial dimension, which can lead to classification ambiguity when different attributes share similar attention regions, and semantic relationship between attributes is rarely discussed. To alleviate the above problems, we propose a Dual Relation Mining Network (DRMN) to enable more effective visual-semantic interactions and learn semantic relationship among attributes for knowledge transfer. Specifically, we introduce a Dual Attention Block (DAB) for visual-semantic relationship mining, which enriches visual information by multi-level feature fusion and conducts spatial attention for visual to semantic embedding. Moreover, an attribute-guided channel attention is utilized to decouple entangled semantic features. For semantic relationship modeling, we utilize a Semantic Interaction Transformer (SIT) to enhance the generalization of attribute representations among images. Additionally, a global classification branch is introduced as a complement to human-defined semantic attributes, and we then combine the results with attribute-based classification. Extensive experiments demonstrate that the proposed DRMN leads to new state-of-the-art performances on three standard ZSL benchmarks, i.e., CUB, SUN, and AwA2.

Submitted 6 May, 2024; originally announced May 2024.

arXiv:2405.00874 [pdf]

Artificial intelligence for context-aware visual change detection in software test automation

Authors: Milad Moradi, Ke Yan, David Colwell, Rhona Asgari

Abstract: Automated software testing is integral to the software development process, streamlining workflows and ensuring product reliability. Visual testing within this context, especially concerning user interface (UI) and user experience (UX) validation, stands as one of crucial determinants of overall software quality. Nevertheless, conventional methods like pixel-wise comparison and region-based visual change detection fall short in capturing contextual similarities, nuanced alterations, and understanding the spatial relationships between UI elements. In this paper, we introduce a novel graph-based method for visual change detection in software test automation. Leveraging a machine learning model, our method accurately identifies UI controls from software screenshots and constructs a graph representing contextual and spatial relationships between the controls. This information is then used to find correspondence between UI controls within screenshots of different versions of a software. The resulting graph encapsulates the intricate layout of the UI and underlying contextual relations, providing a holistic and context-aware model. This model is finally used to detect and highlight visual regressions in the UI. Comprehensive experiments on different datasets showed that our change detector can accurately detect visual software changes in various simple and complex test scenarios. Moreover, it outperformed pixel-wise comparison and region-based baselines by a large margin in more complex testing scenarios. This work not only contributes to the advancement of visual change detection but also holds practical implications, offering a robust solution for real-world software test automation challenges, enhancing reliability, and ensuring the seamless evolution of software interfaces.

Submitted 1 May, 2024; originally announced May 2024.

arXiv:2404.17174 [pdf, other]

Optimizing Cycle Life Prediction of Lithium-ion Batteries via a Physics-Informed Model

Authors: Constantin-Daniel Nicolae, Sara Sameer, Nathan Sun, Karena Yan

Abstract: Accurately measuring the cycle lifetime of commercial lithium-ion batteries is crucial for performance and technology development. We introduce a novel hybrid approach combining a physics-based equation with a self-attention model to predict the cycle lifetimes of commercial lithium iron phosphate graphite cells via early-cycle data. After fitting capacity loss curves to this physics-based equation, we then use a self-attention layer to reconstruct entire battery capacity loss curves. Our model exhibits comparable performances to existing models while predicting more information: the entire capacity loss curve instead of cycle life. This provides more robustness and interpretability: our model does not need to be retrained for a different notion of end-of-life and is backed by physical intuition.

Submitted 26 April, 2024; originally announced April 2024.

arXiv:2404.15275 [pdf, other]

ID-Animator: Zero-Shot Identity-Preserving Human Video Generation

Authors: Xuanhua He, Quande Liu, Shengju Qian, Xin Wang, Tao Hu, Ke Cao, Keyu Yan, Jie Zhang

Abstract: Generating high-fidelity human video with specified identities has attracted significant attention in the content generation community. However, existing techniques struggle to strike a balance between training efficiency and identity preservation, either requiring tedious case-by-case fine-tuning or usually missing identity details in the video generation process. In this study, we present \textbf{ID-Animator}, a zero-shot human-video generation approach that can perform personalized video generation given a single reference facial image without further training. ID-Animator inherits existing diffusion-based video generation backbones with a face adapter to encode the ID-relevant embeddings from learnable facial latent queries. To facilitate the extraction of identity information in video generation, we introduce an ID-oriented dataset construction pipeline that incorporates unified human attributes and action captioning techniques from a constructed facial image pool. Based on this pipeline, a random reference training strategy is further devised to precisely capture the ID-relevant embeddings with an ID-preserving loss, thus improving the fidelity and generalization capacity of our model for ID-specific video generation. Extensive experiments demonstrate the superiority of ID-Animator to generate personalized human videos over previous models. Moreover, our method is highly compatible with popular pre-trained T2V models like animatediff and various community backbone models, showing high extendability in real-world applications for video generation where identity preservation is highly desired. Our codes and checkpoints are released at https://github.com/ID-Animator/ID-Animator.

Submitted 25 June, 2024; v1 submitted 23 April, 2024; originally announced April 2024.

Comments: Project Page: https://id-animator.github.io/

arXiv:2404.15272 [pdf, other]

CT-GLIP: 3D Grounded Language-Image Pretraining with CT Scans and Radiology Reports for Full-Body Scenarios

Authors: Jingyang Lin, Yingda Xia, Jianpeng Zhang, Ke Yan, Le Lu, Jiebo Luo, Ling Zhang

Abstract: Medical Vision-Language Pretraining (Med-VLP) establishes a connection between visual content from medical images and the relevant textual descriptions. Existing Med-VLP methods primarily focus on 2D images depicting a single body part, notably chest X-rays. In this paper, we extend the scope of Med-VLP to encompass 3D images, specifically targeting full-body scenarios, by using a multimodal dataset of CT images and reports. Compared with its 2D counterpart, 3D VLP must effectively capture essential semantics from the significantly sparser representation in 3D imaging. We introduce CT-GLIP (Grounded Language-Image Pretraining with CT scans), a novel method that constructs organ-level image-text pairs to enhance multimodal contrastive learning, aligning grounded visual features with precise diagnostic text. Additionally, we developed an abnormality dictionary to augment contrastive learning with diverse contrastive pairs. Our method, trained on a multimodal CT dataset comprising 44,011 organ-level vision-text pairs from 17,702 patients across 104 organs, demonstrates that it can identify organs and abnormalities in a zero-shot manner using natural language. The performance of CT-GLIP is validated on a separate test set of 1,130 patients, focusing on the 16 most frequent abnormalities across 7 organs. The experimental results show our model's superior performance over the standard CLIP framework across zero-shot and fine-tuning scenarios, using both CNN and ViT architectures.

Submitted 28 April, 2024; v1 submitted 23 April, 2024; originally announced April 2024.

Comments: 12 pages, 5 figures, 3 tables

arXiv:2404.13642 [pdf, ps, other]

Can points of bounded orbits surround points of unbounded orbits?

Authors: Jiehua Mai, Enhui Shi, Kesong Yan, Fanping Zeng

Abstract: We show a somewhat surprising result: if $E$ is a disk in the plane $\mathbb R^2$, then there is a homeomorphism $h:\mathbb R^2\rightarrow\mathbb R^2$ such that, for every $x\in\partial E$, the orbit $O(x, h)$ is bounded, but for every $y\in {\rm Int}(E)$, the orbit $O(y, h)$ is doubly divergent. To prove this, we define a class of homeomorphisms on the square $[-1, 1]^2$, called normally rising homeomorphisms, and show that a normally rising homeomorphism can have very complex $\omega$-limit sets and $\alpha$-limit sets, though the homeomorphism itself looks very simple.

Submitted 21 April, 2024; originally announced April 2024.

Comments: 18 pages. Comments are welcome

MSC Class: 37E30

arXiv:2404.11973 [pdf]

Exploring the landscape of large language models: Foundations, techniques, and challenges

Authors: Milad Moradi, Ke Yan, David Colwell, Matthias Samwald, Rhona Asgari

Abstract: In this review paper, we delve into the realm of Large Language Models (LLMs), covering their foundational principles, diverse applications, and nuanced training processes. The article sheds light on the mechanics of in-context learning and a spectrum of fine-tuning approaches, with a special focus on methods that optimize efficiency in parameter usage. Additionally, it explores how LLMs can be more closely aligned with human preferences through innovative reinforcement learning frameworks and other novel methods that incorporate human feedback. The article also examines the emerging technique of retrieval augmented generation, integrating external knowledge into LLMs. The ethical dimensions of LLM deployment are discussed, underscoring the need for mindful and responsible application. Concluding with a perspective on future research trajectories, this review offers a succinct yet comprehensive overview of the current state and emerging trends in the evolving landscape of LLMs, serving as an insightful guide for both researchers and practitioners in artificial intelligence.

Submitted 18 April, 2024; originally announced April 2024.

arXiv:2404.10247 [pdf, other]

Orientation Preserving Homeomorphisms of the Plane having BP-Chain Recurrent Points

Authors: Jiehua Mai, Kesong Yan, Fanping Zeng

Abstract: More than a century ago, L. E. J. Brouwer proved a famous theorem, which says that any orientation preserving homeomorphism of the plane having a periodic point must have a fixed point. In recent years, authors have continued to give various proofs of this fixed point theorem. Fathi showed that the condition "having a periodic point" in this theorem can be weakened to "having a non-wandering point". In this paper, we first give a new proof of Brouwer's theorem, which is relatively simpler and whose statement is more compact. Further, we propose a notion of BP-chain recurrent points, which is a generalization of the concept of non-wandering points, and we prove that if an orientation preserving homeomorphism of the plane has a BP-chain recurrent point, then it has a fixed point. This further weakens the condition in Brouwer's fixed point theorem on the plane.

Submitted 15 April, 2024; originally announced April 2024.

Comments: 15 pages, 2 figures

MSC Class: 37E30; 37C25; 37B20; 54H20

arXiv:2404.06244 [pdf, other]

Anchor-based Robust Finetuning of Vision-Language Models

Authors: Jinwei Han, Zhiwen Lin, Zhongyisun Sun, Yingguo Gao, Ke Yan, Shouhong Ding, Yuan Gao, Gui-Song Xia

Abstract: We aim at finetuning a vision-language model without hurting its out-of-distribution (OOD) generalization. We address two types of OOD generalization, i.e., i) domain shift such as natural to sketch images, and ii) zero-shot capability to recognize categories that were not contained in the finetune data. Arguably, the diminished OOD generalization after finetuning stems from the excessively simplified finetuning target, which only provides the class information, such as "a photo of a [CLASS]". This is distinct from the process by which CLIP was pretrained, where there is abundant text supervision with rich semantic information. Therefore, we propose to compensate for the finetune process using auxiliary supervision with rich semantic information, which acts as anchors to preserve the OOD generalization. Specifically, two types of anchors are elaborated in our method, including i) a text-compensated anchor, which uses the images from the finetune set but enriches the text supervision from a pretrained captioner, and ii) an image-text-pair anchor, which is retrieved from a dataset similar to the pretraining data of CLIP according to the downstream task, and is associated with the original CLIP text with rich semantics. Those anchors are utilized as auxiliary semantic information to maintain the original feature space of CLIP, thereby preserving the OOD generalization capabilities. Comprehensive experiments demonstrate that our method achieves in-distribution performance akin to conventional finetuning while attaining new state-of-the-art results on domain shift and zero-shot learning benchmarks.

Submitted 9 April, 2024; originally announced April 2024.

Comments: CVPR2024

arXiv:2404.05248 [pdf, ps, other]

Some extensions of the Brouwer fixed point theorem

Authors: Jiehua Mai, Enhui Shi, Kesong Yan, Fanping Zeng

Abstract: We study the existence of fixed points for continuous maps $f$ from an $n$-ball $X$ in $\mathbb R^n$ to $\mathbb R^n$ with $n\geq 1$. We show that $f$ has a fixed point if, for some absolute retract $Y\subset\partial X$, $f(Y)\subset X$ and $\partial X-Y$ is an $(f, X)$-blockading set. For $n\geq 2$, let $D$ be an $n$-ball in $X$ and $Y$ be an $(n-1)$-ball in $\partial X$. Relying on the result just mentioned, we show the existence of a fixed point of $f$, if $D$ and $Y$ are well placed and behave well under $f$, and ${\rm deg}(f_D)=-{\rm deg}(f_{\partial Y})$, where $f_D=f|D: D \rightarrow \mathbb{R}^n$ and $f_{\partial Y}=f|\partial Y: \partial Y \rightarrow \partial Y$. The degree ${\rm deg}(f_D)$ of $f_D$ is explicitly defined, and some of its elementary properties are investigated. These results extend the Brouwer fixed point theorem.

Submitted 8 April, 2024; originally announced April 2024.

MSC Class: 55M20; 55M25; 54H20

arXiv:2404.04878 [pdf, other]

CycleINR: Cycle Implicit Neural Representation for Arbitrary-Scale Volumetric Super-Resolution of Medical Data

Authors: Wei Fang, Yuxing Tang, Heng Guo, Mingze Yuan, Tony C. W. Mok, Ke Yan, Jiawen Yao, Xin Chen, Zaiyi Liu, Le Lu, Ling Zhang, Minfeng Xu

Abstract: In the realm of medical 3D data, such as CT and MRI images, prevalent anisotropic resolution is characterized by high intra-slice but diminished inter-slice resolution. The lowered resolution between adjacent slices poses challenges, hindering optimal viewing experiences and impeding the development of robust downstream analysis algorithms. Various volumetric super-resolution algorithms aim to surmount these challenges, enhancing inter-slice resolution and overall 3D medical imaging quality. However, existing approaches confront inherent challenges: 1) often tailored to specific upsampling factors, lacking flexibility for diverse clinical scenarios; 2) newly generated slices frequently suffer from over-smoothing, degrading fine details, and leading to inter-slice inconsistency. In response, this study presents CycleINR, a novel enhanced Implicit Neural Representation model for 3D medical data volumetric super-resolution. Leveraging the continuity of the learned implicit function, the CycleINR model can achieve results with arbitrary up-sampling rates, eliminating the need for separate training. Additionally, we enhance the grid sampling in CycleINR with a local attention mechanism and mitigate over-smoothing by integrating cycle-consistent loss. We introduce a new metric, Slice-wise Noise Level Inconsistency (SNLI), to quantitatively assess inter-slice noise level inconsistency. The effectiveness of our approach is demonstrated through image quality evaluations on an in-house dataset and a downstream task analysis on the Medical Segmentation Decathlon liver tumor dataset.

Submitted 7 April, 2024; originally announced April 2024.

Comments: CVPR accepted paper

arXiv:2404.03819 [pdf, other]

Effective Lymph Nodes Detection in CT Scans Using Location Debiased Query Selection and Contrastive Query Representation in Transformer

Authors: Qinji Yu, Yirui Wang, Ke Yan, Haoshen Li, Dazhou Guo, Li Zhang, Le Lu, Na Shen, Qifeng Wang, Xiaowei Ding, Xianghua Ye, Dakai Jin

Abstract: Lymph node (LN) assessment is a critical, indispensable yet very challenging task in the routine clinical workflow of radiology and oncology. Accurate LN analysis is essential for cancer diagnosis, staging, and treatment planning. Finding scattered, low-contrast, clinically relevant LNs in 3D CT is difficult even for experienced physicians, with high inter-observer variation. Previous automatic LN detection works typically yield limited recall and high false positives (FPs) due to adjacent anatomies with similar image intensities, shapes, or textures (vessels, muscles, esophagus, etc.). In this work, we propose a new LN DEtection TRansformer, named LN-DETR, to achieve more accurate performance. We enhance the 2D backbone with a multi-scale 2.5D feature fusion to incorporate 3D context explicitly; more importantly, we make two main contributions to improve the representation quality of LN queries. 1) Considering that LN boundaries are often unclear, an IoU prediction head and a location debiased query selection are proposed to select LN queries of higher localization accuracy as the decoder query's initialization. 2) To reduce FPs, query contrastive learning is employed to explicitly reinforce LN queries towards their best-matched ground-truth queries over unmatched query predictions. Trained and tested on 3D CT scans of 1067 patients (with 10,000+ labeled LNs) via combining seven LN datasets from different body parts (neck, chest, and abdomen) and pathologies/cancers, our method significantly improves the performance of previous leading methods by more than 4-5% average recall at the same FP rates in both internal and external testing. We further evaluate on the universal lesion detection task using the NIH DeepLesion benchmark, and our method achieves the top performance of 88.46% averaged recall across 0.5 to 4 FPs per image, compared with other leading reported results.

Submitted 4 April, 2024; originally announced April 2024.

Comments: Technical report

arXiv:2404.01082 [pdf, other]

The state-of-the-art in Cardiac MRI Reconstruction: Results of the CMRxRecon Challenge in MICCAI 2023

Authors: Jun Lyu, Chen Qin, Shuo Wang, Fanwen Wang, Yan Li, Zi Wang, Kunyuan Guo, Cheng Ouyang, Michael Tänzer, Meng Liu, Longyu Sun, Mengting Sun, Qin Li, Zhang Shi, Sha Hua, Hao Li, Zhensen Chen, Zhenlin Zhang, Bingyu Xin, Dimitris N. Metaxas, George Yiasemis, Jonas Teuwen, Liping Zhang, Weitian Chen, Yidong Zhao, et al. (25 additional authors not shown)

Abstract: Cardiac MRI, crucial for evaluating heart structure and function, faces limitations like slow imaging and motion artifacts. Undersampling reconstruction, especially data-driven algorithms, has emerged as a promising solution to accelerate scans and enhance imaging performance using highly under-sampled data. Nevertheless, the scarcity of publicly available cardiac k-space datasets and evaluation platforms hinders the development of data-driven reconstruction algorithms. To address this issue, we organized the Cardiac MRI Reconstruction Challenge (CMRxRecon) in 2023, in collaboration with the 26th International Conference on MICCAI. CMRxRecon presented an extensive k-space dataset comprising cine and mapping raw data, accompanied by detailed annotations of cardiac anatomical structures. With overwhelming participation, the challenge attracted more than 285 teams and over 600 participants. Among them, 22 teams successfully submitted Docker containers for the testing phase, with 7 teams submitting for both cine and mapping tasks. All teams use deep learning based approaches, indicating that deep learning has predominantly become a promising solution for the problem. The first-place winner of both tasks utilizes the E2E-VarNet architecture as its backbone. In contrast, U-Net is still the most popular backbone for both multi-coil and single-coil reconstructions. This paper provides a comprehensive overview of the challenge design, presents a summary of the submitted results, reviews the employed methods, and offers an in-depth discussion that aims to inspire future advancements in cardiac MRI reconstruction models. The summary emphasizes the effective strategies observed in cardiac MRI reconstruction, including backbone architecture, loss function, pre-processing techniques, physical modeling, and model complexity, thereby providing valuable insights for further developments in this field.

Submitted 16 April, 2024; v1 submitted 1 April, 2024; originally announced April 2024.

Comments: 25 pages, 17 figures

arXiv:2403.17744 [pdf]

Facile synthesis of micro-flower NiCo2O4 assembled by nanosheets efficient for electrocatalysis of water

Authors: Yujie Wang, Yan Duan, Yuwen Chen, Man Zhang, Yuchen Wang, Bin Liu, Xiaodie Zhang, Yutong Zhang, Kai Yan

Abstract: Effective regulation of the morphology of transition metal spinel structures is crucial for creating efficient and stable bifunctional catalysts for electrocatalysis of water. In this work, micro-flower NiCo2O4 (F-NCO) assembled by nanosheets is prepared via a chemical template method for the simultaneous promotion of hydrogen evolution reaction (HER) and oxygen evolution reaction (OER). Electron microscopy analysis revealed that the thickness of the F-NCO catalyst was only 2.7% of that of the NiCo2O4 bulk (B-NCO), and this ultrathin lamellar structure was conducive to further exposure of the active sites and improved reaction kinetics. The F-NCO catalyst exhibited superior HER and OER performance (η10 = 236 and 310 mV) and robust long-term stability over the B-NCO catalyst in 1.0 M KOH, with a 2.68-fold and 4.16-fold increase in active surface area and a 0.42-fold and 0.61-fold decrease in charge transfer resistance values, respectively. This micro-flower-structured electrode has remarkable electrocatalytic properties and long-term durability, providing a novel insight for characterizing cost-effective and high-performance bifunctional electrocatalysts.

Submitted 25 March, 2024; originally announced March 2024.

arXiv:2403.17465 [pdf, other]

LaRE^2: Latent Reconstruction Error Based Method for Diffusion-Generated Image Detection

Authors: Yunpeng Luo, Junlong Du, Ke Yan, Shouhong Ding

Abstract: The evolution of Diffusion Models has dramatically improved image generation quality, making it increasingly difficult to differentiate between real and generated images. This development, while impressive, also raises significant privacy and security concerns. In response to this, we propose a novel Latent REconstruction error guided feature REfinement method (LaRE^2) for detecting diffusion-generated images. We introduce the Latent Reconstruction Error (LaRE), the first reconstruction-error based feature in the latent space for generated image detection. LaRE surpasses existing methods in terms of feature extraction efficiency while preserving crucial cues required to differentiate between the real and the fake. To exploit LaRE, we propose an Error-Guided feature REfinement module (EGRE), which can refine the image feature guided by LaRE to enhance the discriminativeness of the feature. Our EGRE utilizes an align-then-refine mechanism, which effectively refines the image feature for generated-image detection from both spatial and channel perspectives. Extensive experiments on the large-scale GenImage benchmark demonstrate the superiority of our LaRE^2, which surpasses the best SoTA method by up to 11.9%/12.1% average ACC/AP across 8 different image generators. LaRE also surpasses existing methods in terms of feature extraction cost, delivering an impressive speed enhancement of 8 times.

Submitted 26 March, 2024; originally announced March 2024.

Comments: CVPR 2024

arXiv:2403.17316 [pdf]

Recent advances on the spherical metal oxides for sustainable degradation of antibiotics

Authors: Ke Zhu, Xin Li, Yuwen Chen, Yizhe Huang, Zhiyu Yang, Guoqing Guan, Kai Yan

Abstract: Due to the permanent harm to human health and ecosystem balance, antibiotic pollution in water has become an important direction of current environmental governance. Spherical metal oxides (SMOs) have been frequently utilized as effective heterogeneous photocatalysts for the efficient degradation of antibiotics due to their unique properties (e.g., strong light absorption ability, high separation efficiency of photo-generated electron-hole pairs, and good catalytic activity). This review first summarizes the rational design and synthesis of SMOs with various tuned microstructures such as hollow, porous shell, yolk shell, core shell, and nanoflowers. These structures can expose more active sites, achieve a higher utilization rate of light, enhance the mass transfer efficiency, and improve the effective diffusion of reactive oxygen species (ROS). Secondly, this review analyzes the intrinsic relationship between the structure of SMOs and their photocatalytic properties, the ability to generate ROS, and the degradation pathway for antibiotics. Moreover, the photocatalytic mechanisms and recent progress of different SMO catalysts for degrading typical antibiotics are compared in detail. Finally, challenges and prospects of future directions in the development of SMOs for antibiotic degradation are reviewed. This review is expected to guide the rational design of SMO catalysts for efficient photocatalytic degradation of environmental pollutants.

Submitted 25 March, 2024; originally announced March 2024.

arXiv:2403.17038 [pdf]

One-step architecture of bifunctional petal-like oxygen-deficient NiAl-LDHs nanosheets for high-performance hybrid supercapacitors and urea oxidation

Authors: Yuchen Wang, Yaoyu Liu, Man Zhang, Biying Liu, Zhiyue Zhao, Kai Yan

Abstract: Nickel-based layered double hydroxides (LDHs) are promising electrode materials in the fields of energy storage (supercapacitors) and conversion (urea oxidation). The rational construction of atomic and electronic structure is crucial for nickel-based LDHs to realize satisfactory electrochemical performance. Herein, we report a facile, ecofriendly, one-step synthesis process to construct petal-like oxygen-deficient NiAl-LDH nanosheets for hybrid supercapacitors (HSCs) and urea oxidation reaction (UOR). The as-prepared NiAl-LDH nanosheets with rich oxygen vacancies possess a large specific surface area of 216.6 m^2 g^-1 and a desirable electronic conductivity of 3.45 × 10^-4 S cm^-1 to deliver an ultra-high specific capacitance of 2801 F g^-1 (700 C g^-1) at 1 A g^-1. Furthermore, high specific energy of 50.0 W h kg^-1 at 400 W kg^-1 and excellent cycle stability with 91% capacitance retention after 10,000 cycles are achieved by the NiAl-LDHs/CFP (carbon fiber paper) (+)//YP-80F (a commercial activated carbon) (-) HSC. Besides, NiAl-LDH nanosheets also work as an efficient electrocatalyst for UOR, which only requires 1.42 V vs. reversible hydrogen electrode to drive 10 mA cm^-2 in 1 mol L^-1 KOH with 0.33 mol L^-1 urea. This remarkable performance is superior to most reported values of previous candidates owing to the thin structure of NiAl-LDH nanosheets for exposing more active sites and abundant oxygen vacancies. In addition, various reaction parameters are investigated to optimize the electrochemical performance. In general, this work paves a new way for the architecture of multifunctional nanostructured energy materials.

Submitted 25 March, 2024; originally announced March 2024.

arXiv:2403.16751 [pdf]

Recent advances on CO2-assisted synthesis of metal nanoparticles for the upgrading of biomass-derived compounds

Authors: Zhiwei Jiang, Yongjian Zeng, Ruichao Guo, Lu Lin, Rafael Luque, Kai Yan

Abstract: Nanostructured catalysts have attracted increasing attention for biomass conversion into high-value chemicals due to the rapid depletion of fossil resources and increasingly severe environmental issues. Supercritical carbon dioxide (scCO2) fluid is an attractive medium for synthesizing nanostructured materials due to its favorable properties. In this review, the properties of scCO2 and the roles of scCO2 in the fabrication of metal nanoparticles are assessed in detail. A general overview of the synthesis of different types of metal nanoparticles (including metal oxide nanoparticles) using scCO2 and the relationship between the structure of the obtained metal nanoparticles and the preparation conditions such as reaction temperature and pressure, types of metal precursors, and deposition time are systematically summarized and compared in tables. Besides, compared to metal catalysts prepared using conventional methods, the catalysts obtained using scCO2 exhibited excellent catalytic performance in biomass conversion reactions, mainly oxidation and hydrogenation reactions. Finally, opportunities and challenges of metal nanoparticle preparation using scCO2 for biomass valorization to chemicals and liquid fuels are highlighted. This review could be helpful for the rational design of more efficient metal catalysts for the selective synthesis of fine chemicals and fuels from biomass-derived chemicals.

Submitted 25 March, 2024; originally announced March 2024.

Comments: 66 pages, 21 figures

arXiv:2403.16733 [pdf]

doi 10.1016/j.apsusc.2022.154997

Direct activation of PMS by highly dispersed amorphous CoOx clusters in anatase TiO2 nanosheets for efficient oxidation of biomass-derived alcohols

Authors: Zhiwei Jiang, Zhiyue Zhao, Xin Li, Huaiguang Li, Hector F. Garces, Mahmoud Amer, Kai Yan

Abstract: Developing a green and cost-effective catalytic system for the selective oxidation of biomass-derived alcohols is vital for the sustainable synthesis of fine chemicals. Herein, highly dispersed subnanometric amorphous CoOx clusters in anatase TiO2 nanosheets (Co-TiO2), fabricated by a green-solvent CO2-assisted approach, directly activate peroxymonosulfate (PMS) for the highly selective oxidation of various biomass-derived alcohols. Advanced characterizations (e.g., EXAFS, EPR, AC HAADF-STEM) reveal that a strong interaction between the CoOx clusters and the anatase TiO2 support exists in Co-TiO2, and that Co in Co-TiO2 is mainly present as Co2+ and Co3+. The Co-TiO2 catalyst offers superior catalytic performance in the conversion of six types of alcohols (e.g., benzyl alcohol (BAL), 5-hydroxymethylfurfural (HMF)) with high selectivity to produce the corresponding aldehydes. Highly dispersed CoOx clusters and the interaction between CoOx clusters and TiO2 support contribute to the superior performance. Mechanism studies show that SO4 radicals play the dominant role in the selective oxidation of model reactant BAL and that 1O2 participates in the non-radical pathway. DFT calculations match the experiments well and indicate that the strong interaction between CoOx clusters and TiO2 support promotes the formation of SO4 and SO5.

Submitted 25 March, 2024; originally announced March 2024.

Comments: 29 pages, 9 figures

Journal ref: Applied Surface Science 2023

arXiv:2403.16708 [pdf]

doi 10.1016/S1872-2067(23)64418-3

Facile synthesis of CoSi alloy with rich vacancy for base- and solvent-free aerobic oxidation of aromatic alcohols

Authors: Zhiyue Zhao, Zhiwei Jiang, Yizhe Huang, Mebrouka Boubeche, Valentina G. Matveeva, Hector F. Garces, Huixia Luo, Kai Yan

Abstract: Rational design and green synthesis of low-cost and robust catalysts that are efficient for the selective oxidation of various alcohols remain challenging. Herein, we report a fast and solvent-free arc-melting (AM) method to controllably synthesize semimetal CoSi alloy (abbreviated as AM-CoSi) that is efficient for the base- and solvent-free oxidation of six types of aromatic alcohols. X-ray absorption fine structure (XAFS), electron paramagnetic resonance (EPR), and aberration-corrected high-angle annular dark-field scanning transmission electron microscopy (AC HAADF-STEM) confirmed the successful synthesis of AM-CoSi with rich Si vacancies (Siv). The as-prepared CoSi alloy catalysts exhibit an order-of-magnitude activity enhancement in the oxidation of the model reactant benzyl alcohol (BAL) to benzyl benzoate (BBE) compared with their mono counterparts, with a 70% yield of BBE, which is the highest yield to date. Experimental results and DFT calculations verify that the CoSi alloy structure improves the BAL conversion and that the Si vacancy mainly contributes to the generation of BBE. In addition, the CoSi alloy maintains high stability, and a potential reaction pathway is proposed. The CoSi alloy also works efficiently for the selective oxidation of various alcohols with different groups. This work demonstrates for the first time that semimetal CoSi alloy is robust for the green oxidation of various alcohols and provides a vast opportunity for the reasonable design and application of other semimetal alloy catalysts.

Submitted 25 March, 2024; originally announced March 2024.

Comments: 22 pages, 6 figures

Journal ref: Chinese Journal of Catalysis 2023

arXiv:2403.16506 [pdf]

In situ growth of hydrophilic nickel-cobalt layered double hydroxides nanosheets on biomass waste-derived porous carbon for high-performance hybrid supercapacitors

Authors: Yuchen Wang, Yaoyu Liu, Zuo Chen, Man Zhang, Biying Liu, Zhenhao Xu, Kai Yan

Abstract: Rational design and cost-effective fabrication of layered double hydroxides (LDHs) nanosheets with extraordinary electrochemical performance is a key challenge for hybrid supercapacitors (HSCs). Herein, we report a facile in situ growth methodology for the eco-friendly synthesis of hydrophilic NiCo-LDHs nanosheets on biomass waste-derived porous carbon (BC) as a robust high-performance HSC cathode. The in situ growth process under ultrasonication realizes the rational arrangement of NiCo-LDHs nanosheets on the surface of BC, which effectively increases the specific surface area, promotes the electronic conductivity and enhances the wettability of NiCo-LDHs nanosheets without affecting their thickness. With the beneficial effects of the ultrathin thickness of the LDHs nanosheets (6.20 nm), large specific surface area (2324.1 m2 g-1), low charge transfer resistance (1.65 ohm), and high wettability with electrolyte (34-35 degrees), the obtained Ni2Co1-LDHs/BC50 electrode possesses an ultra-high specific capacitance of 2390 F g-1 (956 C g-1) at 1 A g-1, which is superior to most reported values. Furthermore, an assembled Ni2Co1-LDHs/BC50//YP-80F HSC delivers a maximum specific energy of 52.47 Wh kg-1 at 375 W kg-1, and maintains a high capacitance retention of 75.9% even after 4000 cycles. This work provides a facile approach to fabricate LDHs nanosheet-based cathode materials for high-performance HSCs.

Submitted 25 March, 2024; originally announced March 2024.
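The abstract above quotes the same electrode performance both as a capacitance (2390 F g-1) and as a capacity (956 C g-1). A minimal sketch of how the two figures relate, assuming the standard relation Q = C × ΔV; the potential window it yields is inferred from the two quoted numbers and is not stated in the abstract, and the function names are illustrative only:

```python
# Figures quoted in the abstract (both at 1 A g^-1).
SPECIFIC_CAPACITANCE_F_PER_G = 2390.0
SPECIFIC_CAPACITY_C_PER_G = 956.0


def implied_potential_window(capacity_c_per_g, capacitance_f_per_g):
    """Potential window (V) implied by Q = C * dV, i.e. dV = Q / C."""
    return capacity_c_per_g / capacitance_f_per_g


def capacity_in_mah_per_g(capacity_c_per_g):
    """Convert specific capacity from C g^-1 to mAh g^-1 (1 mAh = 3.6 C)."""
    return capacity_c_per_g / 3.6


window_v = implied_potential_window(
    SPECIFIC_CAPACITY_C_PER_G, SPECIFIC_CAPACITANCE_F_PER_G
)
capacity_mah = capacity_in_mah_per_g(SPECIFIC_CAPACITY_C_PER_G)

print(f"Implied potential window: {window_v:.2f} V")
print(f"Specific capacity: {capacity_mah:.1f} mAh g^-1")
```

Reporting capacity in C g-1 alongside F g-1 is common for battery-type electrodes such as LDHs, where the charge stored is not a linear function of potential.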

arXiv:2403.16487 [pdf]

Green fabrication of nickel-iron layered double hydroxides nanosheets efficient for the enhanced capacitive performance

Authors: Yuchen Wang, Zuo Chen, Man Zhang, Yaoyu Liu, Huixia Luo, Kai Yan

Abstract: Rational synthesis of robust layered double hydroxides (LDHs) nanosheets for high-energy supercapacitors remains challenging. Herein, we report an ultrasonication-assisted strategy for the eco-friendly fabrication of NiFe-LDHs nanosheets with enhanced capacitive behavior. The experimental results, combined with different advanced characterization tools, document that the use of ultrasonication has a profound effect on the morphology and thickness of the as-obtained NiFe-LDHs, in turn affecting the capacitive behavior. NiFe-LDHs nanosheets prepared with a 2-h ultrasonic treatment display exceptional capacitive performance because of the synergetic effect of ultrathin thickness, large specific surface area, and high mesoporous volume. The maximum specific capacitance of Ni3Fe1-LDHs nanosheets with a thickness of 7.39 nm and a specific surface area of 77.16 m2 g-1 reached 1923 F g-1, which is competitive with most previously reported values. In addition, the maximum specific energy of the assembled NiFe-LDHs//AC asymmetric supercapacitor reached 49.13 Wh kg-1 at 400 W kg-1. This work provides a green technology to fabricate LDHs nanosheets, and offers deep insights into the relationship between the morphology/structure and capacitive behavior of LDHs nanosheets, which is helpful for achieving high-performance LDHs-based electrode materials.

Submitted 25 March, 2024; originally announced March 2024.

arXiv:2403.16461 [pdf]

Recent Advances on Transition-Metal-Based Layered Double Hydroxides Nanosheets for Electrocatalytic Energy Conversion

Authors: Yuchen Wang, Man Zhang, Yaoyu Liu, Zhikeng Zheng, Biying Liu, Meng Chen, Guoqing Guan, Kai Yan

Abstract: Transition-metal-based layered double hydroxides (TM-LDHs) nanosheets are promising electrocatalysts in renewable electrochemical energy conversion systems and are regarded as alternatives to noble metal-based materials. In this review, recent advances on effective and facile strategies to rationally design TM-LDHs nanosheets as electrocatalysts, such as increasing the number of active sites, improving the utilization of active sites (atomic-scale catalysts), modulating the electron configurations, and controlling the lattice facets, are summarized and compared. Then, the utilization of these fabricated TM-LDHs nanosheets for the oxygen evolution reaction, hydrogen evolution reaction, urea oxidation reaction, nitrogen reduction reaction, small molecule oxidations, and biomass derivatives upgrading is articulated through a systematic discussion of the corresponding fundamental design principles and reaction mechanisms. Finally, the existing challenges in increasing the density of catalytically active sites and the future prospects of TM-LDHs nanosheets-based electrocatalysts in each application are also discussed.

Submitted 25 March, 2024; originally announced March 2024.

arXiv:2403.16455 [pdf]

Topological iron silicide with H* intermediate modulated surface for efficient electrocatalytic hydrogenation of nitrobenzene in neutral medium

Authors: Yuchen Wang, Yaoyu Liu, Zhiyue Zhao, Zhikeng Zheng, Alina M. Balu, Rafael Luque, Kai Yan

Abstract: The electrocatalytic hydrogenation of nitrobenzene (Ph-NO2) reaction (EHNR) has been considered a potential alternative to the traditional thermocatalytic process in the production of high-value aniline (Ph-NH2). However, due to the absence of robust catalysts and low surface H* coverage, the EHNR faces the challenges of unsatisfactory performance and an undetermined mechanism. Herein, we construct noble-metal-free topological FeSi (M-FeSi) materials through a solvent-free microwave strategy for efficient EHNR in neutral medium. Impressively, benefiting from abundant active H* intermediates on its surface, the topological M-FeSi catalyst exhibits 99.7% conversion of Ph-NO2 and 93.8% yield of Ph-NH2 after 200 C in neutral medium, which are superior to previous candidates and to the FeSi catalyst synthesized via the traditional arc-melting method under the same conditions. Besides, theoretical calculations validate that high surface H* coverage over the M-FeSi catalyst is conducive to switching the rate-determining step from Ph-NO2* → Ph-NO* to Ph-NO* → Ph-NHOH*, thus decreasing the total energy barrier of electrocatalytic Ph-NH2 production.

Submitted 25 March, 2024; originally announced March 2024.

arXiv:2403.16433 [pdf]

doi 10.1016/j.cattod.2023.114252

Highly dispersed Ru nanoparticles anchored on NiAl layered double oxides catalyst for selective hydrodeoxygenation of vanillin

Authors: Yongjian Zeng, Lu Lin, Di Hu, Zhiwei Jiang, Shaimaa Saeed, Ruichao Guo, Ibrahim Ashour, Kai Yan

Abstract: The hydrodeoxygenation (HDO) of lignin-derived feedstocks into value-added chemicals with high efficiency and selectivity is desirable for the utilization of biomass resources. The complex oxygen-containing groups of lignin-derived substances make it challenging to achieve high selectivity toward the required product. In this work, a catalyst of highly dispersed Ru nanoparticles anchored on Ni3Al1 layered double oxides (LDOs), derived from NiAl layered double hydroxides (LDHs) with flower-shaped morphology, was constructed by a simple deposition-reduction method. The introduction of the LDHs-derived support can significantly impact the catalytic activity for the HDO of lignin-derived vanillin (VL) into 2-methoxy-4-methylphenol (MMP). The Ru/Ni3Al1-400 catalyst achieved complete conversion of VL and a 94.2% yield of MMP at 130 °C in methanol solvent, much better than the catalysts without LDHs-derived support. The methanol solvent is beneficial for the conversion of the reaction intermediate vanillin alcohol (VA). Detailed characterization reveals that the enhanced metal-support interaction over Ru/Ni3Al1-400 and the easily accessible acid sites facilitate the production of MMP.

Submitted 25 March, 2024; originally announced March 2024.

arXiv:2403.16200 [pdf]

doi 10.1016/j.cej.2023.144190

Acceleration of Fe$^{3+}$/Fe$^{2+}$ cycle in garland-like MIL-101(Fe)/MoS$_2$ nanosheets to promote peroxymonosulfate activation for sulfamethoxazole degradation

Authors: Ke Zhu, Wenlei Qin, Yaping Gan, Yizhe Huang, Zhiwei Jiang, Yuwen Chen, Xin Li, Kai Yan

Abstract: Iron-based molybdenum disulfide (Fe-MoS$_2$) has emerged as a Fenton-like catalyst for the highly efficient degradation of antibiotics, but the structure-activity relationship remains elusive. Herein, garland-like MIL-101(Fe)/MoS$_2$ nanosheets (MMS) with dual metal active sites (Fe and Mo) and rich sulfur vacancies were fabricated to directly activate peroxymonosulfate (PMS) for fast degradation of different organic pollutants (phenols, dyes and drugs), even in real water bodies. The MMS exhibited an extremely fast catalytic rate constant of 0.289 min$^{-1}$ in the degradation of sulfamethoxazole (SMX), which was about 36 and 29 times that of single MoS$_2$ (0.008 min$^{-1}$) and MIL-101(Fe) (0.01 min$^{-1}$), respectively. Moreover, MMS showed good stability and reusability, reaching 92% degradation of SMX after 5 cycles. Quenching experiments and electron spin resonance (ESR) tests revealed that hydroxyl radicals (•OH) and singlet oxygen ($^1$O$_2$) were the dominant reactive oxygen species (ROS) for SMX degradation. The integration of experimental work, characterization techniques and density functional theory (DFT) calculations revealed that the formation of sulfur vacancies in the MMS catalyst could expose more Mo sites, improve the charge density and boost the electron transfer, which was conducive to accelerating the Fe$^{3+}$/Fe$^{2+}$ cycle for enhancing the activation of PMS. Finally, the C-N, N-O, S-N, C-O and C-S bonds of SMX were easily attacked by ROS to generate nontoxic intermediates in the MMS/PMS/SMX system.
This study offers a new approach to designing high-performance Fe-MoS$_2$ catalysts for the removal of organic pollutants.

Submitted 24 March, 2024; originally announced March 2024.
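The "about 36 and 29 times" comparison in the abstract above follows directly from the three quoted rate constants. A minimal sketch of that arithmetic, assuming pseudo-first-order kinetics (suggested by the min$^{-1}$ units but not stated explicitly in the abstract); the function names are illustrative only:

```python
import math

# Rate constants quoted in the abstract, in min^-1.
K_MMS = 0.289
K_MOS2 = 0.008
K_MIL101_FE = 0.01


def rate_ratio(k_catalyst, k_reference):
    """How many times faster one rate constant is than another."""
    return k_catalyst / k_reference


def fraction_remaining(k, minutes):
    """Fraction of pollutant left after `minutes`, assuming C(t)/C0 = exp(-k t)."""
    return math.exp(-k * minutes)


def half_life(k):
    """First-order half-life in minutes, t_1/2 = ln(2) / k."""
    return math.log(2) / k


ratio_vs_mos2 = rate_ratio(K_MMS, K_MOS2)       # about 36
ratio_vs_mil = rate_ratio(K_MMS, K_MIL101_FE)   # about 29

print(f"MMS vs MoS2:        {ratio_vs_mos2:.1f}x")
print(f"MMS vs MIL-101(Fe): {ratio_vs_mil:.1f}x")
print(f"MMS half-life under the first-order assumption: {half_life(K_MMS):.2f} min")
```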

arXiv:2403.15063 [pdf, other]

Towards a Comprehensive, Efficient and Promptable Anatomic Structure Segmentation Model using 3D Whole-body CT Scans

Authors: Heng Guo, Jianfeng Zhang, Jiaxing Huang, Tony C. W. Mok, Dazhou Guo, Ke Yan, Le Lu, Dakai Jin, Minfeng Xu

Abstract: Segment anything model (SAM) demonstrates strong generalization ability on natural image segmentation. However, its direct adaptation to medical image segmentation tasks shows significant performance drops with inferior accuracy and unstable results. It may also require an excessive number of prompt points to obtain a reasonable accuracy. For segmenting 3D radiological CT or MRI scans, a 2D SAM model has to separately handle hundreds of 2D slices. Although quite a few studies explore adapting SAM to medical image volumes, the efficiency of 2D adaptation methods is unsatisfactory and 3D adaptation methods are only capable of segmenting specific organs/tumors. In this work, we propose a comprehensive and scalable 3D SAM model for whole-body CT segmentation, named CT-SAM3D. Instead of adapting SAM, we propose a 3D promptable segmentation model using a (nearly) fully labeled CT dataset. To train CT-SAM3D effectively, ensuring the model's accurate responses to higher-dimensional spatial prompts is crucial, and 3D patch-wise training is required due to GPU memory constraints. For this purpose, we propose two key technical developments: 1) a progressively and spatially aligned prompt encoding method to effectively encode click prompts in local 3D space; and 2) a cross-patch prompt learning scheme to capture more 3D spatial context, which is beneficial for reducing the editing workloads when interactively prompting on large organs.
CT-SAM3D is trained and validated using a curated dataset of 1204 CT scans containing 107 whole-body anatomies, reporting significantly better quantitative performance than all previous SAM-derived models by a large margin with far fewer click prompts. Our model can handle segmenting unseen organs as well. Code, data, and our 3D interactive segmentation tool with quasi-real-time responses will be made publicly available.

Submitted 22 March, 2024; originally announced March 2024.

arXiv:2403.11857 [pdf, other]

Complete and Efficient Graph Transformers for Crystal Material Property Prediction

Authors: Keqiang Yan, Cong Fu, Xiaofeng Qian, Xiaoning Qian, Shuiwang Ji

Abstract: Crystal structures are characterized by atomic bases within a primitive unit cell that repeats along a regular lattice throughout 3D space. The periodic and infinite nature of crystals poses unique challenges for geometric graph representation learning. Specifically, constructing graphs that effectively capture the complete geometric information of crystals and handle chiral crystals remains an unsolved and challenging problem. In this paper, we introduce a novel approach that utilizes the periodic patterns of unit cells to establish the lattice-based representation for each atom, enabling efficient and expressive graph representations of crystals. Furthermore, we propose ComFormer, an SE(3) transformer designed specifically for crystalline materials. ComFormer includes two variants: iComFormer, which employs invariant geometric descriptors of Euclidean distances and angles, and eComFormer, which utilizes equivariant vector representations. Experimental results demonstrate the state-of-the-art predictive accuracy of ComFormer variants on various tasks across three widely-used crystal benchmarks. Our code is publicly available as part of the AIRS library (https://github.com/divelab/AIRS).

Submitted 18 March, 2024; originally announced March 2024.

Comments: This paper has been accepted by ICLR 2024

arXiv:2403.10839 [pdf]

Visible light-assisted peroxymonosulfate activation by high-purity FeS$_2$ nanoplates for dye pollutant control

Authors: Yizhe Huang, Yuwen Chen, Ke Zhu, Pengfei Li, Xu Wu, Rafael Luque, Kai Yan

Abstract: With the rapid industrial development, many dye pollutants have entered the water, along with heavy metals like As, leading to complex pollution that threatens the ecological environment and human health. Therefore, designing an effective strategy for treating complex dye wastewater is urgent. Herein, we have constructed high-purity pyrite FeS$_2$ nanoplates as bifunctional catalysts for the simultaneous removal of dyes and arsenite (As(III)).

Submitted 16 March, 2024; originally announced March 2024.

arXiv:2403.10837 [pdf]

doi 10.1016/j.seppur.2023.125131

Facile synthesis of fine-grained CoFe$_2$O$_4$ anchored on porous carbon for simultaneous removal of tetracycline and arsenite

Authors: Yuwen Chen, Ke Zhu, Yizhe Huang, Xin Li, Zhikeng Zheng, Zhiwei Jiang, Di Hu, Ping Fang, Kai Yan

Abstract: The coexistence of tetracycline (TC) and arsenite (As(III)) in livestock wastewater threatens public health, and the heterogeneous Fenton-like system is a practical approach for the simultaneous removal of TC and As(III). In this work, fine CoFe$_2$O$_4$ nanoparticles are facilely anchored on hierarchically porous carbon (CoFe$_2$O$_4$@PC) via a microwave-assisted calcination method and used for eliminating TC and As(III) via peroxymonosulfate (PMS) activation.

Submitted 16 March, 2024; originally announced March 2024.

Journal ref: Separation and Purification Technology 328 (2024) 125131

arXiv:2403.10835 [pdf]

doi 10.1016/j.cej.2024.150786

Enhanced electron transfer using NiCo2O4@C hollow nanocages with an electron-shuttle effect for efficient tetracycline degradation

Authors: Yuwen Chen, Ke Zhu, Wenlei Qin, Zhiwei Jiang, Zhuofeng Hu, Mika Sillanpää, Kai Yan

Abstract: Spinel oxides are recognized as promising Fenton-like catalysts for the degradation of antibiotics. However, the catalytic performance is restrained by the poor electron transfer rate (ETR). Herein, hollow NiCo2O4@C nanocages are rationally designed and prepared to accelerate ETR in peroxymonosulfate (PMS) activation for tetracycline (TC) degradation.

Submitted 16 March, 2024; originally announced March 2024.

arXiv:2403.05146 [pdf, other]

Motion-Guided Dual-Camera Tracker for Low-Cost Skill Evaluation of Gastric Endoscopy

Authors: Yuelin Zhang, Wanquan Yan, Kim Yan, Chun Ping Lam, Yufu Qiu, Pengyu Zheng, Raymond Shing-Yan Tang, Shing Shin Cheng

Abstract: Gastric simulators with objective educational feedback have been proven useful for endoscopy training. Existing electronic simulators with feedback are however not commonly adopted due to their high cost. In this work, a motion-guided dual-camera tracker is proposed to provide reliable endoscope tip position feedback at a low cost inside a mechanical simulator for endoscopy skill evaluation, tackling several unique challenges. To address the issue of significant appearance variation of the endoscope tip while keeping dual-camera tracking consistency, the cross-camera mutual template strategy (CMT) is proposed to introduce dynamic transient mutual templates to dual-camera tracking. To alleviate disturbance from large occlusion and distortion by the light source from the endoscope tip, the Mamba-based motion-guided prediction head (MMH) is presented to aggregate historical motion with visual tracking. It is the first application of Mamba for object tracking. The proposed tracker was evaluated on datasets captured by low-cost camera pairs during endoscopy procedures performed inside the mechanical simulator. The tracker achieves SOTA performance with robust and consistent tracking on dual cameras. Further downstream evaluation proves that the 3D tip position determined by the proposed tracker enables reliable skill differentiation. The code and dataset are available at https://github.com/PieceZhang/MotionDCTrack

Submitted 20 April, 2024; v1 submitted 8 March, 2024; originally announced March 2024.

arXiv:2402.18933 [pdf, other]

Modality-Agnostic Structural Image Representation Learning for Deformable Multi-Modality Medical Image Registration

Authors: Tony C. W. Mok, Zi Li, Yunhao Bai, Jianpeng Zhang, Wei Liu, Yan-Jie Zhou, Ke Yan, Dakai Jin, Yu Shi, Xiaoli Yin, Le Lu, Ling Zhang

Abstract: Establishing dense anatomical correspondence across distinct imaging modalities is a foundational yet challenging procedure for numerous medical image analysis studies and image-guided radiotherapy. Existing multi-modality image registration algorithms rely on statistical-based similarity measures or local structural image representations. However, the former is sensitive to locally varying noise, while the latter is not discriminative enough to cope with complex anatomical structures in multimodal scans, causing ambiguity in determining the anatomical correspondence across scans with different modalities. In this paper, we propose a modality-agnostic structural representation learning method, which leverages Deep Neighbourhood Self-similarity (DNS) and anatomy-aware contrastive learning to learn discriminative and contrast-invariant deep structural image representations (DSIR) without the need for anatomical delineations or pre-aligned training images. We evaluate our method on multiphase CT, abdomen MR-CT, and brain MR T1w-T2w registration. Comprehensive results demonstrate that our method is superior to the conventional local structural representation and statistical-based similarity measures in terms of discriminability and accuracy.

Submitted 31 March, 2024; v1 submitted 29 February, 2024; originally announced February 2024.

Comments: Accepted by CVPR2024

arXiv:2402.16076 [pdf, ps, other]

Quasi-intermediate value theorem and outflanking arc theorem for plane maps

Authors: Jiehua Mai, Enhui Shi, Kesong Yan, Fanping Zeng

Abstract: For a disk $D$ in the plane $\mathbb R^2$ and a plane map $f$, we give several conditions on the restriction of $f$ to the boundary $\partial D$ of $D$ which imply the existence of a fixed point of $f$ in some specified domain in $D$. These conditions are similar to those appearing in the intermediate value theorem for maps on the real line. As an application of the main results, we establish a fixed point theorem for plane maps having an outflanking arc, which extends the famous theorem due to Brouwer: if $f$ is an orientation-preserving homeomorphism on the plane and has a periodic point, then it has a fixed point.

Submitted 25 February, 2024; originally announced February 2024.

arXiv:2402.13574 [pdf, ps, other]

One-sided Drazin inverses in Banach algebras and perturbations of B-Fredholm spectra

Authors: Kai Yan

Abstract: The famous Drazin inverse and generalized Drazin inverse were introduced by Drazin in 1958 and Koliha in 1996, respectively. In the present paper, the author introduces the concepts of left and right (generalized) Drazin inverses, which are the one-sided versions of classical (generalized) Drazin inverses, in Banach algebras. Several characterizations of one-sided (generalized) Drazin invertible operators on Banach spaces are given. By utilizing the one-sided Drazin invertible spectra, the characterizations of B-Fredholm spectra for Banach space operators are obtained. These perturbational results can be regarded as generalizations of classical Fredholm theory in Banach spaces.

Submitted 21 February, 2024; originally announced February 2024.

Comments: 28 pages, Comments welcome

MSC Class: 15A09; 47A53; 47A55
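For orientation, the classical two-sided notions that the abstract above builds on can be stated as follows. This is the standard textbook formulation in a unital Banach algebra $\mathcal{A}$, not the paper's own one-sided definitions, which the abstract does not spell out:

```latex
% Drazin inverse (Drazin, 1958): a in A is Drazin invertible if there
% exist b in A and an integer k >= 0 such that
\[
  ab = ba, \qquad bab = b, \qquad a^{k+1} b = a^{k}.
\]
% The least such k is the Drazin index of a; index 0 means a is
% invertible with b = a^{-1}.

% Generalized Drazin inverse (Koliha, 1996): the power condition is
% relaxed to quasinilpotency,
\[
  ab = ba, \qquad bab = b, \qquad a - a^{2} b \ \text{is quasinilpotent},
\]
% where x is quasinilpotent if \lim_{n \to \infty} \|x^{n}\|^{1/n} = 0.
```

Since $b$ commutes with $a$, the third Drazin condition says that $a - a^{2}b$ is nilpotent, so every Drazin invertible element is generalized Drazin invertible.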

arXiv:2402.12192 [pdf, other]

Pan-Mamba: Effective pan-sharpening with State Space Model

Authors: Xuanhua He, Ke Cao, Keyu Yan, Rui Li, Chengjun Xie, Jie Zhang, Man Zhou

Abstract: Pan-sharpening involves integrating information from low-resolution multi-spectral and high-resolution panchromatic images to generate high-resolution multi-spectral counterparts. While recent advancements in the state space model, particularly the efficient long-range dependency modeling achieved by Mamba, have revolutionized the computer vision community, its untapped potential in pan-sharpening motivates our exploration. Our contribution, Pan-Mamba, represents a novel pan-sharpening network that leverages the efficiency of the Mamba model in global information modeling. In Pan-Mamba, we customize two core components: channel swapping Mamba and cross-modal Mamba, strategically designed for efficient cross-modal information exchange and fusion. The former initiates a lightweight cross-modal interaction through the exchange of partial panchromatic and multi-spectral channels, while the latter facilitates the information representation capability by exploiting inherent cross-modal relationships. Through extensive experiments across diverse datasets, our proposed approach surpasses state-of-the-art methods, showcasing superior fusion results in pan-sharpening. To the best of our knowledge, this work is the first attempt at exploring the potential of the Mamba model in pan-sharpening and establishes a new frontier in pan-sharpening techniques. The source code is available at https://github.com/alexhe101/Pan-Mamba.

Submitted 8 March, 2024; v1 submitted 19 February, 2024; originally announced February 2024.

arXiv:2402.12048 [pdf, other]

Model Tailor: Mitigating Catastrophic Forgetting in Multi-modal Large Language Models

Authors: Didi Zhu, Zhongyi Sun, Zexi Li, Tao Shen, Ke Yan, Shouhong Ding, Kun Kuang, Chao Wu

Abstract: Catastrophic forgetting emerges as a critical challenge when fine-tuning multi-modal large language models (MLLMs), where improving performance on unseen tasks often leads to a significant performance drop on the original tasks. This paper presents a comprehensive analysis of catastrophic forgetting in MLLMs and introduces a post-training adjustment method called Model Tailor. Our method primarily preserves the pre-trained parameters while replacing a small number (≤ 10%) of fine-tuned parameters, maintaining ~99% effectiveness on original tasks versus pre-training, and achieving ~97% on new tasks compared to standard fine-tuning. Specifically, we derive a sparse mask to identify the "model patch", based on a fusion strategy that integrates salience and sensitivity analysis. Subsequently, a compensation mechanism is introduced to "decorate the patch", enhancing the model's performance on both target and original tasks. Additionally, our method is adaptable to multi-task scenarios. Through extensive experiments on InstructBLIP and LLaVA-1.5 in both image captioning and visual question answering tasks, our approach demonstrates significant task adaptability while preserving inherent pre-trained capabilities.

Submitted 19 February, 2024; originally announced February 2024.

Showing 1–50 of 255 results for author: Yan, K