-
Towards Scale-Aware Full Surround Monodepth with Transformers
Authors:
Yuchen Yang,
Xinyi Wang,
Dong Li,
Lu Tian,
Ashish Sirasao,
Xun Yang
Abstract:
Full surround monodepth (FSM) methods can learn from multiple camera views simultaneously in a self-supervised manner to predict the scale-aware depth, which is more practical for real-world applications in contrast to scale-ambiguous depth from a standalone monocular camera. In this work, we focus on enhancing the scale-awareness of FSM methods for depth estimation. To this end, we propose to improve FSM from two perspectives: depth network structure optimization and training pipeline optimization. First, we construct a transformer-based depth network with neighbor-enhanced cross-view attention (NCA). The cross-attention modules can better aggregate the cross-view context in both global and neighboring views. Second, we formulate a transformer-based feature matching scheme with progressive training to improve the structure-from-motion (SfM) pipeline. That allows us to learn scale-awareness with sufficient matches and further facilitate network convergence by removing mismatches based on SfM loss. Experiments demonstrate that the resulting Scale-aware full surround monodepth (SA-FSM) method largely improves the scale-aware depth predictions without median-scaling at the test time, and performs favorably against the state-of-the-art FSM methods, e.g., surpassing SurroundDepth by 3.8% in terms of accuracy at delta<1.25 on the DDAD benchmark.
Submitted 14 July, 2024;
originally announced July 2024.
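The abstract above reports accuracy at delta<1.25 without median-scaling at test time. As a minimal sketch of what that metric usually means in depth estimation (this is the commonly used threshold-accuracy definition, not the authors' evaluation code; the function name and the `median_scale` flag are hypothetical), the fraction of valid pixels whose predicted-to-true depth ratio stays within 1.25 can be computed as follows:

```python
from statistics import median


def delta_accuracy(pred, gt, threshold=1.25, median_scale=False):
    """Fraction of valid pixels with max(pred/gt, gt/pred) < threshold.

    pred, gt: flat sequences of predicted and ground-truth depths.
    median_scale: if True, rescale predictions so their median matches the
    ground-truth median, which hides any global scale error.
    """
    # Keep only pixels where both depths are positive.
    pairs = [(p, g) for p, g in zip(pred, gt) if p > 0 and g > 0]
    if not pairs:
        raise ValueError("no valid pixels")
    if median_scale:
        s = median(g for _, g in pairs) / median(p for p, _ in pairs)
        pairs = [(p * s, g) for p, g in pairs]
    hits = sum(1 for p, g in pairs if max(p / g, g / p) < threshold)
    return hits / len(pairs)


# A prediction with the right structure but half the true scale.
gt = [1.0, 2.0, 4.0, 10.0]
pred = [0.5, 1.0, 2.0, 5.0]
unscaled = delta_accuracy(pred, gt)                    # scale error is penalized
scaled = delta_accuracy(pred, gt, median_scale=True)   # scale error is hidden
```

In this toy case the unscaled score is 0.0 and the median-scaled score is 1.0, which illustrates why results reported without median scaling are the stricter test of scale-awareness.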
-
"One Soy Latte for Daniel": Visual and Movement Communication of Intention from a Robot Waiter to a Group of Customers
Authors:
Seung Chan Hong,
Leimin Tian,
Akansel Cosgun,
Dana Kulić
Abstract:
Service robots are increasingly employed in the hospitality industry for delivering food orders in restaurants. However, in current practice the robot often arrives at a fixed location for each table when delivering orders to different patrons in the same dining group, thus requiring a human staff member or the customers themselves to identify and retrieve each order. This study investigates how to improve the robot's service behaviours to facilitate clear intention communication to a group of users, thus achieving accurate delivery and positive user experiences. Specifically, we conduct user studies (N=30) with a Temi service robot as a representative delivery robot currently adopted in restaurants. We investigated two factors in the robot's intent communication, namely visualisation and movement trajectories, and their influence on the objective and subjective interaction outcomes. A robot personalising its movement trajectory and stopping location in addition to displaying a visualisation of the order yields more accurate intent communication and successful order delivery, as well as more positive user perception towards the robot and its service. Our results also showed that individuals in a group have different interaction experiences.
Submitted 8 July, 2024;
originally announced July 2024.
-
VIPS-Odom: Visual-Inertial Odometry Tightly-coupled with Parking Slots for Autonomous Parking
Authors:
Xuefeng Jiang,
Fangyuan Wang,
Rongzhang Zheng,
Han Liu,
Yixiong Huo,
Jinzhang Peng,
Lu Tian,
Emad Barsoum
Abstract:
Precise localization is of great importance for the autonomous parking task because it serves the downstream planning and control modules and thus significantly affects system performance. In parking scenarios, dynamic lighting, sparse textures, and unstable global positioning system (GPS) signals pose challenges for most traditional localization methods. To address these difficulties, we propose VIPS-Odom, a novel semantic visual-inertial odometry framework for underground autonomous parking, which adopts tightly-coupled optimization to fuse measurements from multi-modal sensors and solve odometry. VIPS-Odom integrates parking slots detected from the synthesized bird's-eye-view (BEV) image with traditional feature points in the frontend, and conducts tightly-coupled optimization with joint constraints introduced by measurements from the inertial measurement unit, wheel speed sensor, and parking slots in the backend. We develop a multi-object tracking framework to robustly track the states of parking slots. To demonstrate the superiority of our method, we equip an electronic vehicle with the relevant sensors and build an experimental platform based on the ROS2 system. Extensive experiments demonstrate the efficacy and advantages of our method compared with other baselines for parking scenarios.
Submitted 6 July, 2024;
originally announced July 2024.
-
SRAS: Self-governed Remote Attestation Scheme for Multi-party Collaboration
Authors:
Linan Tian,
Yunke Shen,
Zhiqiang Li
Abstract:
Trusted Execution Environments (TEEs), such as Intel Software Guard Extensions (SGX), ensure the confidentiality and integrity of user applications when using cloud computing resources. However, in the multi-party cloud computing scenario, how to select a Relying Party to verify the TEE of each party and avoid leaking sensitive data to each other remains an open question. In this paper, we propose SRAS, an open self-governed remote attestation scheme with attestation and verification functions for verifying the trustworthiness of TEEs and computing assets, achieving a decentralized, unified, trusted attestation and verification platform for multi-party cloud users. In SRAS, we design a Relying Party enclave, which can form a virtual verifiable network and is capable of local verification on behalf of other participants' relying parties without leaking sensitive data to others. We provide an open-source prototype implementation of SRAS to facilitate the adoption of this technology by cloud users or developers.
Submitted 4 July, 2024;
originally announced July 2024.
-
Orbital origin of magnetic moment enhancement induced by charge density wave in kagome FeGe
Authors:
Shulun Han,
Linyang Li,
Chi Sin Tang,
Qi Wang,
Lingfeng Zhang,
Caozheng Diao,
Mingwen Zhao,
Shuo Sun,
Lijun Tian,
Mark B. H. Breese,
Chuanbing Cai,
Milorad V. Milosevic,
Yanpeng Qi,
Andrew T. S. Wee,
Xinmao Yin
Abstract:
Interactions among various electronic states such as charge density wave (CDW), magnetism, and superconductivity are of high significance in strongly correlated systems. While significant progress has been made in understanding the relationship between CDW and superconductivity, the interplay between CDW and magnetic order remains largely elusive. Kagome lattices, which intertwine nontrivial topology, charge order, and magnetism, offer an ideal platform for such studies. The kagome magnet FeGe, hosting a unique coupling between CDW and magnetism, has recently garnered considerable attention in that respect. Here we reveal the significant role of the orbital coupling effect during the CDW phase transition, highlighting the orbital origin of the magnetic moment enhancement in FeGe. Our X-ray absorption experiments and first-principles calculations illuminate the temperature-dependent behavior of Fe 3d-Ge 4p orbital hybridization and corroborate its pivotal impact on the magnetic properties of FeGe. These findings introduce an orbital dimension to the correlation between charge and magnetic degrees of freedom, advancing our understanding of the intriguing quantum phases resulting from this interplay.
Submitted 1 July, 2024;
originally announced July 2024.
-
Short-time large deviation of constrained random acceleration process
Authors:
Hanshuang Chen,
Lulu Tian,
Guofeng Li
Abstract:
Using the optimal fluctuation method, we study the short-time distribution $P(\mathcal{A}=A)$ of the functionals $\mathcal{A}=\int_{0}^{t_f} x^n(t) dt$ along constrained trajectories of the random acceleration process for a given time duration $t_f$, where $n$ is a positive integer. We consider two types of constraints: the total constraint, where the initial position and velocity and the final position and velocity are all fixed, and the partial constraint, where the initial position and velocity and the final position are fixed while the final velocity is left free. Via the variation of constrained action functionals, the resulting Euler-Lagrange equations are solved analytically for $n=1$ and 2, and the optimal path, i.e., the most probable realization of the random acceleration process $x(t)$ conditioned on specified $A$ and $n$, is correspondingly obtained. For $n \geq 3$, a numerical scheme is proposed to find the optimal path. We show that, for $n=1$, $P(A)$ is a Gaussian distribution with variance proportional to $Dt_f^5$ ($D$ is the particle velocity diffusion constant). For $n \geq 2$, $P(A)$ exhibits non-Gaussian features. In the small-$A$ limit, $P(A)$ shows an essential singularity, $-\ln P(A) \sim A^{-3}$, and the optimal path localizes around the initial state over a long time window and then escapes sharply to the final position at a late time. For $A$ much larger than its typical value, there are multiple optimal paths with the same $A$ but with different actions (or probability densities). Among these degenerate paths, the one with the minimum action is dominant, and the others are exponentially unlikely. All the theoretical results are validated by simulating the effective Langevin equations governing the constrained random acceleration process.
Submitted 28 June, 2024;
originally announced July 2024.
-
Amphista: Accelerate LLM Inference with Bi-directional Multiple Drafting Heads in a Non-autoregressive Style
Authors:
Zeping Li,
Xinlong Yang,
Ziheng Gao,
Ji Liu,
Zhuang Liu,
Dong Li,
Jinzhang Peng,
Lu Tian,
Emad Barsoum
Abstract:
Large Language Models (LLMs) inherently use autoregressive decoding, which lacks parallelism in inference and results in significantly slow inference speeds, especially when hardware parallel accelerators and memory bandwidth are not fully utilized. In this work, we propose Amphista, a speculative decoding algorithm that adheres to a non-autoregressive decoding paradigm. Owing to the increased parallelism, our method demonstrates higher efficiency in inference compared to autoregressive methods. Specifically, Amphista models an Auto-embedding Block capable of parallel inference, incorporating bi-directional attention to enable interaction between different drafting heads. Additionally, Amphista implements Staged Adaptation Layers to facilitate the transition of semantic information from the base model's autoregressive inference to the drafting heads' non-autoregressive speculation, thereby achieving paradigm transformation and feature fusion. We conduct a series of experiments on a suite of Vicuna models using MT-Bench and Spec-Bench. For the Vicuna 33B model, Amphista achieves up to 2.75$\times$ and 1.40$\times$ wall-clock acceleration compared to vanilla autoregressive decoding and Medusa, respectively, while preserving lossless generation quality.
Submitted 18 June, 2024;
originally announced June 2024.
-
Macroscopic Tunneling Probe of Moiré Spin Textures in Twisted CrI$_3$
Authors:
Bowen Yang,
Tarun Patel,
Meixin Cheng,
Kostyantyn Pichugin,
Lin Tian,
Nachiket Sherlekar,
Shaohua Yan,
Yang Fu,
Shangjie Tian,
Hechang Lei,
Michael E. Reimer,
Junichi Okamoto,
Adam W. Tsen
Abstract:
Various noncollinear spin textures and magnetic phases have been predicted in twisted two-dimensional CrI$_3$ due to competing ferromagnetic (FM) and antiferromagnetic (AFM) interlayer exchange from moiré stacking - with potential spintronic applications even when the underlying material possesses a negligible Dzyaloshinskii-Moriya or dipole-dipole interaction. Recent measurements have shown evidence of coexisting FM and AFM layer order in small-twist-angle CrI$_3$ bilayers and double bilayers. Yet, the nature of the magnetic textures remains unresolved and possibilities for their manipulation and electrical readout are unexplored. Here, we use tunneling magnetoresistance to investigate the collective spin states of twisted double-bilayer CrI$_3$ under both out-of-plane and in-plane magnetic fields together with detailed micromagnetic simulations of domain dynamics based on magnetic circular dichroism. Our results capture hysteretic and anisotropic field evolutions of the magnetic states and we further uncover two distinct non-volatile spin textures (out-of-plane and in-plane domains) at $\approx$ 1° twist angle, with a different global tunneling resistance that can be switched by magnetic field.
Submitted 12 June, 2024;
originally announced June 2024.
-
TernaryLLM: Ternarized Large Language Model
Authors:
Tianqi Chen,
Zhe Li,
Weixiang Xu,
Zeyu Zhu,
Dong Li,
Lu Tian,
Emad Barsoum,
Peisong Wang,
Jian Cheng
Abstract:
Large language models (LLMs) have achieved remarkable performance on Natural Language Processing (NLP) tasks, but they are hindered by high computational costs and memory requirements. Ternarization, an extreme form of quantization, offers a solution by reducing memory usage and enabling energy-efficient floating-point additions. However, applying ternarization to LLMs faces challenges stemming from outliers in both weights and activations. In this work, observing asymmetric outliers and non-zero means in weights, we introduce Dual Learnable Ternarization (DLT), which enables both scales and shifts to be learnable. We also propose Outlier-Friendly Feature Knowledge Distillation (OFF) to recover the information lost in extremely low-bit quantization. The proposed OFF can incorporate semantic information and is insensitive to outliers. At the core of OFF is maximizing the mutual information between features in ternarized and floating-point models using cosine similarity. Extensive experiments demonstrate that our TernaryLLM surpasses previous low-bit quantization methods on the standard text generation and zero-shot benchmarks for different LLM families. Specifically, for one of the most powerful open-source models, LLaMA-3, our approach (W1.58A16) outperforms the previous state-of-the-art method (W2A16) by 5.8 in terms of perplexity on C4 and by 8.2% in terms of average accuracy on zero-shot tasks.
Submitted 11 June, 2024;
originally announced June 2024.
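The abstract above describes ternarization with both a scale and a shift. As a rough illustration only, the sketch below maps weights to codes in {-1, 0, +1} and reconstructs them as `scale * code + shift`. It is not the paper's Dual Learnable Ternarization: here the scale and shift are fixed by simple heuristics rather than learned, and the threshold rule (0.7 times the mean absolute centered weight) and all function names are assumptions made for this example.

```python
def ternarize(weights, threshold_ratio=0.7):
    """Map weights to codes in {-1, 0, +1} plus a scalar scale and shift.

    Assumed heuristic, not taken from the abstract: the shift is the mean
    weight, and a centered weight becomes 0 when its magnitude is at most
    threshold_ratio times the mean absolute centered weight.
    """
    n = len(weights)
    shift = sum(weights) / n                       # absorbs a non-zero mean
    centered = [w - shift for w in weights]
    delta = threshold_ratio * sum(abs(c) for c in centered) / n
    codes = [0 if abs(c) <= delta else (1 if c > 0 else -1) for c in centered]
    kept = [abs(c) for c, t in zip(centered, codes) if t != 0]
    scale = sum(kept) / len(kept) if kept else 0.0  # mean magnitude of kept weights
    return codes, scale, shift


def dequantize(codes, scale, shift):
    """Reconstruct approximate weights from ternary codes."""
    return [scale * t + shift for t in codes]


# Weights clustered around a non-zero mean, with one outlier on each side.
weights = [0.9, 1.0, 1.1, 2.0, 0.0]
codes, scale, shift = ternarize(weights)
approx = dequantize(codes, scale, shift)
```

Without the shift, weights clustered around 1.0 would all map to the same non-zero code; centering first lets the zero code represent the bulk of the distribution, which is one intuition for handling the non-zero means mentioned in the abstract. Three levels carry log2(3), about 1.58 bits per weight, which matches the W1.58 label used there.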
-
RepoQA: Evaluating Long Context Code Understanding
Authors:
Jiawei Liu,
Jia Le Tian,
Vijay Daita,
Yuxiang Wei,
Yifeng Ding,
Yuhan Katherine Wang,
Jun Yang,
Lingming Zhang
Abstract:
Recent advances have been improving the context windows of Large Language Models (LLMs). To quantify the real long-context capabilities of LLMs, evaluators such as the popular Needle in a Haystack have been developed to test LLMs over a large chunk of raw texts. While effective, current evaluations overlook the insight of how LLMs work with long-context code, i.e., repositories. To this end, we initiate the RepoQA benchmark to evaluate LLMs on long-context code understanding. Traditional needle testers ask LLMs to directly retrieve the answer from the context without necessary deep understanding. In RepoQA, we built our initial task, namely Searching Needle Function (SNF), which exercises LLMs to search functions given their natural-language description, i.e., LLMs cannot find the desired function if they cannot understand the description and code. RepoQA is multilingual and comprehensive: it includes 500 code search tasks gathered from 50 popular repositories across 5 modern programming languages. By evaluating 26 general and code-specific LLMs on RepoQA, we show (i) there is still a small gap between the best open and proprietary models; (ii) different models are good at different languages; and (iii) models may understand code better without comments.
Submitted 10 June, 2024;
originally announced June 2024.
-
A novel measurement method for SiPM external crosstalk probability at low temperature
Authors:
Guanda Li,
Lei Wang,
Xilei Sun,
Fang Liu,
Cong Guo,
Kangkang Zhao,
Lei Tian,
Zeyuan Yu,
Zhilong Hou,
Chi Li,
Yu Lei,
Bin Wang,
Rongbin Zhou
Abstract:
Silicon photomultipliers (SiPMs) are being considered as potential replacements for conventional photomultiplier tubes (PMTs). However, a significant disadvantage of SiPMs is crosstalk (CT), wherein photons propagate through other pixels, resulting in secondary avalanches. CT can be categorized into internal crosstalk and external crosstalk based on whether the secondary avalanche occurs within the same SiPM or a different one. Numerous methods exist for quantitatively estimating the percentage of internal crosstalk (iCT). However, external crosstalk (eCT) has not been extensively studied.
This article presents a novel measurement method for the probability of emitting an external crosstalk photon during a single pixel avalanche, using a setup involving two identical SiPMs facing each other, and without the need for complex optical designs. The entire apparatus is enclosed within a stainless steel chamber, functioning as a light-tight enclosure, and maintained at liquid nitrogen temperature. The experimental setup incorporates two Sensl J-60035 SiPM chips along with two 0.5-inch Hamamatsu Photonics (HPK) VUV4 S13370-6050CN SiPM arrays. The findings show a linear relationship between the probability of emitting an external crosstalk photon and the SiPM overvoltage for both SiPM samples. Surprisingly, this novel measurement method also provides measurements of the SiPM photon detection efficiency (PDE) for eCT photons at low temperature.
Submitted 4 June, 2024;
originally announced June 2024.
-
In vivo fundus imaging and computational refocusing with a diffuser-based fundus camera
Authors:
Corey Simmerer,
Marisa Morakis,
Lei Tian,
Lia Gomez-Perez,
T. Y. Alvin Liu,
Nicholas J. Durr
Abstract:
Access to eye care can be expanded with high-throughput, easy-to-use, and portable diagnostic tools. Phase mask encoded imaging could improve these aspects of the fundus camera by enabling computational refocusing without any moving parts. This approach circumvents the need to adjust lenses to compensate for refractive errors. We developed a computational fundus camera by introducing a holographic diffuser at the plane conjugate to the ocular pupil, resulting in a laterally shift-invariant point spread function. We demonstrate computational refocusing of a model eye fundus over a large range of defocus errors. We also show computationally refocused, color, in vivo, human fundus images with a $\geq$35-degree field-of-view (FOV). This technology could eventually be combined with the wavefront-sensing capabilities of phase mask encoded imaging to create a compact ophthalmic imaging system that simultaneously captures a fundus image and performs aberrometry.
Submitted 10 June, 2024; v1 submitted 31 May, 2024;
originally announced June 2024.
-
CARL: A Framework for Equivariant Image Registration
Authors:
Hastings Greer,
Lin Tian,
Francois-Xavier Vialard,
Roland Kwitt,
Raul San Jose Estepar,
Marc Niethammer
Abstract:
Image registration estimates spatial correspondences between a pair of images. These estimates are typically obtained via numerical optimization or regression by a deep network. A desirable property of such estimators is that a correspondence estimate (e.g., the true oracle correspondence) for an image pair is maintained under deformations of the input images. Formally, the estimator should be equivariant to a desired class of image transformations. In this work, we present careful analyses of the desired equivariance properties in the context of multi-step deep registration networks. Based on these analyses we 1) introduce the notions of $[U,U]$ equivariance (network equivariance to the same deformations of the input images) and $[W,U]$ equivariance (where input images can undergo different deformations); we 2) show that in a suitable multi-step registration setup it is sufficient for overall $[W,U]$ equivariance if the first step has $[W,U]$ equivariance and all others have $[U,U]$ equivariance; we 3) show that common displacement-predicting networks only exhibit $[U,U]$ equivariance to translations instead of the more powerful $[W,U]$ equivariance; and we 4) show how to achieve multi-step $[W,U]$ equivariance via a coordinate-attention mechanism combined with displacement-predicting refinement layers (CARL). Overall, our approach obtains excellent practical registration performance on several 3D medical image registration tasks and outperforms existing unsupervised approaches for the challenging problem of abdomen registration.
Submitted 28 May, 2024; v1 submitted 26 May, 2024;
originally announced May 2024.
-
Rethinking Overlooked Aspects in Vision-Language Models
Authors:
Yuan Liu,
Le Tian,
Xiao Zhou,
Jie Zhou
Abstract:
Recent advancements in large vision-language models (LVLMs), such as GPT4-V and LLaVA, have been substantial. LLaVA's modular architecture, in particular, offers a blend of simplicity and efficiency. Recent works mainly focus on introducing more pre-training and instruction tuning data to improve model performance. This paper delves into the often-neglected aspects of data efficiency during pre-training and the selection process for instruction tuning datasets. Our research indicates that merely increasing the size of the pre-training data does not guarantee improved performance and may, in fact, degrade it. Furthermore, we have established a pipeline to pinpoint the most efficient instruction tuning (SFT) dataset, implying that not all SFT data utilized in existing studies are necessary. The primary objective of this paper is not to introduce a state-of-the-art model, but rather to serve as a roadmap for future research, aiming to optimize data usage during pre-training and fine-tuning to enhance the performance of vision-language models.
Submitted 20 May, 2024;
originally announced May 2024.
-
Exciton polariton critical non-Hermitian skin effect with spin-momentum-locked gains
Authors:
Xingran Xu,
Lingyu Tian,
Zhiyuan An,
Qihua Xiong,
Sanjib Ghosh
Abstract:
The critical skin effect, an intriguing phenomenon in non-Hermitian systems, displays sensitivity to system size and manifests distinct dynamical behaviors. In this work, we propose a novel scheme to achieve the critical non-Hermitian skin effect of exciton polaritons in an elongated microcavity system. We show that by utilising longitudinal-transverse spin splitting and spin-momentum-locked gain, a critical non-Hermitian skin effect can be achieved in a continuous system without the need of an underlying lattice. We find that a phase transition can be induced by changing the cavity detuning with respect to the exciton energy. We identify a measurable order parameter associated with this phase transition and demonstrate the corresponding critical behavior. Our work offers a flexible approach to manipulate non-Hermitian phases of exciton polaritons, thereby expanding the potential applications of polaritonic devices.
Submitted 19 May, 2024;
originally announced May 2024.
-
Historically Relevant Event Structuring for Temporal Knowledge Graph Reasoning
Authors:
Jinchuan Zhang,
Bei Hui,
Chong Mu,
Ming Sun,
Ling Tian
Abstract:
Temporal Knowledge Graph (TKG) reasoning focuses on predicting events through historical information within snapshots distributed on a timeline. Existing studies mainly concentrate on two perspectives of leveraging the history of TKGs, including capturing evolution of each recent snapshot or correlations among global historical facts. Despite the achieved significant accomplishments, these models still fall short of (1) investigating the influences of multi-granularity interactions across recent snapshots and (2) harnessing the expressive semantics of significant links accorded with queries throughout the entire history, especially events exerting a profound impact on the future. These inadequacies restrict representation ability to reflect historical dependencies and future trends thoroughly. To overcome these drawbacks, we propose an innovative TKG reasoning approach towards \textbf{His}torically \textbf{R}elevant \textbf{E}vents \textbf{S}tructuring ($\mathsf{HisRES}$). Concretely, $\mathsf{HisRES}$ comprises two distinctive modules excelling in structuring historically relevant events within TKGs, including a multi-granularity evolutionary encoder that captures structural and temporal dependencies of the most recent snapshots, and a global relevance encoder that concentrates on crucial correlations among events relevant to queries from the entire history. Furthermore, $\mathsf{HisRES}$ incorporates a self-gating mechanism for adaptively merging multi-granularity recent and historically relevant structuring representations. Extensive experiments on four event-based benchmarks demonstrate the state-of-the-art performance of $\mathsf{HisRES}$ and indicate the superiority and effectiveness of structuring historical relevance for TKG reasoning.
Submitted 17 May, 2024;
originally announced May 2024.
-
Analysis of Near-Field Effects, Spatial Non-Stationary Characteristics Based on 11-15 GHz Channel Measurement in Indoor Scenario
Authors:
Haiyang Miao,
Pan Tang,
Weirang Zuo,
Qi Wei,
Lei Tian,
Jianhua Zhang
Abstract:
In sixth-generation (6G) systems, with the further expansion of array element numbers and frequency bands, wireless communications are expected to operate in the near-field region. Near-field radio communications (NFRC) will become crucial in 6G communication systems. The new mid-band (6-24 GHz) is a potential candidate spectrum for 6G. In this paper, we investigate the channel measurements and characteristics for the emerging NFRC. First, the near-field spherical-wave signal model is derived in detail, and the stationary interval (SI) division method is discussed based on the channel statistical properties. Then, the influence of line-of-sight (LOS) and obstructed-LOS (OLOS) environments on the near-field effects and spatial non-stationary (SnS) characteristics is explored based on near-field channel measurements in the 11-15 GHz band. We hope that this work will provide a reference for NFRC research.
Submitted 19 April, 2024;
originally announced May 2024.
-
Entanglement Entropy, Phase Transition, and Island Rule for Reissner-Nordström-AdS Black Holes
Authors:
Shu-Yi Lin,
Ming-Hui Yu,
Xian-Hui Ge,
Li-Jun Tian
Abstract:
This study focuses on the examination of the island rule within the context of four-dimensional Reissner-Nordström-AdS (4D RN-AdS) black holes, illuminating the intricate relationship between the entanglement entropy and phase transitions of black holes. The entanglement entropy of 4D RN-AdS black holes follows the anticipated linear growth pattern before ultimately declining to a constant value, in accordance with the well-established Page curve. The novelty of this study lies in the examination of the previously unexplored influence of the first-order phase transition on the shape and evolution of the Page curve in situations involving both eternal and evaporating black holes. Despite the morphological alterations of the curve induced by the transition, the inherent unitarity of the system persists. As the evaporation progresses, the Page curve displays diverse configurations, unveiling phenomena that are novel and defy traditional expectations, thereby enriching our comprehension of the thermodynamics of black holes interlinked with quantum information.
Submitted 27 May, 2024; v1 submitted 10 May, 2024;
originally announced May 2024.
-
The Extension dimension of syzygy module categories
Authors:
Junling Zheng,
Lulu Tian,
Qianyu Shu,
Jinbi Zhang
Abstract:
In this paper, our primary focus is on investigating the extension dimensions of syzygy module categories associated with Artin algebras, particularly under various equivalences. We demonstrate that, for sufficiently large $i$, the $i$-th syzygy module categories of derived equivalent algebras exhibit identical extension dimensions. Furthermore, we establish that the extension dimension of the $i$-th syzygy module category is an invariant under both stable equivalence and separable equivalence for each nonnegative integer $i$.
Submitted 5 May, 2024;
originally announced May 2024.
-
Empirical Studies of Propagation Characteristics and Modeling Based on XL-MIMO Channel Measurement: From Far-Field to Near-Field
Authors:
Haiyang Miao,
Jianhua Zhang,
Pan Tang,
Lei Tian,
Weirang Zuo,
Qi Wei,
Guangyi Liu
Abstract:
In sixth-generation (6G) systems, extremely large-scale multiple-input-multiple-output (XL-MIMO) is considered a promising enabling technology. With the further expansion of array element numbers and frequency bands, near-field effects will be more likely to occur in 6G communication systems, and near-field radio communications (NFRC) will become crucial. Channel research is known to be very important for the development and performance evaluation of communication systems. In this paper, we systematically investigate the channel measurements and modeling for the emerging NFRC. First, the principle design of the massive MIMO channel measurement platform is addressed. Second, an indoor XL-MIMO channel measurement campaign with 1600 array elements is conducted, and the channel characteristics are extracted and validated in the near-field region. Then, an outdoor XL-MIMO channel measurement campaign with 320 array elements is conducted, and the channel characteristics are extracted and modeled from the near-field to the far-field (NF-FF) region. The spatial non-stationary characteristics of angular spread at the transmitting end are more important in modeling. We hope that this work will provide a reference for near-field and far-field research for 6G.
Submitted 26 April, 2024;
originally announced April 2024.
-
ESPM-D: Efficient Sparse Polynomial Multiplication for Dilithium on ARM Cortex-M4 and Apple M2
Authors:
Jieyu Zheng,
Hong Zhang,
Le Tian,
Zhuo Zhang,
Hanyu Wei,
Zhiwei Chu,
Yafang Yang,
Yunlei Zhao
Abstract:
Dilithium is a lattice-based digital signature scheme standardized by the NIST post-quantum cryptography (PQC) project. In this study, we focus on developing efficient sparse polynomial multiplication implementations of Dilithium for ARM Cortex-M4 and Apple M2, which are both based on the ARM architecture. The ARM Cortex-M4 is commonly utilized in resource-constrained devices such as sensors. Conversely, the Apple M2 is typically found on mobile devices, emphasizing high performance and versatility. Accordingly, our optimization strategies differ between ARM Cortex-M4 and Apple M2. We prioritize optimizing stack usage for the former while enhancing computational efficiency for the latter. Our optimized sparse polynomial multiplication achieves significant speedups of up to 30% on ARM Cortex-M4 and 55% on Apple M2 compared to the state-of-the-art Number-Theoretic Transform (NTT) implementation. Additionally, we integrate the sparse polynomial multiplication with the infinity norm judgments in the Dilithium signing process, further enhancing signing efficiency. Our optimized implementation not only reduces stack usage by 10.8%, 1.2%, and 7.7% in the signing procedure of Dilithium2, Dilithium3, and Dilithium5, respectively, but also enhances signing performance by 0.4% to 0.8% compared to the state-of-the-art ARM Cortex-M4 implementation. Furthermore, we optimize polynomial sampling, rounding functions, and polynomial packing and unpacking using ARM Cortex-M4 DSP instructions, resulting in a 0.4%-3.2% improvement in key generation and verification procedures. On the MacBook Air 2022, our Dilithium implementation achieves 10% to 11% speedups in the signing procedure. To the best of our knowledge, our work sets new performance records for Dilithium on both ARM Cortex-M4 and Apple M2 platforms.
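To make the idea of sparse polynomial multiplication concrete, the following is a minimal Python sketch, not the authors' optimized ARM implementation. It assumes the standard Dilithium ring Z_q[x]/(x^n + 1) with q = 8380417 and n = 256, and a sparse operand whose nonzero coefficients are all +1 or -1 (as in the Dilithium challenge polynomial). The function names and the dictionary representation of the sparse operand are illustrative choices only.

```python
Q = 8380417  # Dilithium modulus
N = 256      # ring degree: arithmetic is in Z_q[x] / (x^n + 1)


def sparse_negacyclic_mul(nonzeros, s, n=N, q=Q):
    """Multiply a sparse polynomial c by a dense polynomial s.

    nonzeros maps an exponent i to the coefficient (+1 or -1) of x^i in c.
    s is a list of n coefficients. Only the nonzero terms of c are visited,
    so the cost is len(nonzeros) * n additions with no multiplications.
    """
    if len(s) != n:
        raise ValueError("s must have exactly n coefficients")
    out = [0] * n
    for i, sign in nonzeros.items():
        for j, coeff in enumerate(s):
            k = i + j
            term = sign * coeff
            if k >= n:
                # x^n = -1 in this ring, so wrapped terms change sign.
                k -= n
                term = -term
            out[k] = (out[k] + term) % q
    return out


def dense_negacyclic_mul(c, s, n=N, q=Q):
    """Schoolbook reference multiplication, used only to check the sketch."""
    out = [0] * n
    for i, ci in enumerate(c):
        for j, sj in enumerate(s):
            k = i + j
            term = ci * sj
            if k >= n:
                k -= n
                term = -term
            out[k] = (out[k] + term) % q
    return out
```

For example, with toy parameters n = 4 and q = 17, multiplying c = x - x^3 by s = 1 + 2x + 3x^2 + 4x^3 gives the same result from both functions.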
Submitted 19 April, 2024;
originally announced April 2024.
-
LADDER: An Efficient Framework for Video Frame Interpolation
Authors:
Tong Shen,
Dong Li,
Ziheng Gao,
Lu Tian,
Emad Barsoum
Abstract:
Video Frame Interpolation (VFI) is a crucial technique in various applications such as slow-motion generation, frame rate conversion, and video frame restoration. This paper introduces an efficient video frame interpolation framework that aims to strike a favorable balance between efficiency and quality. Our framework follows a general paradigm consisting of a flow estimator and a refinement module, while incorporating carefully designed components. First, we adopt depth-wise convolution with large kernels in the flow estimator, which simultaneously reduces the parameters and enlarges the receptive field for encoding rich context and handling complex motion. Second, diverging from the common UNet-structured (encoder-decoder) design for the refinement module, which we find redundant, our decoder-only refinement module directly enhances the result from coarse to fine features, offering a more efficient process. In addition, to address the challenge of handling high-definition frames, we introduce an innovative HD-aware augmentation strategy during training, leading to consistent enhancement on HD images. Extensive experiments are conducted on diverse datasets: Vimeo90K, UCF101, Xiph and SNU-FILM. The results demonstrate that our approach achieves state-of-the-art performance with clear improvement while requiring far fewer FLOPs and parameters, reaching a better balance between efficiency and quality.
Submitted 17 April, 2024;
originally announced April 2024.
-
Synthesizing Realistic Data for Table Recognition
Authors:
Qiyu Hou,
Jun Wang,
Meixuan Qiao,
Lujun Tian
Abstract:
To overcome the limitations and challenges of current automatic table data annotation methods and random table data synthesis approaches, we propose a novel method for synthesizing annotation data specifically designed for table recognition. This method utilizes the structure and content of existing complex tables, facilitating the efficient creation of tables that closely replicate the authentic styles found in the target domain. By leveraging the actual structure and content of tables from Chinese financial announcements, we have developed the first extensive table annotation dataset in this domain. We used this dataset to train several recent deep learning-based end-to-end table recognition models. Additionally, we have established the inaugural benchmark for real-world complex tables in the Chinese financial announcement domain, using it to assess the performance of models trained on our synthetic data, thereby effectively validating our method's practicality and effectiveness. Furthermore, we applied our synthesis method to augment the FinTabNet dataset, extracted from English financial announcements, by increasing the proportion of tables with multiple spanning cells to introduce greater complexity. Our experiments show that models trained on this augmented dataset achieve comprehensive improvements in performance, especially in the recognition of tables with multiple spanning cells.
Submitted 9 July, 2024; v1 submitted 17 April, 2024;
originally announced April 2024.
-
The Ninth NTIRE 2024 Efficient Super-Resolution Challenge Report
Authors:
Bin Ren,
Yawei Li,
Nancy Mehta,
Radu Timofte,
Hongyuan Yu,
Cheng Wan,
Yuxin Hong,
Bingnan Han,
Zhuoyuan Wu,
Yajun Zou,
Yuqing Liu,
Jizhe Li,
Keji He,
Chao Fan,
Heng Zhang,
Xiaolin Zhang,
Xuanwu Yin,
Kunlong Zuo,
Bohao Liao,
Peizhe Xia,
Long Peng,
Zhibo Du,
Xin Di,
Wangkai Li,
Yang Wang
, et al. (109 additional authors not shown)
Abstract:
This paper provides a comprehensive review of the NTIRE 2024 challenge, focusing on efficient single-image super-resolution (ESR) solutions and their outcomes. The task of this challenge is to super-resolve an input image with a magnification factor of x4 based on pairs of low and corresponding high-resolution images. The primary objective is to develop networks that optimize various aspects such as runtime, parameters, and FLOPs, while still maintaining a peak signal-to-noise ratio (PSNR) of approximately 26.90 dB on the DIV2K_LSDIR_valid dataset and 26.99 dB on the DIV2K_LSDIR_test dataset. In addition, this challenge has 4 tracks: the main track (overall performance), sub-track 1 (runtime), sub-track 2 (FLOPs), and sub-track 3 (parameters). In the main track, all three metrics (i.e., runtime, FLOPs, and parameter count) were considered, and the ranking is calculated as a weighted sum of the scores of the three sub-tracks. In sub-track 1, the practical runtime performance of the submissions was evaluated, and the corresponding score was used to determine the ranking. In sub-track 2, the number of FLOPs was considered, and the score calculated from the FLOPs was used to determine the ranking. In sub-track 3, the number of parameters was considered, and the score calculated from the parameter count was used to determine the ranking. RLFN is set as the baseline for efficiency measurement. The challenge had 262 registered participants, and 34 teams made valid submissions. These submissions gauge the state-of-the-art in efficient single-image super-resolution. To facilitate the reproducibility of the challenge and enable other researchers to build upon these findings, the code and the pre-trained models of validated solutions are made publicly available at https://github.com/Amazingren/NTIRE2024_ESR/.
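Since the quality threshold above is stated in terms of PSNR, a short sketch of the standard PSNR formula, 10 * log10(MAX^2 / MSE), may help. This is a generic illustration on flat lists of pixel values, assuming an 8-bit peak value of 255. It is not the challenge's evaluation script, which may differ in details such as color space or border cropping.

```python
import math


def psnr(reference, estimate, max_value=255.0):
    """Peak signal-to-noise ratio in dB between two equal-length pixel lists."""
    if len(reference) != len(estimate) or not reference:
        raise ValueError("inputs must be non-empty and of equal length")
    # Mean squared error over all pixels.
    mse = sum((r - e) ** 2 for r, e in zip(reference, estimate)) / len(reference)
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_value ** 2 / mse)
```

Higher values indicate a reconstruction closer to the reference; identical inputs give infinity under this definition.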
Submitted 25 June, 2024; v1 submitted 16 April, 2024;
originally announced April 2024.
-
Sparse Laneformer
Authors:
Ji Liu,
Zifeng Zhang,
Mingjie Lu,
Hongyang Wei,
Dong Li,
Yile Xie,
Jinzhang Peng,
Lu Tian,
Ashish Sirasao,
Emad Barsoum
Abstract:
Lane detection is a fundamental task in autonomous driving, and has achieved great progress with the emergence of deep learning. Previous anchor-based methods often design dense anchors, which depend heavily on the training dataset and remain fixed during inference. We argue that dense anchors are not necessary for lane detection, and propose a transformer-based lane detection framework based on a sparse anchor mechanism. To this end, we generate sparse anchors with position-aware lane queries and angle queries instead of traditional explicit anchors. We adopt Horizontal Perceptual Attention (HPA) to aggregate the lane features along the horizontal direction, and adopt Lane-Angle Cross Attention (LACA) to perform interactions between lane queries and angle queries. We also propose Lane Perceptual Attention (LPA) based on deformable cross attention to further refine the lane predictions. Our method, named Sparse Laneformer, is easy to implement and end-to-end trainable. Extensive experiments demonstrate that Sparse Laneformer performs favorably against the state-of-the-art methods, e.g., surpassing Laneformer by 3.0% F1 score and O2SFormer by 0.7% F1 score with fewer MACs on CULane with the same ResNet-34 backbone.
Submitted 11 April, 2024;
originally announced April 2024.
-
A Novel Stratified Analysis Method for Testing and Estimating Overall Treatment Effects on Time-to-Event Outcomes Using Average Hazard with Survival Weight
Authors:
Zihan Qian,
Lu Tian,
Miki Horiguchi,
Hajime Uno
Abstract:
Given the limitations of using the Cox hazard ratio to summarize the magnitude of the treatment effect, alternative measures that do not have these limitations are gaining attention. One of the recently proposed alternative methods uses the average hazard with survival weight (AH). This population quantity can be interpreted as the average intensity of the event occurrence in a given time window that does not involve study-specific censoring. Inference procedures for the ratio of AH and difference in AH have already been proposed in simple randomized controlled trial settings to compare two groups. However, methods with stratification factors have not been well discussed, although stratified analysis is often used in practice to adjust for confounding factors and increase the power to detect a between-group difference. The conventional stratified analysis or meta-analysis approach, which integrates stratum-specific treatment effects using an optimal weight, directly applies to the ratio of AH and difference in AH. However, this conventional approach has significant limitations similar to the Cochran-Mantel-Haenszel method for a binary outcome and the stratified Cox procedure for a time-to-event outcome. To address this, we propose a new stratified analysis method for AH using standardization. With the proposed method, one can summarize the between-group treatment effect in both absolute difference and relative terms, adjusting for stratification factors. This can be a valuable alternative to the traditional stratified Cox procedure to estimate and report the magnitude of the treatment effect on time-to-event outcomes using hazard.
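For readers unfamiliar with the quantity, the average hazard with survival weight over a time window $[0, \tau]$ is commonly written as below. The symbols are assumptions introduced here for illustration: $h(u)$ denotes the hazard function, $S(u)$ the survival function, and $\tau$ the truncation time; the abstract itself does not fix a notation.

```latex
\mathrm{AH}(\tau)
  = \frac{\int_0^{\tau} h(u)\, S(u)\, du}{\int_0^{\tau} S(u)\, du}
  = \frac{1 - S(\tau)}{\int_0^{\tau} S(u)\, du}
```

The second equality uses $h(u) S(u) = -S'(u)$ and $S(0) = 1$. The denominator is the restricted mean survival time, so AH depends only on the survival function up to $\tau$, which is consistent with the statement that it does not involve study-specific censoring.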
Submitted 31 March, 2024;
originally announced April 2024.
-
Assessing Delayed Treatment Benefits of Immunotherapy Using Long-Term Average Hazard: A Novel Test/Estimation Approach
Authors:
Miki Horiguchi,
Lu Tian,
Kenneth L. Kehl,
Hajime Uno
Abstract:
Delayed treatment effects on time-to-event outcomes have often been observed in randomized controlled studies of cancer immunotherapies. In the case of delayed onset of treatment effect, the conventional test/estimation approach using the log-rank test for between-group comparison and Cox's hazard ratio to estimate the magnitude of treatment effect is not optimal, because the log-rank test is not the most powerful option, and the interpretation of the resulting hazard ratio is not obvious. Recently, alternative test/estimation approaches were proposed to address both the power issue and the interpretation problems of the conventional approach. One is a test/estimation approach based on long-term restricted mean survival time, and the other approach is based on average hazard with survival weight. This paper integrates these two ideas and proposes a novel test/estimation approach based on long-term average hazard (LT-AH) with survival weight. Numerical studies reveal specific scenarios where the proposed LT-AH method provides a higher power than the two alternative approaches. The proposed approach has test/estimation coherency and can provide robust estimates of the magnitude of treatment effect not dependent on study-specific censoring time distribution. Also, the proposed LT-AH approach can summarize the magnitude of the treatment effect in both absolute difference and relative terms using ``hazard'' (i.e., difference in LT-AH and ratio of LT-AH), meeting guideline recommendations and practical needs. This proposed approach can be a useful alternative to the traditional hazard-based test/estimation approach when delayed onset of survival benefit is expected.
Submitted 15 March, 2024;
originally announced March 2024.
-
Covert Communication for Untrusted UAV-Assisted Wireless Systems
Authors:
Chan Gao,
Linying Tian,
Dong Zheng
Abstract:
Wireless systems are of paramount importance for providing ubiquitous data transmission for smart cities. However, due to the broadcast nature and openness of wireless channels, such systems face potential security challenges. UAV-assisted covert communication is a supporting technology for improving covert performance and has become a hot topic in wireless communication security research. This paper investigates the performance of joint covert and secure communication in a two-hop UAV-assisted wireless system, where a source transmits a covert message to a destination with the help of an untrusted UAV. We first design a transmission scheme that uses UAVs to assist in covert communications while ensuring the security of covert messages. Then, we develop a theoretical model to derive expressions for the detection error probability of the warden and for the covert and security rate, and we maximize the covert and security rate through power control under given covertness and security requirements. Finally, numerical results are provided to illustrate our theoretical analysis and the performance of covert and secure communication in such systems.
Submitted 14 March, 2024;
originally announced March 2024.
-
Wide-Field, High-Resolution Reconstruction in Computational Multi-Aperture Miniscope Using a Fourier Neural Network
Authors:
Qianwan Yang,
Ruipeng Guo,
Guorong Hu,
Yujia Xue,
Yunzhe Li,
Lei Tian
Abstract:
Traditional fluorescence microscopy is constrained by inherent trade-offs among resolution, field-of-view, and system complexity. To navigate these challenges, we introduce a simple and low-cost computational multi-aperture miniature microscope, utilizing a microlens array for single-shot wide-field, high-resolution imaging. Addressing the challenges posed by extensive view multiplexing and non-local, shift-variant aberrations in this device, we present SV-FourierNet, a novel multi-channel Fourier neural network. SV-FourierNet facilitates high-resolution image reconstruction across the entire imaging field through its learned global receptive field. We establish a close relationship between the physical spatially-varying point-spread functions and the network's learned effective receptive field. This ensures that SV-FourierNet has effectively encapsulated the spatially-varying aberrations in our system, and learned a physically meaningful function for image reconstruction. Training of SV-FourierNet is conducted entirely on a physics-based simulator. We showcase wide-field, high-resolution video reconstructions on colonies of freely moving C. elegans and imaging of a mouse brain section. Our computational multi-aperture miniature microscope, augmented with SV-FourierNet, represents a major advancement in computational microscopy and may find broad applications in biomedical research and other fields requiring compact microscopy solutions.
Submitted 30 May, 2024; v1 submitted 11 March, 2024;
originally announced March 2024.
-
uniGradICON: A Foundation Model for Medical Image Registration
Authors:
Lin Tian,
Hastings Greer,
Roland Kwitt,
Francois-Xavier Vialard,
Raul San Jose Estepar,
Sylvain Bouix,
Richard Rushmore,
Marc Niethammer
Abstract:
Conventional medical image registration approaches directly optimize over the parameters of a transformation model. These approaches have been highly successful and are used generically for registrations of different anatomical regions. Recent deep registration networks are incredibly fast and accurate but are only trained for specific tasks. Hence, they are no longer generic registration approaches. We therefore propose uniGradICON, a first step toward a foundation model for registration providing 1) great performance \emph{across} multiple datasets which is not feasible for current learning-based registration methods, 2) zero-shot capabilities for new registration tasks suitable for different acquisitions, anatomical regions, and modalities compared to the training dataset, and 3) a strong initialization for finetuning on out-of-distribution registration tasks. UniGradICON unifies the speed and accuracy benefits of learning-based registration algorithms with the generic applicability of conventional non-deep-learning approaches. We extensively trained and evaluated uniGradICON on twelve different public datasets. Our code and the uniGradICON model are available at https://github.com/uncbiag/uniGradICON.
Submitted 8 March, 2024;
originally announced March 2024.
-
A Paradigm Shift in Catheter Development: Thermally Drawn Polymeric Fibers for MR-Guided Cardiovascular Interventions
Authors:
Mohamed E. M. K. Abdelaziz,
Libaihe Tian,
Thomas Lottner,
Simon Reiss,
Timo Heidt,
Alexander Maier,
Klaus Düring,
Constantin von zur Mühlen,
Michael Bock,
Eric Yeatman,
Guang-Zhong Yang,
Burak Temelkuran
Abstract:
Cardiovascular diseases (CVDs) and congenital heart diseases (CHD) pose significant global health challenges. Fluoroscopy-guided endovascular interventions, though effective, are accompanied by ionizing radiation concerns, especially in pediatric cases. Magnetic resonance imaging (MRI) emerges as a radiation-free alternative, offering superior soft tissue visualization and functional insights. However, the lack of compatible instruments remains a hurdle. We present two novel catheter systems, a tendon-driven steerable catheter and an active tracking Tiger-shaped catheter, fabricated using a unique fiber drawing technique. These catheters, showcasing mechanical properties similar to commercial counterparts, have undergone rigorous in-vitro and in-vivo testing, yielding promising outcomes. This innovative approach has the potential to streamline medical device development, thus enhancing patient care in MR-guided interventions.
Submitted 8 March, 2024;
originally announced March 2024.
-
Cradle: Empowering Foundation Agents Towards General Computer Control
Authors:
Weihao Tan,
Wentao Zhang,
Xinrun Xu,
Haochong Xia,
Ziluo Ding,
Boyu Li,
Bohan Zhou,
Junpeng Yue,
Jiechuan Jiang,
Yewen Li,
Ruyi An,
Molei Qin,
Chuqiao Zong,
Longtao Zheng,
Yujie Wu,
Xiaoqiang Chai,
Yifei Bi,
Tianbao Xie,
Pengjie Gu,
Xiyun Li,
Ceyao Zhang,
Long Tian,
Chaojie Wang,
Xinrun Wang,
Börje F. Karlsson
, et al. (3 additional authors not shown)
Abstract:
Despite the success in specific scenarios, existing foundation agents still struggle to generalize across various virtual scenarios, mainly due to the dramatically different encapsulations of environments with manually designed observation and action spaces. To handle this issue, we propose the General Computer Control (GCC) setting to restrict foundation agents to interact with software through the most unified and standardized interface, i.e., using screenshots as input and keyboard and mouse actions as output. We introduce Cradle, a modular and flexible LMM-powered framework, as a preliminary attempt towards GCC. Enhanced by six key modules, Cradle can understand input screenshots and output executable code for low-level keyboard and mouse control after high-level planning, so that Cradle can interact with any software and complete long-horizon complex tasks without relying on any built-in APIs. Experimental results show that Cradle exhibits remarkable generalizability and impressive performance across four previously unexplored commercial video games, five software applications, and a comprehensive benchmark, OSWorld. Cradle is the first to enable foundation agents to follow the main storyline and complete 40-minute-long real missions in the complex AAA game Red Dead Redemption 2 (RDR2). Cradle can also create a city of a thousand people in Cities: Skylines, farm and harvest parsnips in Stardew Valley, and trade and bargain with a maximal weekly total profit of 87% in Dealer's Life 2. Cradle can not only operate daily software, like Chrome, Outlook, and Feishu, but also edit images and videos using Meitu and CapCut. Cradle greatly extends the reach of foundation agents by enabling the easy conversion of any software, especially complex games, into benchmarks to evaluate agents' various abilities and facilitate further data collection, thus paving the way for generalist agents.
Submitted 2 July, 2024; v1 submitted 5 March, 2024;
originally announced March 2024.
-
Improving Visual Perception of a Social Robot for Controlled and In-the-wild Human-robot Interaction
Authors:
Wangjie Zhong,
Leimin Tian,
Duy Tho Le,
Hamid Rezatofighi
Abstract:
Social robots often rely on visual perception to understand their users and the environment. Recent advances in data-driven approaches for computer vision have demonstrated great potential for applying deep-learning models to enhance a social robot's visual perception. However, the high computational demands of deep-learning methods, as opposed to the more resource-efficient shallow-learning models, raise important questions regarding their effects on real-world interaction and user experience. It is unclear how the objective interaction performance and subjective user experience will be influenced when a social robot adopts a deep-learning based visual perception model. We employed state-of-the-art human perception and tracking models to improve the visual perception function of the Pepper robot and conducted a controlled lab study and an in-the-wild human-robot interaction study to evaluate this novel perception function for following a specific user with other people present in the scene.
Submitted 5 March, 2024; v1 submitted 4 March, 2024;
originally announced March 2024.
-
EMO: Emote Portrait Alive -- Generating Expressive Portrait Videos with Audio2Video Diffusion Model under Weak Conditions
Authors:
Linrui Tian,
Qi Wang,
Bang Zhang,
Liefeng Bo
Abstract:
In this work, we tackle the challenge of enhancing the realism and expressiveness in talking head video generation by focusing on the dynamic and nuanced relationship between audio cues and facial movements. We identify the limitations of traditional techniques that often fail to capture the full spectrum of human expressions and the uniqueness of individual facial styles. To address these issues, we propose EMO, a novel framework that utilizes a direct audio-to-video synthesis approach, bypassing the need for intermediate 3D models or facial landmarks. Our method ensures seamless frame transitions and consistent identity preservation throughout the video, resulting in highly expressive and lifelike animations. Experimental results demonstrate that EMO is able to produce not only convincing speaking videos but also singing videos in various styles, significantly outperforming existing state-of-the-art methodologies in terms of expressiveness and realism.
Submitted 27 February, 2024;
originally announced February 2024.
-
Time persistence of climate and carbon flux networks
Authors:
Ting Qing,
Fan Wang,
Qiuyue Li,
Gaogao Dong,
Lixin Tian,
Shlomo Havlin
Abstract:
The persistence of the global climate system is critical for assuring the sustainability of the natural ecosystem and the further development of socio-economic prosperity. In this paper, we develop a framework and analyze the time persistence of the yearly networks of climate and carbon flux, based on cross-correlations between sites, using daily data from China, the contiguous United States, and the European land region during 2000-2019. There are many studies on the time persistence of single nodes, e.g., climate variables at a given location; however, persistence at the network level has rarely been discussed. Here we develop a framework to study the time persistence of networks and we apply it to climate and carbon flux. Our framework for determining the persistence is based on analyzing the similarity between the network structures, i.e., the links of climate and carbon flux in different years, using the Jaccard index. Our Jaccard results reveal that the similarity of climate and carbon flux networks in different years is within the range of 0.51 ± 0.09 (p-value<0.05), implying that the climate and carbon flux networks studied in the Earth's climate system are generally persistent and in a steady state. Our results suggest that close to 50% of the links appear regularly in different years. We find a very small decay in similarity when the gap between the years increases. However, we observe unique behavior of less similarity to other years in the carbon flux network of the Chinese region during the years 2004-2005 and 2015-2016. This seems to reflect China's carbon reduction policies in these specific years. Analyzing the persistence and evolution of the climate and carbon flux networks enhances our understanding of the spatial and temporal evolution of the global climate system.
Submitted 24 February, 2024;
originally announced February 2024.
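The entry above measures persistence as the Jaccard index between the link sets of networks from different years. A minimal sketch of that similarity measure follows, assuming links are undirected site pairs; the function name, site labels, and toy link lists are hypothetical and are not taken from the paper.

```python
def jaccard_index(links_a, links_b):
    """Jaccard similarity |A ∩ B| / |A ∪ B| between two sets of undirected links."""
    # frozenset makes (u, v) and (v, u) count as the same undirected link.
    a = {frozenset(link) for link in links_a}
    b = {frozenset(link) for link in links_b}
    union = a | b
    if not union:
        return 1.0  # assumed convention: two empty networks are identical
    return len(a & b) / len(union)


# Hypothetical toy networks for two years (site labels are made up).
links_2000 = [("s1", "s2"), ("s2", "s3"), ("s3", "s4")]
links_2001 = [("s2", "s1"), ("s3", "s4"), ("s4", "s5")]

similarity = jaccard_index(links_2000, links_2001)
print(similarity)
```

In this toy case two of the four distinct links are shared between the years.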
-
Unveiling the Importance of Longer Paths in Quantum Networks
Authors:
Xinqi Hu,
Gaogao Dong,
Renaud Lambiotte,
Kim Christensen,
Jingfang Fan,
Lixin Tian,
Shlomo Havlin,
Xiangyi Meng
Abstract:
The advancement of quantum communication technologies is calling for a better understanding of quantum network (QN) design from first principles, approached through network science. Pioneering studies have established a classical percolation mapping to model the task of entanglement transmission across QN. Yet, this mapping does not capture the stronger, yet not fully understood connectivity observed in QNs, which facilitates more efficient entanglement transmission than predicted by classical percolation. In this work, we explore the critical phenomena of the potential statistical theory underlying this enhanced connectivity, known as concurrence percolation. Compared to classical percolation, the concurrence percolation mapping employs a unique approach of "superposing" path connectivities, utilizing a different set of path connectivity rules, thereby boosting the overall network connectivity. Firstly, we analytically derive the percolation critical exponents for hierarchical, scale-free networks, particularly the UV flower model, characterized by two distinct network length scales, U ≤ V. Our analysis confirms that classical and concurrence percolations, albeit both satisfying the hyperscaling relation, fall into separate universality classes. Most importantly, this separation stems from their different treatment of non-shortest path contributions to overall connectivity. Notably, as the longer path scale V increases, concurrence percolation retains a non-negligible dependence of both its critical threshold and critical exponents on V and thus, compared with its classical counterpart, shows a higher resilience to the weakening of non-shortest paths. This higher resilience is also observed in real-world network topology, e.g., the Internet. Our findings reveal a first principle for QN design: longer paths still contribute significantly to QN connectivity -- as long as they are abundant.
Submitted 23 February, 2024;
originally announced February 2024.
-
Quantum Shortcut to Adiabaticity for State Preparation in a Finite-Sized Jaynes-Cummings Lattice
Authors:
Kang Cai,
Prabin Parajuli,
Anuvetha Govindarajan,
Lin Tian
Abstract:
In noisy quantum systems, achieving high-fidelity state preparation using the adiabatic approach faces a dilemma: either extending the evolution time to reduce diabatic transitions or shortening it to mitigate decoherence effects. Here, we present a quantum shortcut to adiabaticity for state preparation in a finite-sized Jaynes-Cummings lattice by applying a counter-diabatic (CD) driving along given adiabatic trajectories. Leveraging the symmetry of eigenstates in this system, we derive a simplified CD Hamiltonian that only involves local qubit-cavity couplings for a two-site lattice with one polariton excitation. Additionally, we derive the analytical form of the CD Hamiltonian for this lattice with two excitations. Our numerical results demonstrate that this scheme is robust against circuit errors and environmental noise, with characterization achievable through qubit detection. The simplified CD Hamiltonian can be implemented in physical systems with realistic parameters. This approach can lead to a promising pathway to high-fidelity state preparation within a significantly reduced timescale compared to conventional adiabatic methods.
Submitted 19 February, 2024;
originally announced February 2024.
-
CMA-R: Causal Mediation Analysis for Explaining Rumour Detection
Authors:
Lin Tian,
Xiuzhen Zhang,
Jey Han Lau
Abstract:
We apply causal mediation analysis to explain the decision-making process of neural models for rumour detection on Twitter. Interventions at the input and network level reveal the causal impacts of tweets and words in the model output. We find that our approach CMA-R -- Causal Mediation Analysis for Rumour detection -- identifies salient tweets that explain model predictions and show strong agreement with human judgements for critical tweets determining the truthfulness of stories. CMA-R can further highlight causally impactful words in the salient tweets, providing another layer of interpretability and transparency into these blackbox rumour detection systems. Code is available at: https://github.com/ltian678/cma-r.
Submitted 12 February, 2024;
originally announced February 2024.
-
UniVG: Towards UNIfied-modal Video Generation
Authors:
Ludan Ruan,
Lei Tian,
Chuanwei Huang,
Xu Zhang,
Xinyan Xiao
Abstract:
Diffusion-based video generation has received extensive attention and achieved considerable success within both the academic and industrial communities. However, current efforts are mainly concentrated on single-objective or single-task video generation, such as generation driven by text, by image, or by a combination of text and image. This cannot fully meet the needs of real-world application scenarios, as users are likely to input images and text conditions in a flexible manner, either individually or in combination. To address this, we propose a Unified-modal Video Generation system that is capable of handling multiple video generation tasks across text and image modalities. To this end, we revisit the various video generation tasks within our system from the perspective of generative freedom, and classify them into high-freedom and low-freedom video generation categories. For high-freedom video generation, we employ Multi-condition Cross Attention to generate videos that align with the semantics of the input images or text. For low-freedom video generation, we introduce Biased Gaussian Noise to replace the pure random Gaussian Noise, which helps to better preserve the content of the input conditions. Our method achieves the lowest Fréchet Video Distance (FVD) on the public academic benchmark MSR-VTT, surpasses the current open-source methods in human evaluations, and is on par with the current closed-source method Gen2. For more samples, visit https://univg-baidu.github.io.
Submitted 17 January, 2024;
originally announced January 2024.
-
Walert: Putting Conversational Search Knowledge into Action by Building and Evaluating a Large Language Model-Powered Chatbot
Authors:
Sachin Pathiyan Cherumanal,
Lin Tian,
Futoon M. Abushaqra,
Angel Felipe Magnossao de Paula,
Kaixin Ji,
Danula Hettiachchi,
Johanne R. Trippas,
Halil Ali,
Falk Scholer,
Damiano Spina
Abstract:
Creating and deploying customized applications is crucial for operational success and enriching user experiences in the rapidly evolving modern business world. A prominent facet of modern user experiences is the integration of chatbots or voice assistants. The rapid evolution of Large Language Models (LLMs) has provided a powerful tool to build conversational applications. We present Walert, a customized LLM-based conversational agent able to answer frequently asked questions about computer science degrees and programs at RMIT University. Our demo aims to showcase how conversational information-seeking researchers can effectively communicate the benefits of using best practices to stakeholders interested in developing and deploying LLM-based chatbots. These practices are well-known in our community but often overlooked by practitioners who may not have access to this knowledge. The methodology and resources used in this demo serve as a bridge to facilitate knowledge transfer from experts, address industry professionals' practical needs, and foster a collaborative environment. The data and code of the demo are available at https://github.com/rmit-ir/walert.
Submitted 14 January, 2024;
originally announced January 2024.
-
Dual-View Data Hallucination with Semantic Relation Guidance for Few-Shot Image Recognition
Authors:
Hefeng Wu,
Guangzhi Ye,
Ziyang Zhou,
Ling Tian,
Qing Wang,
Liang Lin
Abstract:
Learning to recognize novel concepts from just a few image samples is very challenging as the learned model is easily overfitted on the few data and results in poor generalizability. One promising but underexplored solution is to compensate the novel classes by generating plausible samples. However, most existing works of this line exploit visual information only, rendering the generated data easy to be distracted by some challenging factors contained in the few available samples. Being aware of the semantic information in the textual modality that reflects human concepts, this work proposes a novel framework that exploits semantic relations to guide dual-view data hallucination for few-shot image recognition. The proposed framework enables generating more diverse and reasonable data samples for novel classes through effective information transfer from base classes. Specifically, an instance-view data hallucination module hallucinates each sample of a novel class to generate new data by employing local semantic correlated attention and global semantic feature fusion derived from base classes. Meanwhile, a prototype-view data hallucination module exploits semantic-aware measure to estimate the prototype of a novel class and the associated distribution from the few samples, which thereby harvests the prototype as a more stable sample and enables resampling a large number of samples. We conduct extensive experiments and comparisons with state-of-the-art methods on several popular few-shot benchmarks to verify the effectiveness of the proposed framework.
Submitted 13 January, 2024;
originally announced January 2024.
-
UPDP: A Unified Progressive Depth Pruner for CNN and Vision Transformer
Authors:
Ji Liu,
Dehua Tang,
Yuanxian Huang,
Li Zhang,
Xiaocheng Zeng,
Dong Li,
Mingjie Lu,
Jinzhang Peng,
Yu Wang,
Fan Jiang,
Lu Tian,
Ashish Sirasao
Abstract:
Traditional channel-wise pruning methods, which reduce network channels, struggle to effectively prune efficient CNN models with depth-wise convolutional layers and certain efficient modules, such as popular inverted residual blocks. Prior depth pruning methods, which reduce network depth, are not suitable for pruning some efficient models due to the existence of some normalization layers. Moreover, finetuning a subnet by directly removing activation layers would corrupt the original model weights, hindering the pruned model from achieving high performance. To address these issues, we propose a novel depth pruning method for efficient models. Our approach proposes a novel block pruning strategy and progressive training method for the subnet. Additionally, we extend our pruning method to vision transformer models. Experimental results demonstrate that our method consistently outperforms existing depth pruning methods across various pruning configurations. Applying our method to ConvNeXtV1, we obtained three pruned ConvNeXtV1 models, which surpass most SOTA efficient models with comparable inference performance. Our method also achieves state-of-the-art pruning performance on the vision transformer model.
Submitted 12 January, 2024;
originally announced January 2024.
-
HiCAST: Highly Customized Arbitrary Style Transfer with Adapter Enhanced Diffusion Models
Authors:
Hanzhang Wang,
Haoran Wang,
Jinze Yang,
Zhongrui Yu,
Zeke Xie,
Lei Tian,
Xinyan Xiao,
Junjun Jiang,
Xianming Liu,
Mingming Sun
Abstract:
The goal of Arbitrary Style Transfer (AST) is to inject the artistic features of a style reference into a given image or video. Existing methods usually focus on balancing style and content, while ignoring the significant demand for flexible and customized stylization results, thereby limiting their practical application. To address this critical issue, a novel AST approach named HiCAST is proposed, which is capable of explicitly customizing the stylization results according to various sources of semantic clues. Specifically, our model is built on the Latent Diffusion Model (LDM) and carefully designed to absorb content and style instances as conditions of the LDM. It is characterized by the introduction of a Style Adapter, which allows users to flexibly manipulate the output results by aligning multi-level style information and intrinsic knowledge in the LDM. Lastly, we further extend our model to perform video AST. A novel learning objective is leveraged for video diffusion model training, which significantly improves cross-frame temporal consistency while maintaining stylization strength. Qualitative and quantitative comparisons as well as comprehensive user studies demonstrate that our HiCAST outperforms the existing SoTA methods in generating visually plausible stylization results.
Submitted 11 January, 2024;
originally announced January 2024.
-
Asymptotically Optimal Sequence Sets With Low/Zero Ambiguity Zone Properties
Authors:
Liying Tian,
Xiaoshi Song,
Zilong Liu,
Yubo Li
Abstract:
Sequences with low/zero ambiguity zone (LAZ/ZAZ) properties are useful for modern wireless communication and radar systems operating in mobile environments. This paper first presents a new family of ZAZ sequence sets by generalizing an earlier construction of zero correlation zone (ZCZ) sequences arising from perfect nonlinear functions. We then introduce a second family of ZAZ sequence sets with comb-like spectrum, whereby the local Doppler resilience is ensured by their inherent spectral nulls in the frequency-domain. Finally, LAZ sequence sets are obtained thanks to its connection with a novel class of mapping functions. These proposed unimodular ZAZ and LAZ sets are cyclically distinct and asymptotically optimal with respect to the existing theoretical bounds.
Submitted 1 January, 2024; v1 submitted 1 January, 2024;
originally announced January 2024.
-
Learning Multi-graph Structure for Temporal Knowledge Graph Reasoning
Authors:
Jinchuan Zhang,
Bei Hui,
Chong Mu,
Ling Tian
Abstract:
Temporal Knowledge Graph (TKG) reasoning that forecasts future events based on historical snapshots distributed over timestamps is denoted as extrapolation and has gained significant attention. Owing to its extreme versatility and variation in spatial and temporal correlations, TKG reasoning presents a challenging task, demanding efficient capture of concurrent structures and evolutional interactions among facts. While existing methods have made strides in this direction, they still fall short of harnessing the diverse forms of intrinsic expressive semantics of TKGs, which encompass entity correlations across multiple timestamps and periodicity of temporal information. This limitation constrains their ability to thoroughly reflect historical dependencies and future trends. In response to these drawbacks, this paper proposes an innovative reasoning approach that focuses on Learning Multi-graph Structure (LMS). Concretely, it comprises three distinct modules concentrating on multiple aspects of graph structure knowledge within TKGs, including concurrent and evolutional patterns along timestamps, query-specific correlations across timestamps, and semantic dependencies of timestamps, which capture TKG features from various perspectives. Besides, LMS incorporates an adaptive gate for merging entity representations both along and across timestamps effectively. Moreover, it integrates timestamp semantics into graph attention calculations and time-aware decoders, in order to impose temporal constraints on events and narrow down prediction scopes with historical statistics. Extensive experimental results on five event-based benchmark datasets demonstrate that LMS outperforms state-of-the-art extrapolation models, indicating the superiority of modeling a multi-graph perspective for TKG reasoning.
Submitted 26 February, 2024; v1 submitted 4 December, 2023;
originally announced December 2023.
-
SAME++: A Self-supervised Anatomical eMbeddings Enhanced medical image registration framework using stable sampling and regularized transformation
Authors:
Lin Tian,
Zi Li,
Fengze Liu,
Xiaoyu Bai,
Jia Ge,
Le Lu,
Marc Niethammer,
Xianghua Ye,
Ke Yan,
Daikai Jin
Abstract:
Image registration is a fundamental medical image analysis task. Ideally, registration should focus on aligning semantically corresponding voxels, i.e., the same anatomical locations. However, existing methods often optimize similarity measures computed directly on intensities or on hand-crafted features, which lack anatomical semantic information. These similarity measures may lead to sub-optimal solutions where large deformations, complex anatomical differences, or cross-modality imagery exist. In this work, we introduce a fast and accurate method for unsupervised 3D medical image registration building on top of a Self-supervised Anatomical eMbedding (SAM) algorithm, which is capable of computing dense anatomical correspondences between two images at the voxel level. We name our approach SAM-Enhanced registration (SAME++), which decomposes image registration into four steps: affine transformation, coarse deformation, deep non-parametric transformation, and instance optimization. Using SAM embeddings, we enhance these steps by finding more coherent correspondence and providing features with better semantic guidance. We extensively evaluated SAME++ using more than 50 labeled organs on three challenging inter-subject registration tasks of different body parts. As a complete registration framework, SAME++ markedly outperforms leading methods by 4.2% - 8.2% in terms of Dice score while being orders of magnitude faster than numerical optimization-based methods. Code is available at https://github.com/alibaba-damo-academy/same.
Submitted 25 February, 2024; v1 submitted 25 November, 2023;
originally announced November 2023.
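The SAME++ entry above reports registration quality as a Dice score over labeled organs. As a reminder of what that metric computes, here is a minimal sketch assuming each organ label is given as a set of voxel coordinates; the function name and the toy masks are hypothetical and are not the authors' evaluation code.

```python
def dice_score(mask_a, mask_b):
    """Dice overlap 2|A ∩ B| / (|A| + |B|) between two voxel sets."""
    a, b = set(mask_a), set(mask_b)
    total = len(a) + len(b)
    if total == 0:
        return 1.0  # assumed convention: two empty masks overlap perfectly
    return 2 * len(a & b) / total


# Hypothetical toy masks: voxels of one organ in the warped and fixed images.
warped_organ = {(0, 0, 0), (0, 0, 1), (0, 1, 0)}
fixed_organ = {(0, 0, 1), (0, 1, 0), (1, 1, 1)}

score = dice_score(warped_organ, fixed_organ)
print(round(score, 3))
```

A score of 1 means the warped and fixed labels coincide exactly, and 0 means they do not overlap at all.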
-
The 2nd Workshop on Maritime Computer Vision (MaCVi) 2024
Authors:
Benjamin Kiefer,
Lojze Žust,
Matej Kristan,
Janez Perš,
Matija Teršek,
Arnold Wiliem,
Martin Messmer,
Cheng-Yen Yang,
Hsiang-Wei Huang,
Zhongyu Jiang,
Heng-Cheng Kuo,
Jie Mei,
Jenq-Neng Hwang,
Daniel Stadler,
Lars Sommer,
Kaer Huang,
Aiguo Zheng,
Weitu Chong,
Kanokphan Lertniphonphan,
Jun Xie,
Feng Chen,
Jian Li,
Zhepeng Wang,
Luca Zedda,
Andrea Loddo
, et al. (24 additional authors not shown)
Abstract:
The 2nd Workshop on Maritime Computer Vision (MaCVi) 2024 addresses maritime computer vision for Unmanned Aerial Vehicles (UAV) and Unmanned Surface Vehicles (USV). Three challenge categories are considered: (i) UAV-based Maritime Object Tracking with Re-identification, (ii) USV-based Maritime Obstacle Segmentation and Detection, (iii) USV-based Maritime Boat Tracking. The USV-based Maritime Obstacle Segmentation and Detection category features three sub-challenges, including a new embedded challenge addressing efficient inference on real-world embedded devices. This report offers a comprehensive overview of the findings from the challenges. We provide both statistical and qualitative analyses, evaluating trends from over 195 submissions. All datasets, evaluation code, and the leaderboard are available to the public at https://macvi.org/workshop/macvi24.
Submitted 23 November, 2023;
originally announced November 2023.
-
MixTEA: Semi-supervised Entity Alignment with Mixture Teaching
Authors:
Feng Xie,
Xin Song,
Xiang Zeng,
Xuechen Zhao,
Lei Tian,
Bin Zhou,
Yusong Tan
Abstract:
Semi-supervised entity alignment (EA) is a practical and challenging task because of the lack of adequate labeled mappings as training data. Most works address this problem by generating pseudo mappings for unlabeled entities. However, they either suffer from the erroneous (noisy) pseudo mappings or largely ignore the uncertainty of pseudo mappings. In this paper, we propose a novel semi-supervised EA method, termed as MixTEA, which guides the model learning with an end-to-end mixture teaching of manually labeled mappings and probabilistic pseudo mappings. We firstly train a student model using few labeled mappings as standard. More importantly, in pseudo mapping learning, we propose a bi-directional voting (BDV) strategy that fuses the alignment decisions in different directions to estimate the uncertainty via the joint matching confidence score. Meanwhile, we also design a matching diversity-based rectification (MDR) module to adjust the pseudo mapping learning, thus reducing the negative influence of noisy mappings. Extensive results on benchmark datasets as well as further analyses demonstrate the superiority and the effectiveness of our proposed method.
Submitted 7 November, 2023;
originally announced November 2023.
-
Non-Autoregressive Diffusion-based Temporal Point Processes for Continuous-Time Long-Term Event Prediction
Authors:
Wang-Tao Zhou,
Zhao Kang,
Ling Tian
Abstract:
Continuous-time long-term event prediction plays an important role in many application scenarios. Most existing works rely on autoregressive frameworks to predict event sequences, which suffer from error accumulation, thus compromising prediction quality. Inspired by the success of denoising diffusion probabilistic models, we propose a diffusion-based non-autoregressive temporal point process model for long-term event prediction in continuous time. Instead of generating events one at a time in an autoregressive way, our model predicts the entire future event sequence as a whole. In order to perform diffusion processes on event sequences, we develop a bidirectional map between target event sequences and the Euclidean vector space. Furthermore, we design a novel denoising network to capture both sequential and contextual features for better sample quality. Extensive experiments demonstrate the superiority of our proposed model over state-of-the-art methods on long-term event prediction in continuous time. To the best of our knowledge, this is the first work to apply diffusion methods to long-term event prediction problems.
Submitted 2 November, 2023;
originally announced November 2023.
-
A Robust Deep Learning Method with Uncertainty Estimation for the Pathological Classification of Renal Cell Carcinoma based on CT Images
Authors:
Ni Yao,
Hang Hu,
Kaicong Chen,
Chen Zhao,
Yuan Guo,
Boya Li,
Jiaofen Nan,
Yanting Li,
Chuang Han,
Fubao Zhu,
Weihua Zhou,
Li Tian
Abstract:
Objectives: To develop and validate a deep learning-based diagnostic model incorporating uncertainty estimation so as to facilitate radiologists in the preoperative differentiation of the pathological subtypes of renal cell carcinoma (RCC) based on CT images. Methods: Data from 668 consecutive patients with pathologically proven RCC were retrospectively collected from Center 1. By using five-fold cross-validation, a deep learning model incorporating uncertainty estimation was developed to classify RCC subtypes into clear cell RCC (ccRCC), papillary RCC (pRCC), and chromophobe RCC (chRCC). An external validation set of 78 patients from Center 2 further evaluated the model's performance. Results: In the five-fold cross-validation, the model's area under the receiver operating characteristic curve (AUC) for the classification of ccRCC, pRCC, and chRCC was 0.868 (95% CI: 0.826-0.923), 0.846 (95% CI: 0.812-0.886), and 0.839 (95% CI: 0.802-0.88), respectively. In the external validation set, the AUCs were 0.856 (95% CI: 0.838-0.882), 0.787 (95% CI: 0.757-0.818), and 0.793 (95% CI: 0.758-0.831) for ccRCC, pRCC, and chRCC, respectively. Conclusions: The developed deep learning model demonstrated robust performance in predicting the pathological subtypes of RCC, while the incorporated uncertainty emphasized the importance of understanding model confidence, which is crucial for assisting clinical decision-making for patients with renal tumors. Clinical relevance statement: Our deep learning approach, integrated with uncertainty estimation, offers clinicians a dual advantage: accurate RCC subtype predictions complemented by diagnostic confidence references, promoting informed decision-making for patients with RCC.
Submitted 12 November, 2023; v1 submitted 1 November, 2023;
originally announced November 2023.