-
Segment, Lift and Fit: Automatic 3D Shape Labeling from 2D Prompts
Authors:
Jianhao Li,
Tianyu Sun,
Zhongdao Wang,
Enze Xie,
Bailan Feng,
Hongbo Zhang,
Ze Yuan,
Ke Xu,
Jiaheng Liu,
Ping Luo
Abstract:
This paper proposes an algorithm for automatically labeling 3D objects from 2D point or box prompts, focusing especially on applications in autonomous driving. Unlike prior art, our auto-labeler predicts 3D shapes instead of bounding boxes and does not require training on a specific dataset. We propose a Segment, Lift, and Fit (SLF) paradigm to achieve this goal. First, we segment high-quality instance masks from the prompts using the Segment Anything Model (SAM) and transform the remaining problem into predicting 3D shapes from given 2D masks. This problem is ill-posed and therefore challenging, because multiple 3D shapes can project to an identical mask. To tackle this issue, we then lift 2D masks to 3D forms and employ gradient descent to adjust their poses and shapes until the projections fit the masks and the surfaces conform to surrounding LiDAR points. Notably, since we do not train on a specific dataset, the SLF auto-labeler does not overfit to biased annotation patterns in the training set as other methods do, which improves its generalization across different datasets. Experimental results on the KITTI dataset demonstrate that the SLF auto-labeler produces high-quality bounding box annotations, achieving an AP@0.5 IoU of nearly 90%. Detectors trained with the generated pseudo-labels perform nearly as well as those trained with actual ground-truth annotations. Furthermore, the SLF auto-labeler shows promising results in detailed shape predictions, providing a potential alternative for the occupancy annotation of dynamic objects.
Submitted 16 July, 2024;
originally announced July 2024.
-
One-dimensional flat bands in phosphorene nanoribbons with pentagonal nature
Authors:
Shuo Sun,
Jing-Yang You,
Zhihao Cai,
Jie Su,
Tong Yang,
Xinnan Peng,
Yihe Wang,
Daiyu Geng,
Jian Gou,
Yuli Huang,
Sisheng Duan,
Lan Chen,
Kehui Wu,
Andrew T. S. Wee,
Yuan Ping Feng,
Jia Lin Zhang,
Jiong Lu,
Baojie Feng,
Wei Chen
Abstract:
Materials with topological flat bands can serve as a promising platform to investigate strongly interacting phenomena. However, experimental realization of ideal flat bands is mostly limited to artificial lattices or moiré systems. Here we report a general way to construct one-dimensional (1D) flat bands in phosphorene nanoribbons (PNRs) with pentagonal nature: penta-hexa-PNRs and penta-dodeca-PNRs, wherein the corresponding flat bands are directly verified by using angle-resolved photoemission spectroscopy. We confirm that the observed 1D flat bands originate from the electronic 1D sawtooth and Lieb lattices, respectively, as revealed by the combination of bond-resolved scanning tunneling microscopy, scanning tunneling spectroscopy, tight-binding models, and first-principles calculations. Our study demonstrates a general way to construct 1D flat bands in 1D solid material systems, which provides a robust platform to explore strongly interacting phases of matter.
Submitted 11 July, 2024;
originally announced July 2024.
-
EndoSparse: Real-Time Sparse View Synthesis of Endoscopic Scenes using Gaussian Splatting
Authors:
Chenxin Li,
Brandon Y. Feng,
Yifan Liu,
Hengyu Liu,
Cheng Wang,
Weihao Yu,
Yixuan Yuan
Abstract:
3D reconstruction of biological tissues from a collection of endoscopic images is key to unlocking various important downstream surgical applications with 3D capabilities. Existing methods employ various advanced neural rendering techniques for photorealistic view synthesis, but they often struggle to recover accurate 3D representations when only sparse observations are available, which is usually the case in real-world clinical scenarios. To tackle this sparsity challenge, we propose a framework, dubbed EndoSparse, that leverages prior knowledge from multiple foundation models during the reconstruction process. Experimental results indicate that our proposed strategy significantly improves the geometric and appearance quality under challenging sparse-view conditions, including using only three views. In rigorous benchmarking experiments against state-of-the-art methods, EndoSparse achieves superior results in terms of accurate geometry, realistic appearance, and rendering efficiency, confirming its robustness to sparse-view limitations in endoscopic reconstruction. EndoSparse signifies a steady step towards the practical deployment of neural 3D reconstruction in real-world clinical scenarios. Project page: https://endo-sparse.github.io/.
Submitted 1 July, 2024;
originally announced July 2024.
-
Relational Reasoning On Graphs Using Opinion Dynamics
Authors:
Yulong Yang,
Bowen Feng,
Keqin Wang,
Naomi Leonard,
Adji Bousso Dieng,
Christine Allen-Blanchette
Abstract:
From pedestrians to Kuramoto oscillators, interactions between agents govern how a multitude of dynamical systems evolve in space and time. Discovering how these agents relate to each other can improve our understanding of the often complex dynamics that underlie these systems. Recent works learn to categorize relationships between agents based on observations of their physical behavior. These approaches are limited in that the relationship categories are modelled as independent and mutually exclusive, when in real-world systems categories often interact. In this work, we introduce a level of abstraction between the physical behavior of agents and the categories that define their behavior. To do this, we learn a mapping from the agents' states to their affinities for each category in a graph neural network. We integrate the physical proximity of agents and their affinities in a nonlinear opinion dynamics model, which provides a mechanism to identify mutually exclusive categories, predict an agent's evolution in time, and control an agent's behavior. We demonstrate the utility of our model for learning interpretable categories for mechanical systems, and demonstrate its efficacy on several long-horizon trajectory prediction benchmarks where we consistently outperform existing methods.
Submitted 20 June, 2024;
originally announced June 2024.
-
Neural Approximate Mirror Maps for Constrained Diffusion Models
Authors:
Berthy T. Feng,
Ricardo Baptista,
Katherine L. Bouman
Abstract:
Diffusion models excel at creating visually-convincing images, but they often struggle to meet subtle constraints inherent in the training data. Such constraints could be physics-based (e.g., satisfying a PDE), geometric (e.g., respecting symmetry), or semantic (e.g., including a particular number of objects). When the training data all satisfy a certain constraint, enforcing this constraint on a diffusion model not only improves its distribution-matching accuracy but also makes it more reliable for generating valid synthetic data and solving constrained inverse problems. However, existing methods for constrained diffusion models are inflexible with different types of constraints. Recent work proposed to learn mirror diffusion models (MDMs) in an unconstrained space defined by a mirror map and to impose the constraint with an inverse mirror map, but analytical mirror maps are challenging to derive for complex constraints. We propose neural approximate mirror maps (NAMMs) for general constraints. Our approach only requires a differentiable distance function from the constraint set. We learn an approximate mirror map that pushes data into an unconstrained space and a corresponding approximate inverse that maps data back to the constraint set. A generative model, such as an MDM, can then be trained in the learned mirror space and its samples restored to the constraint set by the inverse map. We validate our approach on a variety of constraints, showing that compared to an unconstrained diffusion model, a NAMM-based MDM substantially improves constraint satisfaction. We also demonstrate how existing diffusion-based inverse-problem solvers can be easily applied in the learned mirror space to solve constrained inverse problems.
Submitted 18 June, 2024;
originally announced June 2024.
-
LiCAF: LiDAR-Camera Asymmetric Fusion for Gait Recognition
Authors:
Yunze Deng,
Haijun Xiong,
Bin Feng
Abstract:
Gait recognition is a biometric technology that identifies individuals by using walking patterns. Due to the significant achievements of multimodal fusion in gait recognition, we consider employing LiDAR-camera fusion to obtain robust gait representations. However, existing methods often overlook intrinsic characteristics of modalities, and lack fine-grained fusion and temporal modeling. In this paper, we introduce a novel modality-sensitive network LiCAF for LiDAR-camera fusion, which employs an asymmetric modeling strategy. Specifically, we propose Asymmetric Cross-modal Channel Attention (ACCA) and Interlaced Cross-modal Temporal Modeling (ICTM) for cross-modal valuable channel information selection and powerful temporal modeling. Our method achieves state-of-the-art performance (93.9% in Rank-1 and 98.8% in Rank-5) on the SUSTech1K dataset, demonstrating its effectiveness.
Submitted 18 June, 2024;
originally announced June 2024.
-
Skim then Focus: Integrating Contextual and Fine-grained Views for Repetitive Action Counting
Authors:
Zhengqi Zhao,
Xiaohu Huang,
Hao Zhou,
Kun Yao,
Errui Ding,
Jingdong Wang,
Xinggang Wang,
Wenyu Liu,
Bin Feng
Abstract:
The key to action counting is accurately locating each video's repetitive actions. Instead of estimating the probability of each frame belonging to an action directly, we propose a dual-branch network, i.e., SkimFocusNet, working in a two-step manner. The model draws inspiration from empirical observations indicating that humans typically engage in coarse skimming of entire sequences to grasp the general action pattern initially, followed by a finer, frame-by-frame focus to determine if it aligns with the target action. Specifically, SkimFocusNet incorporates a skim branch and a focus branch. The skim branch scans the global contextual information throughout the sequence to identify potential target actions for guidance. Subsequently, the focus branch utilizes the guidance to diligently identify repetitive actions using a long-short adaptive guidance (LSAG) block. Additionally, we have observed that videos in existing datasets often feature only one type of repetitive action, which inadequately represents real-world scenarios. To more accurately describe real-life situations, we establish the Multi-RepCount dataset, which includes videos containing multiple repetitive motions. On Multi-RepCount, our SkimFocusNet can perform specified action counting, that is, counting a particular action type by referencing an exemplary video. This capability demonstrates the robustness of our method. Extensive experiments demonstrate that SkimFocusNet achieves state-of-the-art performance with significant improvements. We also conduct a thorough ablation study to evaluate the network components. The source code will be published upon acceptance.
Submitted 13 June, 2024;
originally announced June 2024.
-
Event-horizon-scale Imaging of M87* under Different Assumptions via Deep Generative Image Priors
Authors:
Berthy T. Feng,
Katherine L. Bouman,
William T. Freeman
Abstract:
Reconstructing images from the Event Horizon Telescope (EHT) observations of M87*, the supermassive black hole at the center of the galaxy M87, depends on a prior to impose desired image statistics. However, given the impossibility of directly observing black holes, there is no clear choice for a prior. We present a framework for flexibly designing a range of priors, each bringing different biases to the image reconstruction. These priors can be weak (e.g., impose only basic natural-image statistics) or strong (e.g., impose assumptions of black-hole structure). Our framework uses Bayesian inference with score-based priors, which are data-driven priors arising from a deep generative model that can learn complicated image distributions. Using our Bayesian imaging approach with sophisticated data-driven priors, we can assess how visual features and uncertainty of reconstructed images change depending on the prior. In addition to simulated data, we image the real EHT M87* data and discuss how recovered features are influenced by the choice of prior.
Submitted 4 June, 2024;
originally announced June 2024.
-
Real-space tilting method for atomic resolution STEM imaging of nanocrystalline materials
Authors:
Jiake Wei,
Zhangze Xu,
Wenjie Shen,
Bin Feng,
Ryo Ishikawa,
Naoya Shibata,
Yuichi Ikuhara,
Xuedong Bai
Abstract:
Atomic-resolution scanning transmission electron microscopy (STEM) characterization requires precise tilting of the specimen to a high-symmetry zone axis, which is usually done in reciprocal space by following the diffraction patterns. However, for small-sized nanocrystalline materials, the diffraction patterns are too faint to guide the tilting process. Here, a simple and effective tilting method is developed based on the diffraction contrast change of the shadow image in the Ronchigram. We can calculate the misorientation angle of the specimen and tilt it to the zone axis based on the position of the shadow image with lowest intensity. This method requires no prior knowledge of the sample, and the maximum misorientation angle we can correct is greater than ±6.9 degrees with sub-mrad accuracy. It operates in real space, without recording the diffraction patterns of the specimens, and can therefore be applied effectively to nanocrystalline materials. Combined with scripting to control the microscope, we can automatically tilt the sample to the zone axis under low-dose conditions (<0.17 e⁻/Å²/s), which could facilitate the imaging of beam-sensitive materials such as zeolites or metal-organic frameworks. This automated tilting method could contribute to the atomic-scale characterization of nanocrystalline materials by STEM imaging.
Submitted 2 June, 2024;
originally announced June 2024.
-
VividDream: Generating 3D Scene with Ambient Dynamics
Authors:
Yao-Chih Lee,
Yi-Ting Chen,
Andrew Wang,
Ting-Hsuan Liao,
Brandon Y. Feng,
Jia-Bin Huang
Abstract:
We introduce VividDream, a method for generating explorable 4D scenes with ambient dynamics from a single input image or text prompt. VividDream first expands an input image into a static 3D point cloud through iterative inpainting and geometry merging. An ensemble of animated videos is then generated using video diffusion models with quality refinement techniques and conditioned on renderings of the static 3D scene from the sampled camera trajectories. We then optimize a canonical 4D scene representation using an animated video ensemble, with per-video motion embeddings and visibility masks to mitigate inconsistencies. The resulting 4D scene enables free-view exploration of a 3D scene with plausible ambient scene dynamics. Experiments demonstrate that VividDream can provide human viewers with compelling 4D experiences generated based on diverse real images and text prompts.
Submitted 30 May, 2024;
originally announced May 2024.
-
Evidence for Multiferroicity in Single-Layer CuCrSe$_2$
Authors:
Zhenyu Sun,
Yueqi Su,
Aomiao Zhi,
Zhicheng Gao,
Xu Han,
Kang Wu,
Lihong Bao,
Yuan Huang,
Youguo Shi,
Xuedong Bai,
Peng Cheng,
Lan Chen,
Kehui Wu,
Xuezeng Tian,
Changzheng Wu,
Baojie Feng
Abstract:
Multiferroic materials, which simultaneously exhibit ferroelectricity and magnetism, have attracted substantial attention due to their fascinating physical properties and potential technological applications. With the trend towards device miniaturization, there is an increasing demand for the persistence of multiferroicity in single-layer materials at elevated temperatures. Here, we report high-temperature multiferroicity in single-layer CuCrSe$_2$, which hosts room-temperature ferroelectricity and 120 K ferromagnetism. Notably, the ferromagnetic coupling in single-layer CuCrSe$_2$ is enhanced by the ferroelectricity-induced orbital shift of Cr atoms, which is distinct from both type-I and type-II multiferroicity. These findings are supported by a combination of second-harmonic generation, piezo-response force microscopy, scanning transmission electron microscopy, magnetic, and Hall measurements. Our research not only provides an exemplary platform for delving into intrinsic magnetoelectric interactions at the single-layer limit but also sheds light on the potential development of electronic and spintronic devices utilizing two-dimensional multiferroics.
Submitted 19 May, 2024;
originally announced May 2024.
-
Single-shot volumetric fluorescence imaging with neural fields
Authors:
Oumeng Zhang,
Haowen Zhou,
Brandon Y. Feng,
Elin M. Larsson,
Reinaldo E. Alcalde,
Siyuan Yin,
Catherine Deng,
Changhuei Yang
Abstract:
Single-shot volumetric fluorescence (SVF) imaging offers a significant advantage over traditional imaging methods that require scanning across multiple axial planes as it can capture biological processes with high temporal resolution across a large field of view. The key challenges in SVF imaging include requiring sparsity constraints to meet the multiplexing requirements of compressed sensing, eliminating depth ambiguity in the reconstruction, and maintaining high resolution across a large field of view. In this paper, we introduce the QuadraPol point spread function (PSF) combined with neural fields, a novel approach for SVF imaging. This method utilizes a custom polarizer at the back focal plane and a polarization camera to detect fluorescence, effectively encoding the 3D scene within a compact PSF without depth ambiguity. Additionally, we propose a reconstruction algorithm based on the neural fields technique that provides improved reconstruction quality and addresses the inaccuracies of phase retrieval methods used to correct imaging system aberrations. This algorithm combines the accuracy of experimental PSFs with the long depth of field of computationally generated retrieved PSFs. QuadraPol PSF, combined with neural fields, significantly reduces the acquisition time of a conventional fluorescence microscope by approximately 20 times and captures a 100 mm$^3$ cubic volume in one shot. We validate the effectiveness of both our hardware and algorithm through all-in-focus imaging of bacterial colonies on sand surfaces and visualization of plant root morphology. Our approach offers a powerful tool for advancing biological research and ecological studies.
Submitted 4 June, 2024; v1 submitted 16 May, 2024;
originally announced May 2024.
-
Stochastic Gradient MCMC for Massive Geostatistical Data
Authors:
Mohamed A. Abba,
Brian J. Reich,
Reetam Majumder,
Brandon Feng
Abstract:
Gaussian processes (GPs) are commonly used for prediction and inference for spatial data analyses. However, since estimation and prediction tasks have cubic time and quadratic memory complexity in number of locations, GPs are difficult to scale to large spatial datasets. The Vecchia approximation induces sparsity in the dependence structure and is one of several methods proposed to scale GP inference. Our work adds to the substantial research in this area by developing a stochastic gradient Markov chain Monte Carlo (SGMCMC) framework for efficient computation in GPs. At each step, the algorithm subsamples a minibatch of locations and subsequently updates process parameters through a Vecchia-approximated GP likelihood. Since the Vecchia-approximated GP has a time complexity that is linear in the number of locations, this results in scalable estimation in GPs. Through simulation studies, we demonstrate that SGMCMC is competitive with state-of-the-art scalable GP algorithms in terms of computational time and parameter estimation. An application of our method is also provided using the Argo dataset of ocean temperature measurements.
Submitted 3 June, 2024; v1 submitted 7 May, 2024;
originally announced May 2024.
-
DeepSeek-V2: A Strong, Economical, and Efficient Mixture-of-Experts Language Model
Authors:
DeepSeek-AI,
Aixin Liu,
Bei Feng,
Bin Wang,
Bingxuan Wang,
Bo Liu,
Chenggang Zhao,
Chengqi Dengr,
Chong Ruan,
Damai Dai,
Daya Guo,
Dejian Yang,
Deli Chen,
Dongjie Ji,
Erhang Li,
Fangyun Lin,
Fuli Luo,
Guangbo Hao,
Guanting Chen,
Guowei Li,
H. Zhang,
Hanwei Xu,
Hao Yang,
Haowei Zhang,
Honghui Ding
, et al. (132 additional authors not shown)
Abstract:
We present DeepSeek-V2, a strong Mixture-of-Experts (MoE) language model characterized by economical training and efficient inference. It comprises 236B total parameters, of which 21B are activated for each token, and supports a context length of 128K tokens. DeepSeek-V2 adopts innovative architectures including Multi-head Latent Attention (MLA) and DeepSeekMoE. MLA guarantees efficient inference through significantly compressing the Key-Value (KV) cache into a latent vector, while DeepSeekMoE enables training strong models at an economical cost through sparse computation. Compared with DeepSeek 67B, DeepSeek-V2 achieves significantly stronger performance, and meanwhile saves 42.5% of training costs, reduces the KV cache by 93.3%, and boosts the maximum generation throughput to 5.76 times. We pretrain DeepSeek-V2 on a high-quality and multi-source corpus consisting of 8.1T tokens, and further perform Supervised Fine-Tuning (SFT) and Reinforcement Learning (RL) to fully unlock its potential. Evaluation results show that, even with only 21B activated parameters, DeepSeek-V2 and its chat versions still achieve top-tier performance among open-source models.
Submitted 19 June, 2024; v1 submitted 7 May, 2024;
originally announced May 2024.
-
Research on signalized intersection mixed traffic flow platoon control method considering Backward-looking effect
Authors:
Binghao Feng,
Hui Guo,
Minghui Ma,
Yuepeng Wu,
Shidong Liang,
Yansong Wang
Abstract:
Connected and Autonomous Vehicles (CAVs) technology facilitates the advancement of intelligent transportation. However, intelligent control techniques for mixed traffic flow at signalized intersections involving both CAVs and Human-Driven Vehicles (HDVs) require further investigation into the impact of backward-looking effect. This paper proposes the concept of 1+n+1 mixed platoon considering the backward-looking effect, consisting of one leading CAV, n following HDVs, and one trailing CAV. The leading and trailing CAVs collectively guide the movement of intermediate HDVs at intersections, forming an optimal control framework for platoon-based CAVs at signalized intersections. Initially, a linearized dynamic model for the 1+n+1 mixed platoon is established and compared with a benchmark model focusing solely on controlling the lead vehicle. Subsequently, constraints are formulated for the optimal control framework, aiming to enhance overall intersection traffic efficiency and fuel economy by directly controlling the leading and trailing CAVs in the platoon. Finally, extensive numerical simulations compare vehicle throughput and fuel consumption at signalized intersections under different mixed platoon control methods, validating that considering both front and backward-looking effects in the mixed platoon control method outperforms traditional methods focusing solely on the lead CAV.
Submitted 7 May, 2024;
originally announced May 2024.
-
Cerium Oxide-based Solid-State Thermal Transistors with Wide Switching Width of 9.5 W/mK
Authors:
Ahrong Jeong,
Mitsuki Yoshimura,
Zhiping Bian,
Jason Tam,
Bin Feng,
Yuichi Ikuhara,
Yusaku Magari,
Takashi Endo,
Yasutaka Matsuo,
Hiromichi Ohta
Abstract:
Thermal transistors that electrically switch heat flow on and off have attracted attention as thermal management devices. Electrochemical reduction/oxidation switches the thermal conductivity (κ) of active metal oxide layers. The κ-switching width (difference between on-state and off-state κ) of the previously proposed electrochemical thermal transistors is narrow, less than 5 W/mK. Here, we show solid-state electrochemical thermal transistors with a wide κ-switching width of 9.5 W/mK. We used CeO2 thin film as the active layer directly deposited on a solid electrolyte YSZ substrate. A Pt thin film was deposited on the surface of the CeO2 thin film and the back surface of the YSZ substrate to create a solid-state electrochemical thermal transistor. When the CeO2 thin film was once reduced (off-state) and then oxidized (on-state), the κ was approximately 2.5 W/mK in its most reduced state, and κ increased with oxidation to 11.8 W/mK (on-state). This reduction (off-state)/oxidation (on-state) cycle was repeated five times and the average value of κ was 2.5 W/mK after reduction (off-state) and 12 W/mK after oxidation (on-state). The κ-switching width was 9.5 W/mK. The CeO2-based solid-state electrochemical thermal transistors are potential materials for thermal shutters and thermal displays.
Submitted 30 April, 2024;
originally announced April 2024.
-
Integrable semi-discretization for a modified Camassa-Holm equation with cubic nonlinearity
Authors:
Bao-Feng Feng,
Heng-Chun Hu,
Han-Han Sheng,
Wei Yin,
Guo-Fu Yu
Abstract:
In the present paper, an integrable semi-discretization of the modified Camassa-Holm (mCH) equation with cubic nonlinearity is presented. The key points of the construction are based on the discrete Kadomtsev-Petviashvili (KP) equation and appropriate definition of discrete reciprocal transformations. First, we demonstrate that these bilinear equations and their determinant solutions can be derived from the discrete KP equation through Miwa transformation and some reductions. Then, by scrutinizing the reduction process, we obtain a set of semi-discrete bilinear equations and their general soliton solutions in the Gram-type determinant form. Finally, we obtain an integrable semi-discrete analog of the mCH equation by introducing dependent variables and discrete reciprocal transformation. It is also shown that the semi-discrete mCH equation converges to the continuous one in the continuum limit.
Submitted 28 April, 2024;
originally announced April 2024.
-
Rechargeable UAV Trajectory Optimization for Real-Time Persistent Data Collection of Large-Scale Sensor Networks
Authors:
Rui Wang,
Deshi Li,
Qingqing Wu,
Kaitao Meng,
Boning Feng,
Lele Cong
Abstract:
Unmanned aerial vehicles (UAVs) have received plenty of attention due to their high flexibility and enhanced communication ability; nonetheless, the limited onboard energy restricts UAVs' application to persistent data collection missions in large areas. In this paper, we propose a rechargeable UAV-assisted periodic data collection scheme, where a UAV is dispatched to periodically collect data from sensor nodes (SNs) in the mission area and is charged by a wireless charging platform. Specifically, the periodic data collection completion time is minimized by optimizing the UAV trajectory to reach the optimal balance among the collection time, flight time, and recharging time. The formulated problem is non-convex and difficult to solve directly. To tackle this problem, we divide the main problem into two sub-problems and address them by leveraging successive convex approximation (SCA), bisection search, and heuristic methods. Then, we propose a periodic trajectory optimization algorithm to iteratively solve the two sub-problems to minimize the completion time. Furthermore, to deal with the dynamics of SNs, we propose a low-complexity trajectory adjustment strategy, where the trajectory can be maintained or adjusted locally when the SNs change, which significantly mitigates the computation cost of re-optimization. The simulation results show the superiority and robustness of the proposed scheme, and the completion time is on average 39% and 33% lower than the two benchmarks, respectively.
Submitted 6 June, 2024; v1 submitted 24 April, 2024;
originally announced April 2024.
-
OccGen: Generative Multi-modal 3D Occupancy Prediction for Autonomous Driving
Authors:
Guoqing Wang,
Zhongdao Wang,
Pin Tang,
Jilai Zheng,
Xiangxuan Ren,
Bailan Feng,
Chao Ma
Abstract:
Existing solutions for 3D semantic occupancy prediction typically treat the task as a one-shot 3D voxel-wise segmentation perception problem. These discriminative methods focus on learning the mapping between the inputs and occupancy map in a single step, lacking the ability to gradually refine the occupancy map and the reasonable scene imaginative capacity to complete the local regions somewhere. In this paper, we introduce OccGen, a simple yet powerful generative perception model for the task of 3D semantic occupancy prediction. OccGen adopts a "noise-to-occupancy" generative paradigm, progressively inferring and refining the occupancy map by predicting and eliminating noise originating from a random Gaussian distribution. OccGen consists of two main components: a conditional encoder that is capable of processing multi-modal inputs, and a progressive refinement decoder that applies diffusion denoising using the multi-modal features as conditions. A key insight of this generative pipeline is that the diffusion denoising process is naturally able to model the coarse-to-fine refinement of the dense 3D occupancy map, therefore producing more detailed predictions. Extensive experiments on several occupancy benchmarks demonstrate the effectiveness of the proposed method compared to the state-of-the-art methods. For instance, OccGen relatively enhances the mIoU by 9.5%, 6.3%, and 13.3% on the nuScenes-Occupancy dataset under the multi-modal, LiDAR-only, and camera-only settings, respectively. Moreover, as a generative perception model, OccGen exhibits desirable properties that discriminative models cannot achieve, such as providing uncertainty estimates alongside its multiple-step predictions.
Submitted 23 April, 2024;
originally announced April 2024.
-
PhysDreamer: Physics-Based Interaction with 3D Objects via Video Generation
Authors:
Tianyuan Zhang,
Hong-Xing Yu,
Rundi Wu,
Brandon Y. Feng,
Changxi Zheng,
Noah Snavely,
Jiajun Wu,
William T. Freeman
Abstract:
Realistic object interactions are crucial for creating immersive virtual experiences, yet synthesizing realistic 3D object dynamics in response to novel interactions remains a significant challenge. Unlike unconditional or text-conditioned dynamics generation, action-conditioned dynamics requires perceiving the physical material properties of objects and grounding the 3D motion prediction on these properties, such as object stiffness. However, estimating physical material properties is an open problem due to the lack of material ground-truth data, as measuring these properties for real objects is highly difficult. We present PhysDreamer, a physics-based approach that endows static 3D objects with interactive dynamics by leveraging the object dynamics priors learned by video generation models. By distilling these priors, PhysDreamer enables the synthesis of realistic object responses to novel interactions, such as external forces or agent manipulations. We demonstrate our approach on diverse examples of elastic objects and evaluate the realism of the synthesized interactions through a user study. PhysDreamer takes a step towards more engaging and realistic virtual experiences by enabling static 3D objects to dynamically respond to interactive stimuli in a physically plausible manner. See our project page at https://physdreamer.github.io/.
Submitted 19 April, 2024;
originally announced April 2024.
-
Weighted Sum-Rate Maximization for Movable Antenna-Enhanced Wireless Networks
Authors:
Biqian Feng,
Yongpeng Wu,
Xiang-Gen Xia,
Chengshan Xiao
Abstract:
This letter investigates the weighted sum rate maximization problem in movable antenna (MA)-enhanced systems. To reduce the computational complexity, we transform it into a more tractable weighted minimum mean square error (WMMSE) problem well-suited for MA. We then adopt the WMMSE algorithm and majorization-minimization algorithm to optimize the beamforming and antenna positions, respectively. Moreover, we propose a planar movement mode, which constrains each MA to a specified area, and obtain a low-complexity closed-form solution for it. Numerical results demonstrate that the MA-enhanced system outperforms the conventional system. In addition, the computation time for the planar movement mode is reduced by approximately 30% at a small performance cost.
Submitted 15 April, 2024;
originally announced April 2024.
-
SparseOcc: Rethinking Sparse Latent Representation for Vision-Based Semantic Occupancy Prediction
Authors:
Pin Tang,
Zhongdao Wang,
Guoqing Wang,
Jilai Zheng,
Xiangxuan Ren,
Bailan Feng,
Chao Ma
Abstract:
Vision-based perception for autonomous driving requires an explicit modeling of a 3D space, where 2D latent representations are mapped and subsequent 3D operators are applied. However, operating on dense latent spaces introduces a cubic time and space complexity, which limits scalability in terms of perception range or spatial resolution. Existing approaches compress the dense representation using projections like Bird's Eye View (BEV) or Tri-Perspective View (TPV). Although efficient, these projections result in information loss, especially for tasks like semantic occupancy prediction. To address this, we propose SparseOcc, an efficient occupancy network inspired by sparse point cloud processing. It utilizes a lossless sparse latent representation with three key innovations. Firstly, a 3D sparse diffuser performs latent completion using spatially decomposed 3D sparse convolutional kernels. Secondly, a feature pyramid and sparse interpolation enhance scales with information from others. Finally, the transformer head is redesigned as a sparse variant. SparseOcc achieves a remarkable 74.9% reduction on FLOPs over the dense baseline. Interestingly, it also improves accuracy, from 12.8% to 14.1% mIOU, which in part can be attributed to the sparse representation's ability to avoid hallucinations on empty voxels.
Submitted 15 April, 2024;
originally announced April 2024.
-
WaveMo: Learning Wavefront Modulations to See Through Scattering
Authors:
Mingyang Xie,
Haiyun Guo,
Brandon Y. Feng,
Lingbo Jin,
Ashok Veeraraghavan,
Christopher A. Metzler
Abstract:
Imaging through scattering media is a fundamental and pervasive challenge in fields ranging from medical diagnostics to astronomy. A promising strategy to overcome this challenge is wavefront modulation, which induces measurement diversity during image acquisition. Despite its importance, designing optimal wavefront modulations to image through scattering remains under-explored. This paper introduces a novel learning-based framework to address the gap. Our approach jointly optimizes wavefront modulations and a computationally lightweight feedforward "proxy" reconstruction network. This network is trained to recover scenes obscured by scattering, using measurements that are modified by these modulations. The learned modulations produced by our framework generalize effectively to unseen scattering scenarios and exhibit remarkable versatility. During deployment, the learned modulations can be decoupled from the proxy network to augment other more computationally expensive restoration algorithms. Through extensive experiments, we demonstrate our approach significantly advances the state of the art in imaging through scattering media. Our project webpage is at https://wavemo-2024.github.io/.
Submitted 11 April, 2024;
originally announced April 2024.
-
Score-Based Diffusion Models for Photoacoustic Tomography Image Reconstruction
Authors:
Sreemanti Dey,
Snigdha Saha,
Berthy T. Feng,
Manxiu Cui,
Laure Delisle,
Oscar Leong,
Lihong V. Wang,
Katherine L. Bouman
Abstract:
Photoacoustic tomography (PAT) is a rapidly-evolving medical imaging modality that combines optical absorption contrast with ultrasound imaging depth. One challenge in PAT is image reconstruction with inadequate acoustic signals due to limited sensor coverage or due to the density of the transducer array. Such cases call for solving an ill-posed inverse reconstruction problem. In this work, we use score-based diffusion models to solve the inverse problem of reconstructing an image from limited PAT measurements. The proposed approach allows us to incorporate an expressive prior learned by a diffusion model on simulated vessel structures while still being robust to varying transducer sparsity conditions.
Submitted 30 March, 2024;
originally announced April 2024.
-
CG-SLAM: Efficient Dense RGB-D SLAM in a Consistent Uncertainty-aware 3D Gaussian Field
Authors:
Jiarui Hu,
Xianhao Chen,
Boyin Feng,
Guanglin Li,
Liangjing Yang,
Hujun Bao,
Guofeng Zhang,
Zhaopeng Cui
Abstract:
Recently neural radiance fields (NeRF) have been widely exploited as 3D representations for dense simultaneous localization and mapping (SLAM). Despite their notable successes in surface modeling and novel view synthesis, existing NeRF-based methods are hindered by their computationally intensive and time-consuming volume rendering pipeline. This paper presents an efficient dense RGB-D SLAM system, i.e., CG-SLAM, based on a novel uncertainty-aware 3D Gaussian field with high consistency and geometric stability. Through an in-depth analysis of Gaussian Splatting, we propose several techniques to construct a consistent and stable 3D Gaussian field suitable for tracking and mapping. Additionally, a novel depth uncertainty model is proposed to ensure the selection of valuable Gaussian primitives during optimization, thereby improving tracking efficiency and accuracy. Experiments on various datasets demonstrate that CG-SLAM achieves superior tracking and mapping performance with a notable tracking speed of up to 15 Hz. We will make our source code publicly available. Project page: https://zju3dv.github.io/cg-slam.
Submitted 24 March, 2024;
originally announced March 2024.
-
General One-loop Generating Function by IBP relations
Authors:
Bo Feng,
Chang Hu,
Jiyuan Shen,
Yaobo Zhang
Abstract:
In this paper we study the most general generating function of reduction for one-loop integrals with arbitrary tensor structure in the numerator and arbitrary power distribution of propagators in the denominator. Using IBP relations, we establish the partial differential equations for these generating functions and solve them analytically. These results provide useful guidance for applying the generating function method to reductions of higher-loop integrals.
Submitted 24 March, 2024;
originally announced March 2024.
-
TimeRewind: Rewinding Time with Image-and-Events Video Diffusion
Authors:
Jingxi Chen,
Brandon Y. Feng,
Haoming Cai,
Mingyang Xie,
Christopher Metzler,
Cornelia Fermuller,
Yiannis Aloimonos
Abstract:
This paper addresses the novel challenge of "rewinding" time from a single captured image to recover the fleeting moments missed just before the shutter button is pressed. This problem poses a significant challenge in computer vision and computational photography, as it requires predicting plausible pre-capture motion from a single static frame, an inherently ill-posed task due to the high degree of freedom in potential pixel movements. We overcome this challenge by leveraging the emerging technology of neuromorphic event cameras, which capture motion information with high temporal resolution, and integrating this data with advanced image-to-video diffusion models. Our proposed framework introduces an event motion adaptor conditioned on event camera data, guiding the diffusion model to generate videos that are visually coherent and physically grounded in the captured events. Through extensive experimentation, we demonstrate the capability of our approach to synthesize high-quality videos that effectively "rewind" time, showcasing the potential of combining event camera technology with generative models. Our work opens new avenues for research at the intersection of computer vision, computational photography, and generative modeling, offering a forward-thinking solution to capturing missed moments and enhancing future consumer cameras and smartphones. Please see the project page at https://timerewind.github.io/ for video results and code release.
Submitted 20 March, 2024;
originally announced March 2024.
-
Endora: Video Generation Models as Endoscopy Simulators
Authors:
Chenxin Li,
Hengyu Liu,
Yifan Liu,
Brandon Y. Feng,
Wuyang Li,
Xinyu Liu,
Zhen Chen,
Jing Shao,
Yixuan Yuan
Abstract:
Generative models hold promise for revolutionizing medical education, robot-assisted surgery, and data augmentation for machine learning. Despite progress in generating 2D medical images, the complex domain of clinical video generation has largely remained untapped. This paper introduces Endora, an innovative approach to generate medical videos that simulate clinical endoscopy scenes. We present a novel generative model design that integrates a meticulously crafted spatial-temporal video transformer with advanced 2D vision foundation model priors, explicitly modeling spatial-temporal dynamics during video generation. We also pioneer the first public benchmark for endoscopy simulation with video generation models, adapting existing state-of-the-art methods for this endeavor. Endora demonstrates exceptional visual quality in generating endoscopy videos, surpassing state-of-the-art methods in extensive testing. Moreover, we explore how this endoscopy simulator can empower downstream video analysis tasks and even generate 3D medical scenes with multi-view consistency. In a nutshell, Endora marks a notable breakthrough in the deployment of generative AI for clinical endoscopy research, setting a substantial stage for further advances in medical content generation. For more details, please visit our project page: https://endora-medvidgen.github.io/.
Submitted 16 March, 2024;
originally announced March 2024.
-
Layer-dependent Raman spectroscopy of ultrathin Ta$_2$Pd$_3$Te$_5$
Authors:
Zhenyu Sun,
Zhaopeng Guo,
Dayu Yan,
Peng Cheng,
Lan Chen,
Youguo Shi,
Yuan Huang,
Zhijun Wang,
Kehui Wu,
Baojie Feng
Abstract:
Two-dimensional topological insulators (2DTIs) or quantum spin Hall insulators are attracting increasing attention due to their potential applications in next-generation spintronic devices. Despite their promising prospects, realizable 2DTIs are still limited. Recently, Ta$_2$Pd$_3$Te$_5$, a semiconducting van der Waals material, has shown spectroscopic evidence of quantum spin Hall states. However, controlled preparation of few- to monolayer samples, a crucial step in realizing quantum spin Hall devices, has not yet been achieved. In this work, we fabricated few- to monolayer Ta$_2$Pd$_3$Te$_5$ and performed systematic thickness- and temperature-dependent Raman spectroscopy measurements. Our results demonstrate that Raman spectra can provide valuable information to determine the thickness of Ta$_2$Pd$_3$Te$_5$ thin flakes. Moreover, our angle-resolved polarized Raman (ARPR) spectroscopy measurements show that the intensities of the Raman peaks are strongly anisotropic due to the quasi-one-dimensional atomic structure, providing a straightforward method to determine its crystalline orientation. Our findings may stimulate further efforts to realize quantum devices based on few- or monolayer Ta$_2$Pd$_3$Te$_5$.
Submitted 28 February, 2024;
originally announced February 2024.
-
Ptycho-endoscopy on a lensless ultrathin fiber bundle tip
Authors:
Pengming Song,
Ruihai Wang,
Lars Loetgering,
Jia Liu,
Peter Vouras,
Yujin Lee,
Shaowei Jiang,
Bin Feng,
Andrew Maiden,
Changhuei Yang,
Guoan Zheng
Abstract:
Synthetic aperture radar (SAR) utilizes an aircraft-carried antenna to emit electromagnetic pulses and detect the returning echoes. As the aircraft travels across a designated area, it synthesizes a large virtual aperture to improve image resolution. Inspired by SAR, we introduce synthetic aperture ptycho-endoscopy (SAPE) for micro-endoscopic imaging beyond the diffraction limit. SAPE operates by hand-holding a lensless fiber bundle tip to record coherent diffraction patterns from specimens. The fiber cores at the distal tip modulate the diffracted wavefield within a confined area, emulating the role of the 'airborne antenna' in SAR. The handheld operation introduces positional shifts to the tip, analogous to the aircraft's movement. These shifts facilitate the acquisition of a ptychogram and synthesize a large virtual aperture extending beyond the bundle's physical limit. We mitigate the influences of hand motion and fiber bending through a low-rank spatiotemporal decomposition of the bundle's modulation profile. Our tests demonstrate the ability to resolve a 548-nm linewidth on a resolution target. The achieved space-bandwidth product is ~1.1 million effective pixels, representing a 36-fold increase compared to that of the original fiber bundle. Furthermore, SAPE's refocusing capability enables imaging over an extended depth of field exceeding 2 cm. The aperture synthesizing process in SAPE surpasses the diffraction limit set by the probe's maximum collection angle, opening new opportunities for both fiber-based and distal-chip endoscopy in applications such as medical diagnostics and industrial inspection.
Submitted 6 July, 2024; v1 submitted 30 January, 2024;
originally announced January 2024.
-
Asymptotic behavior for a new higher-order nonlinear Schrödinger equation
Authors:
Hongyi Zhang,
Yufeng Zhang,
Binlu Feng
Abstract:
We investigate the Cauchy problem of a new higher-order nonlinear Schrödinger equation (NHNSE), which we derive ourselves, with weighted Sobolev initial data. By applying the $\bar{\partial}$-steepest descent method, we derive the long-time asymptotics of the NHNSE. The explicit steps are as follows. First, based on the spectral analysis of a Lax pair and the scattering matrix, the solution of the NHNSE is expressed through the solution of the corresponding Riemann-Hilbert problem. Second, by applying some properties of the Riemann-Hilbert problem, we obtain the long-time asymptotics of the solution to the NHNSE. To our knowledge, the properties of the NHNSE presented in this paper have not been reported in any scholarly journal.
Submitted 12 January, 2024;
originally announced January 2024.
-
Towards Systematic Evaluation of de Sitter Correlators via Generalized Integration-By-Parts Relations
Authors:
Jiaqi Chen,
Bo Feng
Abstract:
We generalize Integration-By-Parts (IBP) and differential equations methods to de Sitter correlators related to inflation. While massive correlators in de Sitter spacetime are usually regarded as highly intricate, we find they have remarkably hidden concise structures from the perspective of IBP. We find the factorization of the IBP relations of each vertex integral family corresponding to $\mathrm{d}\tau_i$ integration. Furthermore, with a smart construction of master integrals, the universal formulas for iterative reduction and $\mathrm{d}\log$-form differential equations of arbitrary vertex integral family are presented and proved. These formulas dominate all tree-level de Sitter correlators and play a kernel role at the loop-level as well.
Submitted 29 June, 2024; v1 submitted 29 December, 2023;
originally announced January 2024.
-
Gemini: A Family of Highly Capable Multimodal Models
Authors:
Gemini Team,
Rohan Anil,
Sebastian Borgeaud,
Jean-Baptiste Alayrac,
Jiahui Yu,
Radu Soricut,
Johan Schalkwyk,
Andrew M. Dai,
Anja Hauth,
Katie Millican,
David Silver,
Melvin Johnson,
Ioannis Antonoglou,
Julian Schrittwieser,
Amelia Glaese,
Jilin Chen,
Emily Pitler,
Timothy Lillicrap,
Angeliki Lazaridou,
Orhan Firat,
James Molloy,
Michael Isard,
Paul R. Barham,
Tom Hennigan,
Benjamin Lee, et al. (1325 additional authors not shown)
Abstract:
This report introduces a new family of multimodal models, Gemini, that exhibit remarkable capabilities across image, audio, video, and text understanding. The Gemini family consists of Ultra, Pro, and Nano sizes, suitable for applications ranging from complex reasoning tasks to on-device memory-constrained use-cases. Evaluation on a broad range of benchmarks shows that our most-capable Gemini Ultra model advances the state of the art in 30 of 32 of these benchmarks - notably being the first model to achieve human-expert performance on the well-studied exam benchmark MMLU, and improving the state of the art in every one of the 20 multimodal benchmarks we examined. We believe that the new capabilities of the Gemini family in cross-modal reasoning and language understanding will enable a wide variety of use cases. We discuss our approach toward post-training and deploying Gemini models responsibly to users through services including Gemini, Gemini Advanced, Google AI Studio, and Cloud Vertex AI.
Submitted 17 June, 2024; v1 submitted 18 December, 2023;
originally announced December 2023.
-
ConVRT: Consistent Video Restoration Through Turbulence with Test-time Optimization of Neural Video Representations
Authors:
Haoming Cai,
Jingxi Chen,
Brandon Y. Feng,
Weiyun Jiang,
Mingyang Xie,
Kevin Zhang,
Ashok Veeraraghavan,
Christopher Metzler
Abstract:
Atmospheric turbulence presents a significant challenge in long-range imaging. Current restoration algorithms often struggle with temporal inconsistency, as well as limited generalization ability across varying turbulence levels and scene content different from the training data. To tackle these issues, we introduce a self-supervised method, Consistent Video Restoration through Turbulence (ConVRT), a test-time optimization method featuring a neural video representation designed to enhance temporal consistency in restoration. A key innovation of ConVRT is the integration of a pretrained vision-language model (CLIP) for semantic-oriented supervision, which steers the restoration towards sharp, photorealistic images in the CLIP latent space. We further develop a principled selection strategy of text prompts, based on their statistical correlation with a perceptual metric. ConVRT's test-time optimization allows it to adapt to a wide range of real-world turbulence conditions, effectively leveraging the insights gained from pre-trained models on simulated data. ConVRT offers a comprehensive and effective solution for mitigating real-world turbulence in dynamic videos.
Submitted 7 December, 2023;
originally announced December 2023.
-
SmoothQuant+: Accurate and Efficient 4-bit Post-Training WeightQuantization for LLM
Authors:
Jiayi Pan,
Chengcan Wang,
Kaifu Zheng,
Yangguang Li,
Zhenyu Wang,
Bin Feng
Abstract:
Large language models (LLMs) have shown remarkable capabilities in various tasks. However, their huge model size and the consequent demand for computational and memory resources also pose challenges to model deployment. Currently, 4-bit post-training quantization (PTQ) has achieved some success in LLMs, reducing the memory footprint by approximately 75% compared to FP16 models, albeit with some accuracy loss. In this paper, we propose SmoothQuant+, an accurate and efficient 4-bit weight-only PTQ that requires no additional training and achieves lossless accuracy for LLMs for the first time. Based on the fact that the loss of weight quantization is amplified by the activation outliers, SmoothQuant+ smooths the activation outliers by channel before quantization, while adjusting the corresponding weights for mathematical equivalence, and then performs group-wise 4-bit weight quantization for linear layers. We have integrated SmoothQuant+ into the vLLM framework, an advanced high-throughput inference engine specially developed for LLMs, and equipped it with efficient W4A16 CUDA kernels, so that vLLM can seamlessly support SmoothQuant+ 4-bit weight quantization. Our results show that, with SmoothQuant+, the Code Llama-34B model can be quantized and deployed on an A100 40GB GPU, achieving lossless accuracy and a throughput increase of 1.9 to 4.0 times compared to the FP16 model deployed on two A100 40GB GPUs. Moreover, the latency per token is only 68% of the FP16 model deployed on two A100 40GB GPUs. To our knowledge, this is the state-of-the-art 4-bit weight quantization for LLMs.
Submitted 6 December, 2023;
originally announced December 2023.
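The SmoothQuant+ abstract above describes smoothing activation outliers per channel while adjusting the weights so the layer output is unchanged. The following is a minimal sketch of that equivalence only, not the authors' implementation: the function names `smooth` and `matmul`, the parameter `alpha`, and the scale rule are all hypothetical choices made for illustration, and the group-wise 4-bit quantization step is not shown.

```python
def matmul(a, b):
    """Plain nested-list matrix product, used to compare layer outputs."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]


def smooth(activations, weights, alpha=0.5):
    """Divide activation channel j by s_j and multiply weight row j by s_j.

    The scale rule below is an assumption for illustration; any positive
    per-channel scale keeps the product activations @ weights unchanged.
    """
    n_channels = len(weights)
    scales = []
    for j in range(n_channels):
        act_max = max(abs(row[j]) for row in activations)
        w_max = max(abs(w) for w in weights[j])
        if act_max > 0 and w_max > 0:
            scales.append(act_max ** alpha / w_max ** (1 - alpha))
        else:
            scales.append(1.0)  # leave degenerate channels untouched
    smoothed = [[row[j] / scales[j] for j in range(n_channels)]
                for row in activations]
    adjusted = [[w * scales[j] for w in weights[j]]
                for j in range(n_channels)]
    return smoothed, adjusted, scales


# Toy layer: activation channel 1 carries the outliers.
X = [[1.0, 40.0], [-2.0, 60.0]]
W = [[0.5, -0.25], [0.1, 0.2]]
Xs, Ws, s = smooth(X, W)

Y, Ys = matmul(X, W), matmul(Xs, Ws)
max_diff = max(abs(Y[i][j] - Ys[i][j]) for i in range(2) for j in range(2))
print("largest output difference after smoothing:", max_diff)
```

In this toy example the outlier channel shrinks while the layer output stays the same up to floating-point rounding, which is the property the abstract refers to as mathematical equivalence.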
-
AIM: Automatic Interrupt Modeling for Dynamic Firmware Analysis
Authors:
Bo Feng,
Meng Luo,
Changming Liu,
Long Lu,
Engin Kirda
Abstract:
The security of microcontrollers, which drive modern IoT and embedded devices, continues to raise major concerns. Within a microcontroller (MCU), the firmware is a monolithic piece of software that contains the whole software stack, whereas a variety of peripherals represent the hardware. As MCU firmware contains vulnerabilities, it is ideal to test firmware with off-the-shelf software testing techniques, such as dynamic symbolic execution and fuzzing. Nevertheless, no emulator can emulate the diverse MCU peripherals or execute/test the firmware. Specifically, the interrupt interface, among all I/O interfaces used by MCU peripherals, is extremely challenging to emulate.
In this paper, we present AIM -- a generic, scalable, and hardware-independent dynamic firmware analysis framework that supports unemulated MCU peripherals by a novel interrupt modeling mechanism. AIM effectively and efficiently covers interrupt-dependent code in firmware by a novel, firmware-guided, Just-in-Time Interrupt Firing technique. We implemented our framework in angr and performed dynamic symbolic execution for eight real-world MCU firmware. According to testing results, our framework covered up to 11.2 times more interrupt-dependent code than state-of-the-art approaches while accomplishing several challenging goals not feasible previously. Finally, a comparison with a state-of-the-art firmware fuzzer demonstrates dynamic symbolic execution and fuzzing together can achieve better firmware testing coverage.
Submitted 2 December, 2023;
originally announced December 2023.
-
The general solutions for a non-isospectral integrable TD hierarchy via the inverse scattering transform
Authors:
Hongyi Zhang,
Yufeng Zhang,
Binlu Feng
Abstract:
A non-isospectral Lax pair is first introduced from which a kind of non-isospectral integrable TD hierarchy is derived, whose reduction is an integrable system called the non-isospectral integrable TD system. Then by using the inverse scattering transform (IST) method, new general soliton solutions for the non-isospectral integrable TD hierarchy are obtained. Because we investigate soliton solutions of non-isospectral integrable systems by the IST method, a new Gel'fand-Levitan-Marchenko (GLM) equation needs to be constructed. Finally, we explicitly obtain the exact solutions of the non-isospectral integrable TD system. The method presented in the paper can be extensively applied to other integrable equations.
Submitted 18 November, 2023;
originally announced November 2023.
-
FPM-INR: Fourier ptychographic microscopy image stack reconstruction using implicit neural representations
Authors:
Haowen Zhou,
Brandon Y. Feng,
Haiyun Guo,
Siyu Lin,
Mingshu Liang,
Christopher A. Metzler,
Changhuei Yang
Abstract:
Image stacks provide invaluable 3D information in various biological and pathological imaging applications. Fourier ptychographic microscopy (FPM) enables reconstructing high-resolution, wide field-of-view image stacks without z-stack scanning, thus significantly accelerating image acquisition. However, existing FPM methods take tens of minutes to reconstruct and gigabytes of memory to store a high-resolution volumetric scene, impeding fast gigapixel-scale remote digital pathology. While deep learning approaches have been explored to address this challenge, existing methods poorly generalize to novel datasets and can produce unreliable hallucinations. This work presents FPM-INR, a compact and efficient framework that integrates physics-based optical models with implicit neural representations (INR) to represent and reconstruct FPM image stacks. FPM-INR is agnostic to system design or sample types and does not require external training data. In our demonstrated experiments, FPM-INR substantially outperforms traditional FPM algorithms with up to a 25-fold increase in speed and an 80-fold reduction in memory usage for continuous image stack representations.
Submitted 31 October, 2023; v1 submitted 27 October, 2023;
originally announced October 2023.
-
Provable Probabilistic Imaging using Score-Based Generative Priors
Authors:
Yu Sun,
Zihui Wu,
Yifan Chen,
Berthy T. Feng,
Katherine L. Bouman
Abstract:
Estimating high-quality images while also quantifying their uncertainty are two desired features in an image reconstruction algorithm for solving ill-posed inverse problems. In this paper, we propose plug-and-play Monte Carlo (PMC) as a principled framework for characterizing the space of possible solutions to a general inverse problem. PMC is able to incorporate expressive score-based generative priors for high-quality image reconstruction while also performing uncertainty quantification via posterior sampling. In particular, we introduce two PMC algorithms which can be viewed as the sampling analogues of the traditional plug-and-play priors (PnP) and regularization by denoising (RED) algorithms. We also establish a theoretical analysis for characterizing the convergence of the PMC algorithms. Our analysis provides non-asymptotic stationarity guarantees for both algorithms, even in the presence of non-log-concave likelihoods and imperfect score networks. We demonstrate the performance of the PMC algorithms on multiple representative inverse problems with both linear and nonlinear forward models. Experimental results show that PMC significantly improves reconstruction quality and enables high-fidelity uncertainty quantification.
Submitted 29 December, 2023; v1 submitted 16 October, 2023;
originally announced October 2023.
-
Revisit Input Perturbation Problems for LLMs: A Unified Robustness Evaluation Framework for Noisy Slot Filling Task
Authors:
Guanting Dong,
Jinxu Zhao,
Tingfeng Hui,
Daichi Guo,
Wenlong Wan,
Boqi Feng,
Yueyan Qiu,
Zhuoma Gongque,
Keqing He,
Zechen Wang,
Weiran Xu
Abstract:
With the increasing capabilities of large language models (LLMs), these high-performance models have achieved state-of-the-art results on a wide range of natural language processing (NLP) tasks. However, the models' performance on commonly-used benchmark datasets often fails to accurately reflect their reliability and robustness when applied to real-world noisy data. To address these challenges, we propose a unified robustness evaluation framework based on the slot-filling task to systematically evaluate the dialogue understanding capability of LLMs in diverse input perturbation scenarios. Specifically, we construct an input perturbation evaluation dataset, Noise-LLM, which contains five types of single perturbation and four types of mixed perturbation data. Furthermore, we utilize a multi-level data augmentation method (character, word, and sentence levels) to construct a candidate data pool, and carefully design two automatic task demonstration construction strategies (instance-level and entity-level) with various prompt templates. Our aim is to assess how well various robustness methods of LLMs perform in real-world noisy scenarios. The experiments demonstrate that current open-source LLMs generally achieve limited perturbation robustness. Based on these experimental observations, we make some forward-looking suggestions to fuel research in this direction.
Submitted 10 October, 2023;
originally announced October 2023.
-
Shielding the Unseen: Privacy Protection through Poisoning NeRF with Spatial Deformation
Authors:
Yihan Wu,
Brandon Y. Feng,
Heng Huang
Abstract:
In this paper, we introduce an innovative method of safeguarding user privacy against the generative capabilities of Neural Radiance Fields (NeRF) models. Our novel poisoning attack method induces changes to observed views that are imperceptible to the human eye, yet potent enough to disrupt NeRF's ability to accurately reconstruct a 3D scene. To achieve this, we devise a bi-level optimization algorithm incorporating a Projected Gradient Descent (PGD)-based spatial deformation. We extensively test our approach on two common NeRF benchmark datasets consisting of 29 real-world scenes with high-quality images. Our results compellingly demonstrate that our privacy-preserving method significantly impairs NeRF's performance across these benchmark datasets. Additionally, we show that our method is adaptable and versatile, functioning across various perturbation strengths and NeRF architectures. This work offers valuable insights into NeRF's vulnerabilities and emphasizes the need to account for such potential privacy risks when developing robust 3D scene reconstruction algorithms. Our study contributes to the larger conversation surrounding responsible AI and generative machine learning, aiming to protect user privacy and respect creative ownership in the digital age.
Submitted 4 October, 2023;
originally announced October 2023.
-
Quantum Privacy-preserving Two-party Circle Intersection Protocol Based on Phase-encoded Query
Authors:
Zi-Xian Li,
Qi Yang,
Bao Feng,
Wen-Jie Liu
Abstract:
Privacy-preserving geometric intersection (PGI) is an important issue in secure multiparty computation (SMC). The existing quantum PGI protocols are mainly based on grid coding, which incurs high computational complexity. The phase-encoded query method, which has been used in some quantum SMC protocols, is suitable for solving the decision problem, but it needs to apply high-dimensional Oracle operators. In this paper, we use the principle of phase-encoded query to solve an important PGI problem, namely privacy-preserving two-party circle intersection. We study the implementation of the Oracle operator in detail, and achieve polynomial computational complexity by decomposing it into quantum arithmetic operations. Performance analysis shows that our protocol is correct and efficient, and can protect the privacy of all participants against internal and external attacks.
Submitted 29 September, 2023;
originally announced September 2023.
-
Corporate Credit Rating: A Survey
Authors:
Bojing Feng,
Xi Cheng,
Dan Li,
Zeyu Liu,
Wenfang Xue
Abstract:
Corporate credit rating (CCR) plays a very important role in contemporary economic and social development. How to apply credit rating methods to enterprises has long been a problem worthy of discussion. Based on a study of the relevant domestic and international literature, this paper presents a systematic survey of CCR. It traces the development of CCR methods at three levels (statistical models, machine learning models, and neural network models), summarizes the common databases of CCR, and compares the advantages and disadvantages of the models in depth. Finally, this paper summarizes the problems existing in current research and discusses the future prospects of CCR. Compared with existing reviews of CCR, this paper expounds and analyzes the progress of neural network models in this field in recent years.
Submitted 18 September, 2023;
originally announced September 2023.
-
Continuous Levels of Detail for Light Field Networks
Authors:
David Li,
Brandon Y. Feng,
Amitabh Varshney
Abstract:
Recently, several approaches have emerged for generating neural representations with multiple levels of detail (LODs). LODs can improve the rendering by using lower resolutions and smaller model sizes when appropriate. However, existing methods generally focus on a few discrete LODs which suffer from aliasing and flicker artifacts as details are changed and limit their granularity for adapting to resource limitations. In this paper, we propose a method to encode light field networks with continuous LODs, allowing for finely tuned adaptations to rendering conditions. Our training procedure uses summed-area table filtering allowing efficient and continuous filtering at various LODs. Furthermore, we use saliency-based importance sampling which enables our light field networks to distribute their capacity, particularly limited at lower LODs, towards representing the details viewers are most likely to focus on. Incorporating continuous LODs into neural representations enables progressive streaming of neural representations, decreasing the latency and resource utilization for rendering.
Submitted 20 September, 2023;
originally announced September 2023.
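The abstract above mentions summed-area table filtering as the basis for efficient filtering at various LODs. As a rough illustration of the general data structure only (not the paper's training procedure, and without the continuous-LOD interpolation it describes), a summed-area table lets the mean over any axis-aligned box be read with four lookups. The function names `summed_area_table` and `box_mean` are hypothetical.

```python
def summed_area_table(img):
    """Build a (h+1) x (w+1) table where sat[y][x] is the sum of img[:y][:x]."""
    h, w = len(img), len(img[0])
    sat = [[0.0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        for x in range(w):
            sat[y + 1][x + 1] = (img[y][x] + sat[y][x + 1]
                                 + sat[y + 1][x] - sat[y][x])
    return sat


def box_mean(sat, y0, x0, y1, x1):
    """Mean of the pixels in rows y0..y1-1 and columns x0..x1-1."""
    total = sat[y1][x1] - sat[y0][x1] - sat[y1][x0] + sat[y0][x0]
    return total / ((y1 - y0) * (x1 - x0))


image = [[1, 2, 3],
         [4, 5, 6],
         [7, 8, 9]]
sat = summed_area_table(image)
print(box_mean(sat, 0, 0, 3, 3))  # mean of the whole image
print(box_mean(sat, 1, 1, 3, 3))  # mean of the lower-right 2x2 block
```

Because the lookup cost does not depend on the box size, the same table can serve coarse and fine filter footprints alike, which is what makes it attractive for level-of-detail filtering.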
-
Efficient Bayesian Computational Imaging with a Surrogate Score-Based Prior
Authors:
Berthy T. Feng,
Katherine L. Bouman
Abstract:
We propose a surrogate function for efficient use of score-based priors for Bayesian inverse imaging. Recent work turned score-based diffusion models into probabilistic priors for solving ill-posed imaging problems by appealing to an ODE-based log-probability function. However, evaluating this function is computationally inefficient and inhibits posterior estimation of high-dimensional images. Our proposed surrogate prior is based on the evidence lower-bound of a score-based diffusion model. We demonstrate the surrogate prior on variational inference for efficient approximate posterior sampling of large images. Compared to the exact prior in previous work, our surrogate prior accelerates optimization of the variational image distribution by at least two orders of magnitude. We also find that our principled approach achieves higher-fidelity images than non-Bayesian baselines that involve hyperparameter-tuning at inference. Our work establishes a practical path forward for using score-based diffusion models as general-purpose priors for imaging.
Submitted 5 September, 2023;
originally announced September 2023.
-
Facing Unknown: Open-World Encrypted Traffic Classification Based on Contrastive Pre-Training
Authors:
Xiang Li,
Beibei Feng,
Tianning Zang,
Shuyuan Zhao,
Jingrun Ma
Abstract:
Traditional Encrypted Traffic Classification (ETC) methods face a significant challenge in classifying large volumes of encrypted traffic in the open-world assumption, i.e., simultaneously classifying the known applications and detecting unknown applications. We propose a novel Open-World Contrastive Pre-training (OWCP) framework for this. OWCP performs contrastive pre-training to obtain a robust feature representation. Based on this, we determine the spherical mapping space to find the marginal flows for each known class, which are used to train GANs to synthesize new flows similar to the known parts but do not belong to any class. These synthetic flows are assigned to Softmax's unknown node to modify the classifier, effectively enhancing sensitivity towards known flows and significantly suppressing unknown ones. Extensive experiments on three datasets show that OWCP significantly outperforms existing ETC and generic open-world classification methods. Furthermore, we conduct comprehensive ablation studies and sensitivity analyses to validate each integral component of OWCP.
Submitted 31 August, 2023;
originally announced August 2023.
-
Joint Beamforming and Antenna Movement Design for Moveable Antenna Systems Based on Statistical CSI
Authors:
Xintai Chen,
Biqian Feng,
Yongpeng Wu,
Derrick Wing Kwan Ng,
Robert Schober
Abstract:
This paper studies a novel movable antenna (MA)-enhanced multiple-input multiple-output (MIMO) system to leverage the corresponding spatial degrees of freedom (DoFs) for improving the performance of wireless communications. We aim to maximize the achievable rate by jointly optimizing the MA positions and the transmit covariance matrix based on statistical channel state information (CSI). To solve the resulting design problem, we develop a constrained stochastic successive convex approximation (CSSCA) algorithm applicable for the general movement mode. Furthermore, we propose two simplified antenna movement modes, namely the linear movement mode and the planar movement mode, to facilitate efficient antenna movement and reduce the computational complexity of the CSSCA algorithm. Numerical results show that the considered MA-enhanced system can significantly improve the achievable rate compared to conventional MIMO systems employing uniform planar arrays (UPAs) and that the proposed planar movement mode performs closely to the performance upper bound achieved by the general movement mode.
Submitted 18 August, 2023; v1 submitted 13 August, 2023;
originally announced August 2023.
-
Condition-Adaptive Graph Convolution Learning for Skeleton-Based Gait Recognition
Authors:
Xiaohu Huang,
Xinggang Wang,
Zhidianqiu Jin,
Bo Yang,
Botao He,
Bin Feng,
Wenyu Liu
Abstract:
Graph convolutional networks have been widely applied in skeleton-based gait recognition. A key challenge in this task is to distinguish the individual walking styles of different subjects across various views. Existing state-of-the-art methods employ uniform convolutions to extract features from diverse sequences and ignore the effects of viewpoint changes. To overcome these limitations, we propose a condition-adaptive graph (CAG) convolution network that can dynamically adapt to the specific attributes of each skeleton sequence and the corresponding view angle. In contrast to using fixed weights for all joints and sequences, we introduce a joint-specific filter learning (JSFL) module in the CAG method, which produces sequence-adaptive filters at the joint level. The adaptive filters capture fine-grained patterns that are unique to each joint, enabling the extraction of diverse spatial-temporal information about body parts. Additionally, we design a view-adaptive topology learning (VATL) module that generates adaptive graph topologies. These graph topologies are used to correlate the joints adaptively according to the specific view conditions. Thus, CAG can simultaneously adjust to various walking styles and viewpoints. Experiments on the two most widely used datasets (i.e., CASIA-B and OU-MVLP) show that CAG surpasses all previous skeleton-based methods. Moreover, the recognition performance can be enhanced by simply combining CAG with appearance-based methods, demonstrating the ability of CAG to provide useful complementary information. The source code will be available at https://github.com/OliverHxh/CAG.
Submitted 13 August, 2023;
originally announced August 2023.
-
3D Motion Magnification: Visualizing Subtle Motions with Time Varying Radiance Fields
Authors:
Brandon Y. Feng,
Hadi Alzayer,
Michael Rubinstein,
William T. Freeman,
Jia-Bin Huang
Abstract:
Motion magnification helps us visualize subtle, imperceptible motion. However, prior methods only work for 2D videos captured with a fixed camera. We present a 3D motion magnification method that can magnify subtle motions from scenes captured by a moving camera, while supporting novel view rendering. We represent the scene with time-varying radiance fields and leverage the Eulerian principle for motion magnification to extract and amplify the variation of the embedding of a fixed point over time. We study and validate our proposed principle for 3D motion magnification using both implicit and tri-plane-based radiance fields as our underlying 3D scene representation. We evaluate the effectiveness of our method on both synthetic and real-world scenes captured under various camera setups.
Submitted 7 August, 2023;
originally announced August 2023.
-
Emergent electronic landscapes in a novel valence-ordered nickelate with tri-component nickel coordination
Authors:
Aravind Raji,
Zhengang Dong,
Victor Porée,
Alaska Subedi,
Xiaoyan Li,
Bernat Mundet,
Lucia Varbaro,
Claribel Domínguez,
Marios Hadjimichael,
Bohan Feng,
Alessandro Nicolaou,
Jean-Pascal Rueff,
Danfeng Li,
Alexandre Gloter
Abstract:
The metal-hydride-based topochemical reduction process has produced novel thermodynamically unstable phases across various transition metal oxide series with unusual crystal structures and non-trivial ground states. Here, by such an oxygen (de-) intercalation method we synthesize a novel samarium nickelate with ordered nickel valences associated with tri-component coordination configurations. This structure, with a formula of Sm$_{9}$Ni$_{9}$O$_{22}$ as revealed by four-dimensional scanning transmission electron microscopy, emerges from the intricate planes of {303}$_{\text{pc}}$ ordered apical oxygen vacancies. X-ray spectroscopy measurements and ab-initio calculations show the coexistence of square-planar, pyramidal and octahedral Ni sites with mono-, bi- and tri-valences. It leads to an intense orbital polarization, charge-ordering, and a ground state with a strong electron localization marked by the disappearance of the ligand-hole configuration at low temperature. This new nickelate compound provides another example of previously inaccessible materials enabled by topotactic transformations and presents a unique platform where mixed Ni valence can give rise to exotic phenomena.
Submitted 5 August, 2023;
originally announced August 2023.