-
Image-Conditional Diffusion Transformer for Underwater Image Enhancement
Authors:
Xingyang Nie,
Su Pan,
Xiaoyu Zhai,
Shifei Tao,
Fengzhong Qu,
Biao Wang,
Huilin Ge,
Guojie Xiao
Abstract:
Underwater image enhancement (UIE) has attracted much attention owing to its importance for underwater operation and marine engineering. Motivated by the recent advance in generative models, we propose a novel UIE method based on image-conditional diffusion transformer (ICDT). Our method takes the degraded underwater image as the conditional input and converts it into latent space where ICDT is applied. ICDT replaces the conventional U-Net backbone in a denoising diffusion probabilistic model (DDPM) with a transformer, and thus inherits favorable properties such as scalability from transformers. Furthermore, we train ICDT with a hybrid loss function involving variances to achieve better log-likelihoods, which meanwhile significantly accelerates the sampling process. We experimentally assess the scalability of ICDTs and compare with prior works in UIE on the Underwater ImageNet dataset. Besides good scaling properties, our largest model, ICDT-XL/2, outperforms all comparison methods, achieving state-of-the-art (SOTA) quality of image enhancement.
Submitted 7 July, 2024;
originally announced July 2024.
-
Classification of Power Quality Disturbances Using Resnet with Channel Attention Mechanism
Authors:
Su Pan,
Xingyang Nie,
Xiaoyu Zhai,
Biao Wang,
Huilin Ge,
Cheng He,
Zhenping Ding
Abstract:
The detection and classification of power quality disturbances (PQDs) carries significant importance for power systems. In response to this imperative, numerous intelligent diagnostic methods have been developed. However, existing identification methods usually concentrate on single-type signals or on complex signals with two types, rendering them susceptible to noisy labels and environmental effects. This study proposes a novel method for the classification of PQDs, termed ST-GSResNet, which utilizes the S-Transform and an improved residual neural network (ResNet) with a channel attention mechanism. The ST-GSResNet approach initially uses the S-Transform to transform a time-series signal into a 2D time-frequency image for feature enhancement. Then, an improved ResNet model is introduced, which employs grouped convolution instead of the traditional convolution operation. This improvement aims to facilitate learning with a block-diagonal structured sparsity on the channel dimension, so that highly correlated filters are learned in a more structured way in networks with filter groups. Because this substantially reduces the number of parameters in the network, the model becomes less prone to overfitting. Furthermore, the SE module concentrates on primary components, which enhances the model's robustness in recognition and immunity to noise. Experimental results demonstrate that, compared to existing deep learning models, our approach has advantages in computational efficiency and classification accuracy.
Submitted 2 July, 2024;
originally announced July 2024.
-
Approximate Solutions for Multi-Trip Route Planning in Time-Sensitive Situations
Authors:
Bahar Cavdar,
Joseph Geunes,
Xiaofeng Nie,
Yue Wang
Abstract:
We consider emergent situations that require transporting individuals from their locations to a facility using a single capacitated vehicle, where transportation duration has a negative impact on the individuals. A dispatcher determines routes to maximize total satisfaction. We call this problem the Ambulance Bus Routing Problem. We develop efficient approximate policies for the dispatcher to allocate individuals to multiple routes, characterize an optimal solution of the relaxed approximate model, and devise a heuristic to obtain a near-optimal integer solution quickly.
Submitted 28 June, 2024;
originally announced July 2024.
-
V3Det Challenge 2024 on Vast Vocabulary and Open Vocabulary Object Detection: Methods and Results
Authors:
Jiaqi Wang,
Yuhang Zang,
Pan Zhang,
Tao Chu,
Yuhang Cao,
Zeyi Sun,
Ziyu Liu,
Xiaoyi Dong,
Tong Wu,
Dahua Lin,
Zeming Chen,
Zhi Wang,
Lingchen Meng,
Wenhao Yao,
Jianwei Yang,
Sihong Wu,
Zhineng Chen,
Zuxuan Wu,
Yu-Gang Jiang,
Peixi Wu,
Bosong Chai,
Xuan Nie,
Longquan Yan,
Zeyu Wang,
Qifan Zhou
, et al. (9 additional authors not shown)
Abstract:
Detecting objects in real-world scenes is a complex task due to various challenges, including the vast range of object categories, and potential encounters with previously unknown or unseen objects. The challenges necessitate the development of public benchmarks and challenges to advance the field of object detection. Inspired by the success of previous COCO and LVIS Challenges, we organize the V3Det Challenge 2024 in conjunction with the 4th Open World Vision Workshop: Visual Perception via Learning in an Open World (VPLOW) at CVPR 2024, Seattle, US. This challenge aims to push the boundaries of object detection research and encourage innovation in this field. The V3Det Challenge 2024 consists of two tracks: 1) Vast Vocabulary Object Detection: This track focuses on detecting objects from a large set of 13204 categories, testing the detection algorithm's ability to recognize and locate diverse objects. 2) Open Vocabulary Object Detection: This track goes a step further, requiring algorithms to detect objects from an open set of categories, including unknown objects. In the following sections, we will provide a comprehensive summary and analysis of the solutions submitted by participants. By analyzing the methods and solutions presented, we aim to inspire future research directions in vast vocabulary and open-vocabulary object detection, driving progress in this field. Challenge homepage: https://v3det.openxlab.org.cn/challenge
Submitted 17 June, 2024;
originally announced June 2024.
-
ChartMimic: Evaluating LMM's Cross-Modal Reasoning Capability via Chart-to-Code Generation
Authors:
Chufan Shi,
Cheng Yang,
Yaxin Liu,
Bo Shui,
Junjie Wang,
Mohan Jing,
Linran Xu,
Xinyu Zhu,
Siheng Li,
Yuxiang Zhang,
Gongye Liu,
Xiaomei Nie,
Deng Cai,
Yujiu Yang
Abstract:
We introduce a new benchmark, ChartMimic, aimed at assessing the visually-grounded code generation capabilities of large multimodal models (LMMs). ChartMimic utilizes information-intensive visual charts and textual instructions as inputs, requiring LMMs to generate the corresponding code for chart rendering. ChartMimic includes 1,000 human-curated (figure, instruction, code) triplets, which represent the authentic chart use cases found in scientific papers across various domains (e.g., Physics, Computer Science, Economics). These charts span 18 regular types and 4 advanced types, diversifying into 191 subcategories. Furthermore, we propose multi-level evaluation metrics to provide an automatic and thorough assessment of the output code and the rendered charts. Unlike existing code generation benchmarks, ChartMimic places emphasis on evaluating LMMs' capacity to harmonize a blend of cognitive capabilities, encompassing visual understanding, code generation, and cross-modal reasoning. The evaluation of 3 proprietary models and 11 open-weight models highlights the substantial challenges posed by ChartMimic. Even the advanced GPT-4V and Claude-3-opus achieve average scores of only 73.2 and 53.7, respectively, indicating significant room for improvement. We anticipate that ChartMimic will inspire the development of LMMs, advancing the pursuit of artificial general intelligence.
Submitted 14 June, 2024;
originally announced June 2024.
-
Enhanced Object Detection: A Study on Vast Vocabulary Object Detection Track for V3Det Challenge 2024
Authors:
Peixi Wu,
Bosong Chai,
Xuan Nie,
Longquan Yan,
Zeyu Wang,
Qifan Zhou,
Boning Wang,
Yansong Peng,
Hebei Li
Abstract:
In this technical report, we present our findings from the research conducted on the Vast Vocabulary Visual Detection (V3Det) dataset for the Supervised Vast Vocabulary Visual Detection task. Dealing with complex categories and detection boxes is a key difficulty in this track, and the original supervised detector is not suitable for this task. We have designed a series of improvements, including adjustments to the network structure, changes to the loss function, and the design of training strategies. Our model has shown improvement over the baseline and achieved excellent rankings on the Leaderboard for both the Vast Vocabulary Object Detection (Supervised) track and the Open Vocabulary Object Detection (OVD) track of the V3Det Challenge 2024.
Submitted 21 June, 2024; v1 submitted 13 June, 2024;
originally announced June 2024.
-
Mixed Finite Element Method for Multi-layer Elastic Contact Systems
Authors:
Zhizhuo Zhang,
Mikaël Barboteu,
Xiaobing Nie,
Serge Dumont,
Mahmoud Abdel-Aty,
Jinde Cao
Abstract:
With the development of multi-layer elastic systems in the field of engineering mechanics, the corresponding variational inequality theory and algorithm design have received more attention and research. In this study, a class of equivalent saddle point problems with interlayer Tresca friction conditions and the mixed finite element method are proposed and analyzed. Then, the convergence of the numerical solution of the mixed finite element method is theoretically proven, and the corresponding algebraic dual algorithm is given. Finally, through numerical experiments, the mixed finite element method is not only compared with the layer decomposition method, but also its convergence relationship with respect to the spatial discretization parameter $H$ is verified.
Submitted 6 June, 2024;
originally announced June 2024.
-
A layer decomposition method for multi-layer elastic contact systems with interlayer Tresca friction
Authors:
Zhizhuo Zhang,
Xiaobing Nie,
Mikaël Barboteu,
Jinde Cao
Abstract:
With the increasing demand for accuracy in the numerical simulation of pavement mechanics, the variational inequality model and its induced finite element method, which can simulate the interlayer contact state, have become a potential solution. In this paper, a layer decomposition algorithm for solving variational inequality models of multi-layer elastic contact systems with interlayer Tresca friction conditions is studied. Continuous and discrete versions of the algorithm and their convergence theorems are proposed and proved in turn. Then, the algebraic form of the executable optimization algorithm and the numerical experimental results verify the practicability of the variational inequality model and its algorithm in pavement mechanics modeling.
Submitted 6 June, 2024;
originally announced June 2024.
-
Experimental Validation of Enhanced Information Capacity by Quantum Switch in Accordance with Thermodynamic Laws
Authors:
Cheng Xi,
Xiangjing Liu,
Hongfeng Liu,
Keyi Huang,
Xinyue Long,
Daniel Ebler,
Xinfang Nie,
Oscar Dahlsten,
Dawei Lu
Abstract:
We experimentally probe the interplay of the quantum switch with the laws of thermodynamics. The quantum switch places two channels in a superposition of orders and may be applied to thermalizing channels. Quantum-switching thermal channels has been shown to give apparent violations of the second law. Central to these apparent violations is how quantum switching channels can increase the capacity to communicate information. We experimentally show this increase and how it is consistent with the laws of thermodynamics, demonstrating how thermodynamic resources are consumed. We use a nuclear magnetic resonance approach with coherently controlled interactions of nuclear spin qubits. We verify an analytical upper bound on the increase in capacity for channels that preserve energy and thermal states, and demonstrate that the bound can be exceeded for an energy-altering channel. We show that the switch can be used to take a thermal state to a state that is not thermal, whilst consuming free energy associated with the coherence of a control system. The results show how the switch can be incorporated into quantum thermodynamics experiments as an additional resource.
Submitted 4 June, 2024;
originally announced June 2024.
-
The SkatingVerse Workshop & Challenge: Methods and Results
Authors:
Jian Zhao,
Lei Jin,
Jianshu Li,
Zheng Zhu,
Yinglei Teng,
Jiaojiao Zhao,
Sadaf Gulshad,
Zheng Wang,
Bo Zhao,
Xiangbo Shu,
Yunchao Wei,
Xuecheng Nie,
Xiaojie Jin,
Xiaodan Liang,
Shin'ichi Satoh,
Yandong Guo,
Cewu Lu,
Junliang Xing,
Jane Shen Shengmei
Abstract:
The SkatingVerse Workshop & Challenge aims to encourage research in developing novel and accurate methods for human action understanding. The SkatingVerse dataset used for the SkatingVerse Challenge has been publicly released. There are two subsets in the dataset, i.e., the training subset and the testing subset. The training subset consists of 19,993 RGB video sequences, and the testing subset consists of 8,586 RGB video sequences. Around 10 participating teams from around the globe competed in the SkatingVerse Challenge. In this paper, we provide a brief summary of the SkatingVerse Workshop & Challenge, including brief introductions to the top three methods. The submission leaderboard will be reopened for researchers who are interested in the human action understanding challenge. The benchmark dataset and other information can be found at: https://skatingverse.github.io/.
Submitted 27 May, 2024;
originally announced May 2024.
-
DeepSeek-V2: A Strong, Economical, and Efficient Mixture-of-Experts Language Model
Authors:
DeepSeek-AI,
Aixin Liu,
Bei Feng,
Bin Wang,
Bingxuan Wang,
Bo Liu,
Chenggang Zhao,
Chengqi Dengr,
Chong Ruan,
Damai Dai,
Daya Guo,
Dejian Yang,
Deli Chen,
Dongjie Ji,
Erhang Li,
Fangyun Lin,
Fuli Luo,
Guangbo Hao,
Guanting Chen,
Guowei Li,
H. Zhang,
Hanwei Xu,
Hao Yang,
Haowei Zhang,
Honghui Ding
, et al. (132 additional authors not shown)
Abstract:
We present DeepSeek-V2, a strong Mixture-of-Experts (MoE) language model characterized by economical training and efficient inference. It comprises 236B total parameters, of which 21B are activated for each token, and supports a context length of 128K tokens. DeepSeek-V2 adopts innovative architectures including Multi-head Latent Attention (MLA) and DeepSeekMoE. MLA guarantees efficient inference through significantly compressing the Key-Value (KV) cache into a latent vector, while DeepSeekMoE enables training strong models at an economical cost through sparse computation. Compared with DeepSeek 67B, DeepSeek-V2 achieves significantly stronger performance, and meanwhile saves 42.5% of training costs, reduces the KV cache by 93.3%, and boosts the maximum generation throughput to 5.76 times. We pretrain DeepSeek-V2 on a high-quality and multi-source corpus consisting of 8.1T tokens, and further perform Supervised Fine-Tuning (SFT) and Reinforcement Learning (RL) to fully unlock its potential. Evaluation results show that, even with only 21B activated parameters, DeepSeek-V2 and its chat versions still achieve top-tier performance among open-source models.
Submitted 19 June, 2024; v1 submitted 7 May, 2024;
originally announced May 2024.
-
Clover: Regressive Lightweight Speculative Decoding with Sequential Knowledge
Authors:
Bin Xiao,
Chunan Shi,
Xiaonan Nie,
Fan Yang,
Xiangwei Deng,
Lei Su,
Weipeng Chen,
Bin Cui
Abstract:
Large language models (LLMs) suffer from low efficiency owing to the mismatch between the requirements of auto-regressive decoding and the design of most contemporary GPUs. Specifically, billions to trillions of parameters must be loaded to the GPU cache through its limited memory bandwidth for computation, but only a small batch of tokens is actually computed. Consequently, the GPU spends most of its time on memory transfer instead of computation. Recently, parallel decoding, a type of speculative decoding algorithm, has become more popular and has demonstrated impressive efficiency improvements in generation. It introduces extra decoding heads to large models, enabling them to predict multiple subsequent tokens simultaneously and verify these candidate continuations in a single decoding step. However, this approach deviates from the training objective of next token prediction used during pre-training, resulting in a low hit rate for candidate tokens. In this paper, we propose a new speculative decoding algorithm, Clover, which integrates sequential knowledge into the parallel decoding process. This enhancement improves the hit rate of speculators and thus boosts the overall efficiency. Clover transmits the sequential knowledge from pre-speculated tokens via the Regressive Connection, then employs an Attention Decoder to integrate these speculated tokens. Additionally, Clover incorporates an Augmenting Block that modifies the hidden states to better align with the purpose of speculative generation rather than next token prediction. The experimental results demonstrate that Clover outperforms the baseline by up to 91% on Baichuan-Small and 146% on Baichuan-Large, respectively, and exceeds the performance of the previously top-performing method, Medusa, by up to 37% on Baichuan-Small and 57% on Baichuan-Large, respectively.
Submitted 30 April, 2024;
originally announced May 2024.
-
Defying Imbalanced Forgetting in Class Incremental Learning
Authors:
Shixiong Xu,
Gaofeng Meng,
Xing Nie,
Bolin Ni,
Bin Fan,
Shiming Xiang
Abstract:
We observe, for the first time, a high level of imbalance in the accuracy of different classes within the same old task. This intriguing phenomenon, discovered in replay-based Class Incremental Learning (CIL), highlights the imbalanced forgetting of learned classes, as their accuracy is similar before the occurrence of catastrophic forgetting. The phenomenon had previously gone unidentified because of the reliance on average incremental accuracy as the measurement for CIL, which assumes that the accuracy of classes within the same task is similar. However, this assumption is invalid in the face of catastrophic forgetting. Further empirical studies indicate that this imbalanced forgetting is caused by conflicts in representation between semantically similar old and new classes. These conflicts are rooted in the data imbalance present in replay-based CIL methods. Building on these insights, we propose CLass-Aware Disentanglement (CLAD) to predict the old classes that are more likely to be forgotten and enhance their accuracy. Importantly, CLAD can be seamlessly integrated into existing CIL methods. Extensive experiments demonstrate that CLAD consistently improves current replay-based methods, resulting in performance gains of up to 2.56%.
Submitted 21 March, 2024;
originally announced March 2024.
-
ContextVis: Envision Contextual Learning and Interaction with Generative Models
Authors:
Bo Shui,
Chufan Shi,
Yujiu Yang,
Xiaomei Nie
Abstract:
ContextVis introduces a workflow by integrating generative models to create contextual learning materials. It aims to boost knowledge acquisition through the creation of resources with contextual cues. A case study on vocabulary learning demonstrates the effectiveness of generative models in developing educational resources that enrich language understanding and aid memory retention. The system combines an easy-to-use Dashboard for educators with an interactive Playground for learners, establishing a unified platform for content creation and interaction. Future work may expand to include a wider range of generative models, media formats, and customization features for educators.
Submitted 19 March, 2024;
originally announced March 2024.
-
High precision proton beam monitor system concept design on CSNS based on SiC
Authors:
Ye He,
Xingchen Li,
Zijun Xu,
Ming Qi,
Congcong Wang,
Chenwei Wang,
Hai Lu,
Xiaojun Nie,
Ruirui Fan,
Hantao Jing,
Weiming Song,
Keqi Wang,
Kai Liu,
Peilian Liu,
Hui Li,
Zaiyi Li,
Chenxi Fu,
Xiyuan Zhang,
Xiaoshen Kang,
Zhan Li,
Weiguo Lu,
Suyu Xiao,
Xin Shi
Abstract:
A high precision beam monitor system based on a silicon carbide PIN sensor is designed for the China Spallation Neutron Source 1.6 GeV proton beam to monitor the proton beam fluence. The concept design of the beam monitor system is finished, together with front-end electronics with silicon carbide PIN sensors, a readout system and a mechanical system. Several tests are performed to study the performance of each component of the system. The charge collection of the SiC PIN sensors after proton radiation is studied with an 80 MeV proton beam for continuous running. Research on the performance of the front-end electronics and readout system is finished for better data acquisition. The uncertainty of proton beam fluence is below 1% in the beam monitor system.
Submitted 14 March, 2024;
originally announced March 2024.
-
BlazeBVD: Make Scale-Time Equalization Great Again for Blind Video Deflickering
Authors:
Xinmin Qiu,
Congying Han,
Zicheng Zhang,
Bonan Li,
Tiande Guo,
Pingyu Wang,
Xuecheng Nie
Abstract:
Developing blind video deflickering (BVD) algorithms to enhance video temporal consistency is gaining importance amid the flourishing of image processing and video generation. However, the intricate nature of video data complicates the training of deep learning methods, leading to high resource consumption and instability, notably under severe lighting flicker. This underscores the critical need for a compact representation beyond pixel values to advance BVD research and applications. Inspired by the classic scale-time equalization (STE), our work introduces a histogram-assisted solution, called BlazeBVD, for high-fidelity and rapid BVD. Compared with STE, which directly corrects pixel values by temporally smoothing color histograms, BlazeBVD leverages smoothed illumination histograms within STE filtering to ease the challenge of learning temporal data using neural networks. Technically, BlazeBVD begins by condensing pixel values into illumination histograms that precisely capture flickering and local exposure variations. These histograms are then smoothed to produce singular frames set, filtered illumination maps, and exposure maps. Resorting to these deflickering priors, BlazeBVD utilizes a 2D network to restore faithful and consistent texture impacted by lighting changes or localized exposure issues. BlazeBVD also incorporates a lightweight 3D network to amend slight temporal inconsistencies, avoiding the resource consumption issue. Comprehensive experiments on synthetic, real-world and generated videos showcase the superior qualitative and quantitative results of BlazeBVD, achieving inference speeds up to 10x faster than state-of-the-art methods.
Submitted 10 March, 2024;
originally announced March 2024.
-
MLIP: Enhancing Medical Visual Representation with Divergence Encoder and Knowledge-guided Contrastive Learning
Authors:
Zhe Li,
Laurence T. Yang,
Bocheng Ren,
Xin Nie,
Zhangyang Gao,
Cheng Tan,
Stan Z. Li
Abstract:
The scarcity of annotated data has sparked significant interest in unsupervised pre-training methods that leverage medical reports as auxiliary signals for medical visual representation learning. However, existing research overlooks the multi-granularity nature of medical visual representation and lacks suitable contrastive learning techniques to improve the models' generalizability across different granularities, leading to the underutilization of image-text information. To address this, we propose MLIP, a novel framework leveraging domain-specific medical knowledge as guiding signals to integrate language information into the visual domain through image-text contrastive learning. Our model includes global contrastive learning with our designed divergence encoder, local token-knowledge-patch alignment contrastive learning, and knowledge-guided category-level contrastive learning with expert knowledge. Experimental evaluations reveal the efficacy of our model in enhancing transfer performance for tasks such as image classification, object detection, and semantic segmentation. Notably, MLIP surpasses state-of-the-art methods even with limited annotated data, highlighting the potential of multimodal pre-training in advancing medical representation learning.
Submitted 3 February, 2024;
originally announced February 2024.
-
Effects of Magnetic Helicity on 3D Equilibria and Self-Organized States in KTX Reversed Field Pinch
Authors:
Ke Liu,
Guodong Yu,
Yuhua Huang,
Wenzhe Mao,
Yidong Xie,
Xianyi Nie,
Hong Li,
Tao Lan,
Jinlin Xie,
Weixing Ding,
Wandong Liu,
Ge Zhuang,
Caoxiang Zhu
Abstract:
The RFP is a toroidal magnetic configuration in which plasmas can spontaneously transform into different self-organized states. Among various states, the QSH state has a dominant component for the magnetic field and significantly improves confinement. Many theoretical and experimental efforts have investigated the transitions among different states. This paper employs the MRxMHD model to study the properties of QSH and other states. The SPEC is used to compute MHD equilibria for the KTX. The toroidal volume of KTX is partitioned into two subvolumes by an internal transport barrier. The geometry of this barrier is adjusted to achieve force balance across the interface, ensuring that the plasma in each subvolume is force-free and that magnetic helicity is conserved. By varying the parameters, we generate distinct self-organized states in KTX. Our findings highlight the crucial role of magnetic helicity in shaping these states. In states with low magnetic helicity in both subvolumes, the plasma exhibits axisymmetric behavior. With increasing core helicity, the plasma gradually transforms from an axisymmetric state to a double-axis helical state and finally to a single-helical-axis state. Elevated core magnetic helicity leads to a more pronounced dominant mode of the boundary magnetic field and a reduced core magnetic shear. This is consistent with previous experimental and numerical results in other RFP devices. We find a linear relationship between the plasma current and helicity in different self-organized states. Our findings suggest that KTX may enter the QSH state when the toroidal current reaches 0.72 MA. This study demonstrates that the stellarator equilibrium code SPEC unveils crucial RFP equilibrium properties, rendering it applicable to a broad range of RFP devices and other toroidal configurations.
Submitted 6 April, 2024; v1 submitted 25 January, 2024;
originally announced January 2024.
-
Quasi-single-stage optimization for permanent magnet stellarators
Authors:
Guodong Yu,
Ke Liu,
Tianyi Qian,
Yidong Xie,
Xianyi Nie,
Caoxiang Zhu
Abstract:
Advanced stellarators are typically optimized in two stages. The plasma equilibrium is optimized first, followed by the design of coils/permanent magnets. However, the coils/permanent magnets in the second stage may become too complex to achieve the desired equilibrium. To address this problem, a quasi-single-stage optimization method has been proposed. In this paper, we introduce this method for designing permanent magnet (PM) stellarators. The new approach combines straightforward PM metrics to penalize the maximum required PM thickness and the mismatch between the fixed-boundary equilibrium and the free-boundary one, along with typical physical targets. Because the degrees of freedom of the PMs are not included and directly used to minimize the objective function in this method, we call it "quasi-single-stage" optimization. We apply this quasi-single-stage optimization method to find a new quasi-axisymmetric PM design. The new design starts from MUSE, which was initially designed using a two-stage optimization approach. The resulting design, MUSE++, exhibits an order of magnitude lower quasi-symmetric error and an order-of-magnitude reduction in normal field error. We show that MUSE++ has approximately 30% fewer magnets compared to a proxy model "MUSE-0" that uses the same FAMUS optimization without the benefit of a single-stage equilibrium optimization. These results demonstrate that the new single-stage optimization method can concurrently improve plasma properties and simplify permanent magnet complexity.
Submitted 30 April, 2024; v1 submitted 25 January, 2024;
originally announced January 2024.
-
DeepSeek LLM: Scaling Open-Source Language Models with Longtermism
Authors:
DeepSeek-AI,
Xiao Bi,
Deli Chen,
Guanting Chen,
Shanhuang Chen,
Damai Dai,
Chengqi Deng,
Honghui Ding,
Kai Dong,
Qiushi Du,
Zhe Fu,
Huazuo Gao,
Kaige Gao,
Wenjun Gao,
Ruiqi Ge,
Kang Guan,
Daya Guo,
Jianzhong Guo,
Guangbo Hao,
Zhewen Hao,
Ying He,
Wenjie Hu,
Panpan Huang,
Erhang Li
, et al. (63 additional authors not shown)
Abstract:
The rapid development of open-source large language models (LLMs) has been truly remarkable. However, the scaling law described in previous literature presents varying conclusions, which casts a dark cloud over scaling LLMs. We delve into the study of scaling laws and present our distinctive findings that facilitate scaling of large scale models in two commonly used open-source configurations, 7B and 67B. Guided by the scaling laws, we introduce DeepSeek LLM, a project dedicated to advancing open-source language models with a long-term perspective. To support the pre-training phase, we have developed a dataset that currently consists of 2 trillion tokens and is continuously expanding. We further conduct supervised fine-tuning (SFT) and Direct Preference Optimization (DPO) on DeepSeek LLM Base models, resulting in the creation of DeepSeek Chat models. Our evaluation results demonstrate that DeepSeek LLM 67B surpasses LLaMA-2 70B on various benchmarks, particularly in the domains of code, mathematics, and reasoning. Furthermore, open-ended evaluations reveal that DeepSeek LLM 67B Chat exhibits superior performance compared to GPT-3.5.
Submitted 5 January, 2024;
originally announced January 2024.
-
Cooperation Does Matter: Exploring Multi-Order Bilateral Relations for Audio-Visual Segmentation
Authors:
Qi Yang,
Xing Nie,
Tong Li,
Pengfei Gao,
Ying Guo,
Cheng Zhen,
Pengfei Yan,
Shiming Xiang
Abstract:
Recently, an audio-visual segmentation (AVS) task has been introduced, aiming to group pixels with sounding objects within a given video. This task necessitates a first-ever audio-driven pixel-level understanding of the scene, posing significant challenges. In this paper, we propose an innovative audio-visual transformer framework, termed COMBO, an acronym for COoperation of Multi-order Bilateral relatiOns. For the first time, our framework explores three types of bilateral entanglements within AVS: pixel entanglement, modality entanglement, and temporal entanglement. Regarding pixel entanglement, we employ a Siam-Encoder Module (SEM) that leverages prior knowledge to generate more precise visual features from the foundational model. For modality entanglement, we design a Bilateral-Fusion Module (BFM), enabling COMBO to align corresponding visual and auditory signals bi-directionally. As for temporal entanglement, we introduce an innovative adaptive inter-frame consistency loss based on inherent temporal rules. Comprehensive experiments and ablation studies on AVSBench-object (84.7 mIoU on S4, 59.2 mIoU on MS3) and AVSBench-semantic (42.1 mIoU on AVSS) datasets demonstrate that COMBO surpasses previous state-of-the-art methods. Code and more results will be publicly available at https://yannqi.github.io/AVS-COMBO/.
Submitted 7 April, 2024; v1 submitted 11 December, 2023;
originally announced December 2023.
-
Customize your NeRF: Adaptive Source Driven 3D Scene Editing via Local-Global Iterative Training
Authors:
Runze He,
Shaofei Huang,
Xuecheng Nie,
Tianrui Hui,
Luoqi Liu,
Jiao Dai,
Jizhong Han,
Guanbin Li,
Si Liu
Abstract:
In this paper, we target the adaptive source driven 3D scene editing task by proposing a CustomNeRF model that unifies a text description or a reference image as the editing prompt. However, obtaining desired editing results that conform to the editing prompt is nontrivial because of two significant challenges: accurate editing of only foreground regions, and multi-view consistency given a single-view reference image. To tackle the first challenge, we propose a Local-Global Iterative Editing (LGIE) training scheme that alternates between foreground region editing and full-image editing, aimed at foreground-only manipulation while preserving the background. For the second challenge, we also design a class-guided regularization that exploits class priors within the generation model to alleviate the inconsistency problem among different views in image-driven editing. Extensive experiments show that our CustomNeRF produces precise editing results under various real scenes for both text- and image-driven settings.
Submitted 4 December, 2023;
originally announced December 2023.
-
Triplet Attention Transformer for Spatiotemporal Predictive Learning
Authors:
Xuesong Nie,
Xi Chen,
Haoyuan Jin,
Zhihang Zhu,
Yunfeng Yan,
Donglian Qi
Abstract:
Spatiotemporal predictive learning offers a self-supervised learning paradigm that enables models to learn both spatial and temporal patterns by predicting future sequences based on historical sequences. Mainstream methods are dominated by recurrent units, yet they are limited by their lack of parallelization and often underperform in real-world scenarios. To improve prediction quality while maintaining computational efficiency, we propose an innovative triplet attention transformer designed to capture both inter-frame dynamics and intra-frame static features. Specifically, the model incorporates the Triplet Attention Module (TAM), which replaces traditional recurrent units by exploring self-attention mechanisms in temporal, spatial, and channel dimensions. In this configuration: (i) temporal tokens contain abstract representations of inter-frame, facilitating the capture of inherent temporal dependencies; (ii) spatial and channel attention combine to refine the intra-frame representation by performing fine-grained interactions across spatial and channel dimensions. Alternating temporal, spatial, and channel-level attention allows our approach to learn more complex short- and long-range spatiotemporal dependencies. Extensive experiments demonstrate performance surpassing existing recurrent-based and recurrent-free methods, achieving state-of-the-art under multi-scenario examination including moving object trajectory prediction, traffic flow prediction, driving scene prediction, and human motion capture.
Submitted 28 October, 2023;
originally announced October 2023.
-
OpsEval: A Comprehensive IT Operations Benchmark Suite for Large Language Models
Authors:
Yuhe Liu,
Changhua Pei,
Longlong Xu,
Bohan Chen,
Mingze Sun,
Zhirui Zhang,
Yongqian Sun,
Shenglin Zhang,
Kun Wang,
Haiming Zhang,
Jianhui Li,
Gaogang Xie,
Xidao Wen,
Xiaohui Nie,
Minghua Ma,
Dan Pei
Abstract:
Information Technology (IT) Operations (Ops), particularly Artificial Intelligence for IT Operations (AIOps), is the guarantee for maintaining the orderly and stable operation of existing information systems. According to Gartner's prediction, the use of AI technology for automated IT operations has become a new trend. Large language models (LLMs), which have exhibited remarkable capabilities in NLP-related tasks, are showing great potential in the field of AIOps, for example in root cause analysis of failures, generation of operations and maintenance scripts, and summarization of alert information. Nevertheless, the performance of current LLMs in Ops tasks is yet to be determined. In this paper, we present OpsEval, a comprehensive task-oriented Ops benchmark designed for LLMs. For the first time, OpsEval assesses LLMs' proficiency in various crucial scenarios at different ability levels. The benchmark includes 7184 multi-choice questions and 1736 question-answering (QA) formats in English and Chinese. By conducting a comprehensive performance evaluation of the current leading large language models, we show how various LLM techniques can affect the performance of Ops, and discuss findings related to various topics, including model quantification, QA evaluation, and hallucination issues. To ensure the credibility of our evaluation, we invite dozens of domain experts to manually review our questions. At the same time, we have open-sourced 20% of the test QA to assist current researchers in preliminary evaluations of their OpsLLM models. The remaining 80% of the data, which is not disclosed, is used to eliminate the issue of test set leakage. Additionally, we have constructed an online leaderboard that is updated in real time and will continue to be updated, ensuring that any newly emerging LLMs will be evaluated promptly. Both our dataset and leaderboard have been made public.
Submitted 16 February, 2024; v1 submitted 11 October, 2023;
originally announced October 2023.
-
Efficient Creation of Ultracold Ground State $^{6}\textrm{Li}^{40}\textrm{K}$ Polar Molecules
Authors:
Canming He,
Xiaoyu Nie,
Victor Avalos,
Sofia Botsi,
Sunil Kumar,
Anbang Yang,
Kai Dieckmann
Abstract:
We report the creation of ultracold ground state $^{6}\textrm{Li}^{40}\textrm{K}$ polar molecules with high efficiency. Starting from a weakly bound molecular state, stimulated Raman adiabatic passage (STIRAP) is adopted to coherently transfer the molecules to their singlet ro-vibrational ground state $|\textrm{X}^{1}Σ^{+},v=0,J=0>$. By employing a singlet STIRAP pathway and low-phase-noise narrow-linewidth lasers, we observed a one-way transfer efficiency of 96(4)\,\%. Held in an optical dipole trap, the lifetime of the ground-state molecules is measured to be 5.0(3)\,ms. The large permanent dipole moment of LiK is confirmed by applying a DC electric field to the molecules and performing Stark shift spectroscopy of the ground state. With recent advances in the quantum control of collisions, our work paves the way for exploring quantum many-body physics with strongly-interacting $^{6}\textrm{Li}^{40}\textrm{K}$ molecules.
Submitted 5 October, 2023;
originally announced October 2023.
-
A spin-rotation mechanism of Einstein-de Haas effect based on a ferromagnetic disk
Authors:
Xin Nie,
Jun Li,
Trinanjan Datta,
Dao-Xin Yao
Abstract:
Spin-rotation coupling (SRC) is a fundamental phenomenon that connects electronic spins with the rotational motion of a medium. We elucidate the Einstein-de Haas (EdH) effect and its inverse with SRC as the microscopic mechanism using the dynamic spin-lattice equations derived by elasticity theory and Lagrangian formalism. By applying the coupling equations to an iron disk in a magnetic field, we exhibit the transfer of angular momentum and energy between spins and lattice, with or without damping. The timescale of the angular momentum transfer from spins to the entire lattice is estimated by our theory to be on the order of 0.01 ns, for the disk with a radius of 100 nm. Moreover, we discover a linear relationship between the magnetic field strength and the rotation frequency, which is also enhanced by a higher ratio of Young's modulus to Poisson's coefficient. In the presence of damping, we notice that the spin-lattice relaxation time is nearly inversely proportional to the magnetic field. Our explorations will contribute to a better understanding of the EdH effect and provide valuable insights for magneto-mechanical manufacturing.
Submitted 8 April, 2024; v1 submitted 19 July, 2023;
originally announced July 2023.
-
Non-equilibrium phases of Fermi gas inside a cavity with imbalanced pumping
Authors:
Xiaotian Nie,
Wei Zheng
Abstract:
In this work, we investigate the non-equilibrium dynamics of one-dimensional spinless fermions loaded in a cavity with imbalanced pumping lasers. Our study is motivated by previous work on a similar setup using bosons, and we explore the unique properties of fermionic systems in this context. By considering the imbalance in the pumping, we find that the system exhibits multiple superradiant steady phases and an unstable phase. Furthermore, by making use of the hysteresis structure of superradiant phases, we propose a unidirectional topological pumping. Unlike the usual topological pumping in which the driving protocol breaks time reversal symmetry, the driving protocol can be time reversal invariant in our proposal.
Submitted 14 July, 2023;
originally announced July 2023.
-
Improving Automatic Parallel Training via Balanced Memory Workload Optimization
Authors:
Yujie Wang,
Youhe Jiang,
Xupeng Miao,
Fangcheng Fu,
Shenhan Zhu,
Xiaonan Nie,
Yaofeng Tu,
Bin Cui
Abstract:
Transformer models have emerged as the leading approach for achieving state-of-the-art performance across various application domains, serving as the foundation for advanced large-scale deep learning (DL) models. However, efficiently training these models across multiple GPUs remains a complex challenge due to the abundance of parallelism options. Existing DL systems either require manual efforts to design distributed training plans or limit parallelism combinations to a constrained search space. In this paper, we present Galvatron-BMW, a novel system framework that integrates multiple prevalent parallelism dimensions and automatically identifies the most efficient hybrid parallelism strategy. To effectively navigate this vast search space, we employ a decision tree approach for decomposition and pruning based on intuitive insights. We further utilize a dynamic programming search algorithm to derive the optimal plan. Moreover, to improve resource utilization and enhance system efficiency, we propose a bi-objective optimization workflow that focuses on workload balance. Our evaluations on different Transformer models demonstrate the capabilities of Galvatron-BMW in automating distributed training under varying GPU memory constraints. Across all tested scenarios, Galvatron-BMW consistently achieves superior system throughput, surpassing previous approaches that rely on limited parallelism strategies.
Submitted 24 February, 2024; v1 submitted 5 July, 2023;
originally announced July 2023.
-
Boundary metric of Epstein-Penner convex hull and discrete conformality
Authors:
Xin Nie
Abstract:
The Epstein-Penner convex hull construction associates to every decorated punctured hyperbolic surface a polyhedral convex body in the Minkowski space. It works in the de Sitter and anti-de Sitter spaces as well. In these three spaces, the quotient of the spacelike boundary part of the convex body has an induced Euclidean, spherical and hyperbolic metric, respectively, with conical singularities. We show that this gives a bijection from the decorated Teichmüller space to a moduli space of such metrics in the Euclidean and hyperbolic cases, as well as a bijection between specific subspaces of them in the spherical case. Moreover, varying the decoration of a fixed hyperbolic surface corresponds to a discrete conformal change of the metric. This gives a new $3$-dimensional interpretation of discrete conformality which is in a sense inverse to the Bobenko-Pinkall-Springborn interpretation.
Submitted 3 July, 2023; v1 submitted 20 June, 2023;
originally announced June 2023.
-
Improving Generalization in Meta-Learning via Meta-Gradient Augmentation
Authors:
Ren Wang,
Haoliang Sun,
Qi Wei,
Xiushan Nie,
Yuling Ma,
Yilong Yin
Abstract:
Meta-learning methods typically follow a two-loop framework, where each loop potentially suffers from notorious overfitting, hindering rapid adaptation and generalization to new tasks. Existing schemes solve it by enhancing the mutual-exclusivity or diversity of training samples, but these data manipulation strategies are data-dependent and insufficiently flexible. This work alleviates overfitting in meta-learning from the perspective of gradient regularization and proposes a data-independent \textbf{M}eta-\textbf{G}radient \textbf{Aug}mentation (\textbf{MGAug}) method. The key idea is to first break the rote memories by network pruning to address memorization overfitting in the inner loop, and then the gradients of pruned sub-networks naturally form the high-quality augmentation of the meta-gradient to alleviate learner overfitting in the outer loop. Specifically, we explore three pruning strategies, including \textit{random width pruning}, \textit{random parameter pruning}, and a newly proposed \textit{catfish pruning} that measures a Meta-Memorization Carrying Amount (MMCA) score for each parameter and prunes high-score ones to break rote memories as much as possible. The proposed MGAug is theoretically guaranteed by the generalization bound from the PAC-Bayes framework. In addition, we extend a lightweight version, called MGAug-MaxUp, as a trade-off between performance gains and resource overhead. Extensive experiments on multiple few-shot learning benchmarks validate MGAug's effectiveness and significant improvement over various meta-baselines. The code is publicly available at \url{https://github.com/xxLifeLover/Meta-Gradient-Augmentation}.
Submitted 14 June, 2023;
originally announced June 2023.
-
Towards Consistent Video Editing with Text-to-Image Diffusion Models
Authors:
Zicheng Zhang,
Bonan Li,
Xuecheng Nie,
Congying Han,
Tiande Guo,
Luoqi Liu
Abstract:
Existing works have advanced Text-to-Image (TTI) diffusion models for video editing in a one-shot learning manner. Despite their low requirements of data and computation, these methods might produce results with unsatisfactory consistency with the text prompt and across the temporal sequence, limiting their applications in the real world. In this paper, we propose to address the above issues with a novel EI$^2$ model towards \textbf{E}nhancing v\textbf{I}deo \textbf{E}diting cons\textbf{I}stency of TTI-based frameworks. Specifically, we analyze and find that the inconsistency problem is caused by modules newly added to TTI models for learning temporal information. These modules lead to covariate shift in the feature space, which harms the editing capability. Thus, we design EI$^2$ to tackle the above drawbacks with two classical modules: Shift-restricted Temporal Attention Module (STAM) and Fine-coarse Frame Attention Module (FFAM). First, through theoretical analysis, we demonstrate that covariate shift is highly related to Layer Normalization; thus STAM replaces it with an \textit{Instance Centering} layer to preserve the distribution of temporal features. In addition, {STAM} employs an attention layer with normalized mapping to transform temporal features while constraining the variance shift. As the second part, we incorporate {STAM} with a novel {FFAM}, which efficiently leverages fine-coarse spatial information of overall frames to further enhance temporal consistency. Extensive experiments demonstrate the superiority of the proposed EI$^2$ model for text-driven video editing.
Submitted 27 May, 2023;
originally announced May 2023.
-
OSDP: Optimal Sharded Data Parallel for Distributed Deep Learning
Authors:
Youhe Jiang,
Fangcheng Fu,
Xupeng Miao,
Xiaonan Nie,
Bin Cui
Abstract:
Large-scale deep learning models contribute to significant performance improvements on a variety of downstream tasks. Current data and model parallelism approaches utilize model replication and partition techniques to support the distributed training of ultra-large models. However, directly deploying these systems often leads to sub-optimal training efficiency due to the complex model architectures and the strict device memory constraints. In this paper, we propose Optimal Sharded Data Parallel (OSDP), an automated parallel training system that combines the advantages of both data and model parallelism. Given the model description and the device information, OSDP makes trade-offs between memory consumption and hardware utilization, and thus automatically generates the distributed computation graph and maximizes the overall system throughput. In addition, OSDP introduces operator splitting to further reduce peak memory footprints during training with negligible overhead, which enables the training of larger models as well as higher throughput. Extensive experimental results of OSDP on multiple different kinds of large-scale models demonstrate that the proposed strategy outperforms the state-of-the-art in multiple regards.
Submitted 17 May, 2023; v1 submitted 17 May, 2023;
originally announced May 2023.
-
DiffBFR: Bootstrapping Diffusion Model Towards Blind Face Restoration
Authors:
Xinmin Qiu,
Congying Han,
Zicheng Zhang,
Bonan Li,
Tiande Guo,
Xuecheng Nie
Abstract:
Blind face restoration (BFR) is important yet challenging. Prior works prefer to exploit GAN-based frameworks to tackle this task due to the balance of quality and efficiency. However, these methods suffer from poor stability and adaptability to long-tail distribution, failing to simultaneously retain source identity and restore detail. We propose DiffBFR to introduce Diffusion Probabilistic Model (DPM) for BFR to tackle the above problem, given its superiority over GAN in aspects of avoiding training collapse and generating long-tail distribution. DiffBFR utilizes a two-step design that first restores identity information from low-quality images and then enhances texture details according to the distribution of real faces. This design is implemented with two key components: 1) Identity Restoration Module (IRM) for preserving the face details in results. Instead of denoising from pure Gaussian random distribution with LQ images as the condition during the reverse process, we propose a novel truncated sampling method which starts from LQ images with partial noise added. We theoretically prove that this change shrinks the evidence lower bound of DPM and then restores more original details. With theoretical proof, two cascade conditional DPMs with different input sizes are introduced to strengthen this sampling effect and reduce training difficulty in the high-resolution image generated directly. 2) Texture Enhancement Module (TEM) for polishing the texture of the image. Here an unconditional DPM, an LQ-free model, is introduced to further force the restorations to appear realistic. We theoretically prove that this unconditional DPM trained on pure HQ images contributes to justifying the correct distribution of inference images output from IRM in pixel-level space. Truncated sampling with fractional time step is utilized to polish pixel-level textures while preserving identity information.
Submitted 8 August, 2023; v1 submitted 8 May, 2023;
originally announced May 2023.
-
Rearrangement inequalities of the one-dimensional maximal functions associated with general measures
Authors:
Xudong Nie,
Di Wu,
Panwang Wang
Abstract:
We prove a rearrangement inequality for the uncentered Hardy-Littlewood maximal function $M_μ$ associated with a general measure $μ$ on $\mathbb{R}$. This inequality is analogous to Stein's result
$cf^{**}(t)\leq(Mf)^{*}(t)\leq C f^{**}(t)$, where $f^*$ is the symmetric decreasing rearrangement function of $f$ and $f^{**}(t)=\int_0^tf^*(x)dx$.
Moreover, we compute the best constant of $M_μ$ on
$L^{p,\infty}(\mathbb{R},dμ)$.
Submitted 1 May, 2023;
originally announced May 2023.
-
FlexMoE: Scaling Large-scale Sparse Pre-trained Model Training via Dynamic Device Placement
Authors:
Xiaonan Nie,
Xupeng Miao,
Zilong Wang,
Zichao Yang,
Jilong Xue,
Lingxiao Ma,
Gang Cao,
Bin Cui
Abstract:
With the increasing data volume, there is a trend of using large-scale pre-trained models to store knowledge in an enormous number of model parameters. Training these models consists of many dense algebraic operations and requires a huge amount of hardware resources. Recently, sparsely-gated Mixture-of-Experts (MoE) models have become more popular and have demonstrated impressive pretraining scalability in various downstream tasks. However, such sparse conditional computation may not be as effective as expected in practical systems due to the routing imbalance and fluctuation problems. Generally, MoEs are becoming a new data analytics paradigm in the data life cycle and face unique challenges at scales, complexities, and granularities never before possible.
In this paper, we propose a novel DNN training framework, FlexMoE, which systematically and transparently addresses the inefficiency caused by dynamic dataflow. We first present an empirical analysis of the problems and opportunities of training MoE models, which motivates us to overcome the routing imbalance and fluctuation problems with a dynamic expert management and device placement mechanism. Then we introduce a novel scheduling module over the existing DNN runtime to monitor the data flow, make the scheduling plans, and dynamically adjust the model-to-hardware mapping guided by the real-time data traffic. A simple but efficient heuristic algorithm is exploited to dynamically optimize the device placement during training. We have conducted experiments on both NLP models (e.g., BERT and GPT) and vision models (e.g., Swin). The results show that FlexMoE achieves superior performance compared with existing systems on real-world workloads: FlexMoE outperforms DeepSpeed by 1.70x on average and up to 2.10x, and outperforms FasterMoE by 1.30x on average and up to 1.45x.
Submitted 8 April, 2023;
originally announced April 2023.
-
Secure and Multi-Step Computation Offloading and Resource Allocation in Ultra-Dense Multi-Task NOMA-Enabled IoT Networks
Authors:
Tianqing Zhou,
Yanyan Fu,
Dong Qin,
Xuefang Nie,
Nan Jiang,
Chunguo Li
Abstract:
Ultra-dense networks are widely regarded as a promising solution to explosively growing applications of Internet-of-Things (IoT) mobile devices (IMDs). However, complicated and severe interferences need to be tackled properly in such networks. To this end, both orthogonal multiple access (OMA) and non-orthogonal multiple access (NOMA) are utilized at first. Then, in order to attain a goal of green and secure computation offloading, under the proportional allocation of computational resources and the constraints of latency and security cost, joint device association, channel selection, security service assignment, power control and computation offloading are done for minimizing the overall energy consumed by all IMDs. It is noteworthy that multi-step computation offloading is concentrated to balance the network loads and utilize computing resources fully. Since the finally formulated problem is in a nonlinear mixed-integer form, it may be very difficult to find its closed-form solution. To solve it, an improved whale optimization algorithm (IWOA) is designed. As for this algorithm, the convergence, computational complexity and parallel implementation are analyzed in detail. Simulation results show that the designed algorithm may achieve lower energy consumption than other existing algorithms under the constraints of latency and security cost.
Submitted 11 March, 2023;
originally announced March 2023.
-
StyO: Stylize Your Face in Only One-Shot
Authors:
Bonan Li,
Zicheng Zhang,
Xuecheng Nie,
Congying Han,
Yinhan Hu,
Tiande Guo
Abstract:
This paper focuses on face stylization with a single artistic target. Existing works for this task often fail to retain the source content while achieving geometry variation. Here, we present a novel StyO model, i.e., Stylize the face in only One-shot, to solve the above problem. In particular, StyO exploits a disentanglement and recombination strategy. It first disentangles the content and style of source and target images into identifiers, which are then recombined in a cross manner to derive the stylized face image. In this way, StyO decomposes complex images into independent and specific attributes, and simplifies one-shot face stylization as the combination of different attributes from input images, thus producing results that better match the face geometry of the target image and the content of the source one. StyO is implemented with latent diffusion models (LDM) and composed of two key modules: 1) Identifier Disentanglement Learner (IDL) for the disentanglement phase. It represents identifiers as contrastive text prompts, i.e., positive and negative descriptions, and it introduces a novel triple reconstruction loss to fine-tune the pre-trained LDM for encoding style and content into corresponding identifiers; 2) Fine-grained Content Controller (FCC) for the recombination phase. It recombines disentangled identifiers from IDL to form an augmented text prompt for generating stylized faces. In addition, FCC also constrains the cross-attention maps of latent and text features to preserve source face details in results. The extensive evaluation shows that StyO produces high-quality images on numerous paintings of various styles and outperforms the current state-of-the-art. Code will be released upon acceptance.
Submitted 6 March, 2023; v1 submitted 6 March, 2023;
originally announced March 2023.
-
Angel-PTM: A Scalable and Economical Large-scale Pre-training System in Tencent
Authors:
Xiaonan Nie,
Yi Liu,
Fangcheng Fu,
Jinbao Xue,
Dian Jiao,
Xupeng Miao,
Yangyu Tao,
Bin Cui
Abstract:
Recent years have witnessed the unprecedented achievements of large-scale pre-trained models, especially the Transformer models. Many products and services in Tencent Inc., such as WeChat, QQ, and Tencent Advertisement, have adopted pre-trained models to benefit from their power. In this work, we present Angel-PTM, a productive deep learning system designed for pre-training and fine-tuning Transformer models. Angel-PTM can train extremely large-scale models with hierarchical memory efficiently. The key designs of Angel-PTM are the fine-grained memory management via the Page abstraction and a unified scheduling method that coordinates the computations, data movements, and communications. Furthermore, Angel-PTM supports extreme model scaling with SSD storage and implements the lock-free updating mechanism to address the SSD I/O bandwidth bottlenecks. Experimental results demonstrate that Angel-PTM outperforms existing systems by up to 114.8% in terms of maximum model scale as well as up to 88.9% in terms of training throughput. Additionally, experiments on GPT3-175B and T5-MoE-1.2T models utilizing hundreds of GPUs verify the strong scalability of Angel-PTM.
Submitted 5 March, 2023;
originally announced March 2023.
-
On circle patterns and spherical conical metrics
Authors:
Xin Nie
Abstract:
The Koebe-Andreev-Thurston circle packing theorem, as well as its generalization to circle patterns due to Bobenko and Springborn, holds for Euclidean and hyperbolic metrics possibly with conical singularities, but fails for spherical metrics because of the non-uniqueness coming from Möbius transformations. In this paper, we show that a unique existence result for circle pattern with spherical conical metric holds if one prescribes the geodesic total curvature of each circle instead of the cone angles.
Submitted 23 January, 2023;
originally announced January 2023.
-
Real-time Bidding Strategy in Display Advertising: An Empirical Analysis
Authors:
Mengjuan Liu,
Zhengning Hu,
Zhi Lai,
Daiwei Zheng,
Xuyun Nie
Abstract:
Bidding strategies that help advertisers determine bidding prices are receiving increasing attention as more and more ad impressions are sold through real-time bidding systems. This paper first describes the problem and challenges of optimizing bidding strategies for individual advertisers in real-time bidding display advertising. Then, several representative bidding strategies are introduced, especially the research advances and challenges of reinforcement learning-based bidding strategies. Further, we quantitatively evaluate the performance of several representative bidding strategies on the iPinYou dataset. Specifically, we examine the effects of state, action, and reward function on the performance of reinforcement learning-based bidding strategies. Finally, we summarize the general steps for optimizing bidding strategies using reinforcement learning algorithms and present our suggestions.
Submitted 30 November, 2022;
originally announced December 2022.
-
Practical quantum simulation of small-scale non-Hermitian dynamics
Authors:
Hongfeng Liu,
Xiaodong Yang,
Kai Tang,
Liangyu Che,
Xinfang Nie,
Tao Xin,
Jun Li,
Dawei Lu
Abstract:
Non-Hermitian quantum systems have recently attracted considerable attention due to their exotic properties. Though many experimental realizations of non-Hermitian systems have been reported, the non-Hermiticity usually relies on hard-to-control environments and cannot be sustained for long. An alternative approach is to use quantum simulation with a closed system, but how to simulate non-Hermitian Hamiltonian dynamics remains a great challenge. To tackle this problem, we propose a protocol which combines a dilation method with the variational quantum algorithm. The dilation method is used to transform a non-Hermitian Hamiltonian into a Hermitian one through an exquisite quantum circuit, while the variational quantum algorithm is for efficiently approximating the complex entangled gates in this circuit. As a demonstration, we apply our protocol to simulate the dynamics of an Ising chain with nonlocal non-Hermitian perturbations, which is an important model to study quantum phase transition at nonzero temperatures. The numerical simulation results are highly consistent with the theoretical predictions, revealing the effectiveness of our protocol. The presented protocol paves the way for practically simulating small-scale non-Hermitian dynamics.
Submitted 7 June, 2023; v1 submitted 27 November, 2022;
originally announced November 2022.
-
Galvatron: Efficient Transformer Training over Multiple GPUs Using Automatic Parallelism
Authors:
Xupeng Miao,
Yujie Wang,
Youhe Jiang,
Chunan Shi,
Xiaonan Nie,
Hailin Zhang,
Bin Cui
Abstract:
Transformer models have achieved state-of-the-art performance in various application domains and have gradually become the foundation of advanced large deep learning (DL) models. However, training these models efficiently over multiple GPUs is still challenging due to the large number of parallelism choices. Existing DL systems either rely on manual effort to make distributed training plans or apply parallelism combinations within a very limited search space. In this work, we propose Galvatron, a new system framework that incorporates multiple popular parallelism dimensions and automatically finds the most efficient hybrid parallelism strategy. To better explore such an extremely large search space, we 1) use a decision tree to perform decomposition and pruning based on some reasonable intuitions, and then 2) design a dynamic programming search algorithm to generate the optimal plan. Evaluations on four representative Transformer workloads show that Galvatron can automatically perform distributed training under different GPU memory budgets. Among all evaluated scenarios, Galvatron always achieves superior system throughput compared to previous work with limited parallelism.
Submitted 24 November, 2022;
originally announced November 2022.
-
Masked Reconstruction Contrastive Learning with Information Bottleneck Principle
Authors:
Ziwen Liu,
Bonan Li,
Congying Han,
Tiande Guo,
Xuecheng Nie
Abstract:
Contrastive learning (CL) has shown great power in self-supervised learning due to its ability to capture insight correlations among large-scale data. Current CL models are biased to learn only the ability to discriminate positive and negative pairs due to the discriminative task setting. However, this bias would lead to ignoring its sufficiency for other downstream tasks, which we call the discriminative information overfitting problem. In this paper, we propose to tackle the above problems from the aspect of the Information Bottleneck (IB) principle, further pushing forward the frontier of CL. Specifically, we present a new perspective that CL is an instantiation of the IB principle, including information compression and expression. We theoretically analyze the optimal information situation and demonstrate that minimum sufficient augmentation and information-generalized representation are the optimal requirements for achieving maximum compression and generalizability to downstream tasks. Therefore, we propose the Masked Reconstruction Contrastive Learning (MRCL) model to improve CL models. For implementation in practice, MRCL utilizes the masking operation for stronger augmentation, further eliminating redundant and noisy information. In order to alleviate the discriminative information overfitting problem effectively, we employ the reconstruction task to regularize the discriminative task. We conduct comprehensive experiments and show the superiority of the proposed model on multiple tasks, including image classification, semantic segmentation and object detection.
Submitted 15 November, 2022;
originally announced November 2022.
-
Control-enhanced quantum metrology under Markovian noise
Authors:
Yue Zhai,
Xiaodong Yang,
Kai Tang,
Xinyue Long,
Xinfang Nie,
Tao Xin,
Dawei Lu,
Jun Li
Abstract:
Quantum metrology is supposed to significantly improve the precision of parameter estimation by utilizing suitable quantum resources. However, the predicted precision can be severely distorted by realistic noises. Here, we propose a control-enhanced quantum metrology scheme to defend against these noises and improve the metrology performance. Our scheme can automatically alter the parameter encoding dynamics with adjustable controls, thus leading to optimal resultant states that are less sensitive to the noises under consideration. As a demonstration, we numerically apply it to the problem of frequency estimation under several typical Markovian noise channels. By comparing our control-enhanced scheme with the standard scheme and the ancilla-assisted scheme, we show that our scheme performs better and can improve the estimation precision by up to around one order of magnitude. Furthermore, we conduct a proof-of-principle experiment in a nuclear magnetic resonance system to verify the effectiveness of the proposed scheme. The research here is helpful for current quantum platforms to harness the power of quantum metrology in realistic noise environments.
Submitted 6 February, 2023; v1 submitted 3 November, 2022;
originally announced November 2022.
-
Multidimensional Coherent Spectroscopy of Molecular Polaritons: Langevin Approach
Authors:
Zhedong Zhang,
Xiaoyu Nie,
Dangyuan Lei,
Shaul Mukamel
Abstract:
We present a microscopic theory for nonlinear optical spectroscopy of N molecules in an optical cavity. A quantum Langevin analytical expression is derived for the time- and frequency-resolved signals accounting for arbitrary numbers of vibrational excitations. We identify clear signatures of the polariton-polaron interaction from multidimensional projections of the signal, e.g., pathways and timescales. Cooperative dynamics of cavity polaritons against intramolecular vibrations is revealed, along with a cross talk between long-range coherence and vibronic coupling that may lead to localization effects. Our results further characterize the polaritonic coherence and the population transfer that is slower.
Submitted 24 October, 2022;
originally announced October 2022.
-
Experimental realization of a topologically protected Hadamard gate via braiding Fibonacci anyons
Authors:
Yu-ang Fan,
Yingcheng Li,
Yuting Hu,
Yishan Li,
Xinyue Long,
Hongfeng Liu,
Xiaodong Yang,
Xinfang Nie,
Jun Li,
Tao Xin,
Dawei Lu,
Yidun Wan
Abstract:
Topological quantum computation (TQC) is one of the most striking architectures that can realize fault-tolerant quantum computers. In TQC, the logical space and the quantum gates are topologically protected, i.e., robust against local disturbances. The topological protection, however, requires rather complicated lattice models and hard-to-manipulate dynamics; even the simplest system that can realize universal TQC--the Fibonacci anyon system--lacks a physical realization, let alone braiding the non-Abelian anyons. Here, we propose a disk model that can realize the Fibonacci anyon system, and construct the topologically protected logical spaces with the Fibonacci anyons. Via braiding the Fibonacci anyons, we can implement universal quantum gates on the logical space. Our proposal is platform-independent. As a demonstration, we implement a topological Hadamard gate on a logical qubit through a sequence of $15$ braiding operations of three Fibonacci anyons with merely $2$ nuclear spin qubits. The gate fidelity reaches 97.18% by randomized benchmarking. We further prove by experiment that the logical space and Hadamard gate are topologically protected: local disturbances due to thermal fluctuations result in a global phase only. Our work is a proof of principle of TQC and paves the way towards fault-tolerant quantum computation.
Submitted 21 October, 2022;
originally announced October 2022.
-
Multi-view Human Body Mesh Translator
Authors:
Xiangjian Jiang,
Xuecheng Nie,
Zitian Wang,
Luoqi Liu,
Si Liu
Abstract:
Existing methods for human mesh recovery mainly focus on single-view frameworks, but they often fail to produce accurate results due to the ill-posed setup. Considering the maturity of the multi-view motion capture system, in this paper, we propose to solve the prior ill-posed problem by leveraging multiple images from different views, thus significantly enhancing the quality of recovered meshes. In particular, we present a novel Multi-view human body Mesh Translator (MMT) model for estimating human body mesh with the help of vision transformer. Specifically, MMT takes multi-view images as input and translates them to targeted meshes in a single-forward manner. MMT fuses features of different views in both encoding and decoding phases, leading to representations embedded with global information. Additionally, to ensure the tokens are intensively focused on the human pose and shape, MMT conducts cross-view alignment at the feature level by projecting 3D keypoint positions to each view and enforcing their consistency in geometry constraints. Comprehensive experiments demonstrate that MMT outperforms existing single or multi-view models by a large margin for human mesh recovery task, notably, 28.8% improvement in MPVE over the current state-of-the-art method on the challenging HUMBI dataset. Qualitative evaluation also verifies the effectiveness of MMT in reconstructing high-quality human mesh. Codes will be made available upon acceptance.
Submitted 4 October, 2022;
originally announced October 2022.
-
Ghost translation
Authors:
Wenhan Ren,
Xiaoyu Nie,
Tao Peng,
Marlan O. Scully
Abstract:
Artificial intelligence has recently been widely used in computational imaging. The deep neural network (DNN) improves the signal-to-noise ratio of the retrieved images, whose quality is otherwise corrupted due to the low sampling ratio or noisy environments. This work proposes a new computational imaging scheme based on the sequence transduction mechanism with the transformer network. The simulation database assists the network in achieving signal translation ability. The experimental single-pixel detector's signal will be 'translated' into a 2D image in an end-to-end manner. High-quality images with no background noise can be retrieved at a sampling ratio as low as 2%. The illumination patterns can be either well-designed speckle patterns for sub-Nyquist imaging or random speckle patterns. Moreover, our method is robust to noise interference. This translation mechanism opens a new direction for DNN-assisted ghost imaging and can be used in various computational imaging scenarios.
Submitted 29 September, 2022;
originally announced September 2022.
-
OSDP: Optimal Sharded Data Parallel for Distributed Deep Learning
Authors:
Youhe Jiang,
Fangcheng Fu,
Xupeng Miao,
Xiaonan Nie,
Bin Cui
Abstract:
Large-scale deep learning models contribute to significant performance improvements on varieties of downstream tasks. Current data and model parallelism approaches utilize model replication and partition techniques to support the distributed training of ultra-large models. However, directly deploying these systems often leads to sub-optimal training efficiency due to the complex model architectures and the strict device memory constraints. In this paper, we propose Optimal Sharded Data Parallel (OSDP), an automated parallel training system that combines the advantages from both data and model parallelism. Given the model description and the device information, OSDP makes trade-offs between the memory consumption and the hardware utilization, thus automatically generates the distributed computation graph and maximizes the overall system throughput. In addition, OSDP introduces operator splitting to further alleviate peak memory footprints during training with negligible overheads, which enables the trainability of larger models as well as the higher throughput. Extensive experimental results of OSDP on multiple different kinds of large-scale models demonstrate that the proposed strategy outperforms the state-of-the-art in multiple regards. Our code is available at https://github.com/Youhe-Jiang/OptimalShardedDataParallel.
Submitted 18 May, 2023; v1 submitted 27 September, 2022;
originally announced September 2022.
-
Measuring Quantum Entanglement from Local Information by Machine Learning
Authors:
Yulei Huang,
Liangyu Che,
Chao Wei,
Feng Xu,
Xinfang Nie,
Jun Li,
Dawei Lu,
Tao Xin
Abstract:
Entanglement is a key property in the development of quantum technologies and in the study of quantum many-body simulations. However, entanglement measurement typically requires quantum full-state tomography (FST). Here we present a neural network-assisted protocol for measuring entanglement in equilibrium and non-equilibrium states of local Hamiltonians. Instead of FST, it can learn comprehensive entanglement quantities from single-qubit or two-qubit Pauli measurements, such as Rényi entropy, partially-transposed (PT) moments, and coherence. It is also exciting that our neural network is able to learn the future entanglement dynamics using only single-qubit traces from the previous time. In addition, we perform experiments using a nuclear spin quantum processor and train an adoptive neural network to study entanglement in the ground and dynamical states of a one-dimensional spin chain. Quantum phase transitions (QPT) are revealed by measuring static entanglement in ground states, and the entanglement dynamics beyond measurement time is accurately estimated in dynamical states. These precise results validate our neural network. Our work will have a wide range of applications in quantum many-body systems, from quantum phase transitions to intriguing non-equilibrium phenomena such as quantum thermalization.
Submitted 18 September, 2022;
originally announced September 2022.