-
Electron bubbles in highly excited states of the lowest Landau level
Authors:
David D. Dai,
Liang Fu
Abstract:
We study the entire energy spectrum of an electron droplet in the lowest Landau level. By exact diagonalization calculations, we find highly excited states in the middle of the spectrum that display unexpected density distribution and pair correlation. We show that these exceptional excited states contain tightly bound electron bubbles with local filling $ν= 1$ that form various ordered structures. Remarkably, these bubble excited states are shown to exist for both the $1/r$ Coulomb interaction and the $1/r^3$ dipole interaction. The experimental realization of bubble excited states in moiré materials under a magnetic field is also discussed.
Submitted 12 July, 2024;
originally announced July 2024.
-
15M Multimodal Facial Image-Text Dataset
Authors:
Dawei Dai,
YuTang Li,
YingGe Liu,
Mingming Jia,
Zhang YuanHui,
Guoyin Wang
Abstract:
Currently, image-text-driven multi-modal deep learning models have demonstrated their outstanding potential in many fields. In practice, tasks centered around facial images have broad application prospects. This paper presents FaceCaption-15M, a large-scale, diverse, and high-quality dataset of facial images accompanied by their natural language descriptions (facial image-to-text). This dataset aims to facilitate the study of face-centered tasks. FaceCaption-15M comprises over 15 million pairs of facial images and their corresponding natural language descriptions of facial features, making it the largest facial image-caption dataset to date. We conducted a comprehensive analysis of image quality, text naturalness, text complexity, and text-image relevance to demonstrate the superiority of FaceCaption-15M. To validate the effectiveness of FaceCaption-15M, we first trained a facial language-image pre-training model (FLIP, similar to CLIP) to align facial images with their corresponding captions in feature space. Subsequently, using both image and text encoders and fine-tuning only the linear layer, our FLIP-based models achieved state-of-the-art results on two challenging face-centered tasks. The purpose is to promote research in the field of face-related tasks through the availability of the proposed FaceCaption-15M dataset. All data, code, and models are publicly available at https://huggingface.co/datasets/OpenFace-CQUPT/FaceCaption-15M
Submitted 11 July, 2024; v1 submitted 11 July, 2024;
originally announced July 2024.
-
Let the Expert Stick to His Last: Expert-Specialized Fine-Tuning for Sparse Architectural Large Language Models
Authors:
Zihan Wang,
Deli Chen,
Damai Dai,
Runxin Xu,
Zhuoshu Li,
Y. Wu
Abstract:
Parameter-efficient fine-tuning (PEFT) is crucial for customizing Large Language Models (LLMs) with constrained resources. Although there have been various PEFT methods for dense-architecture LLMs, PEFT for sparse-architecture LLMs is still underexplored. In this work, we study the PEFT method for LLMs with the Mixture-of-Experts (MoE) architecture, and our contributions are mainly threefold: (1) We investigate the dispersion degree of the activated experts in customized tasks, and find that the routing distribution for a specific task tends to be highly concentrated, while the distribution of activated experts varies significantly across different tasks. (2) We propose Expert-Specialized Fine-Tuning, or ESFT, which tunes the experts most relevant to downstream tasks while freezing the other experts and modules; experimental results demonstrate that our method not only improves the tuning efficiency, but also matches or even surpasses the performance of full-parameter fine-tuning. (3) We further analyze the impact of the MoE architecture on expert-specialized fine-tuning. We find that MoE models with finer-grained experts are more advantageous in selecting the combination of experts that are most relevant to downstream tasks, thereby enhancing both the training efficiency and effectiveness. Our code is available at https://github.com/deepseek-ai/ESFT.
Submitted 4 July, 2024; v1 submitted 1 July, 2024;
originally announced July 2024.
-
Nonvolatile Silicon Photonic MEMS Switch Based on Centrally-Clamped Stepped Bistable Mechanical Beams
Authors:
Qian Ma,
Yinpeng Hu,
Ye Lu,
Yunzhi Liu,
Huan Li,
Daoxin Dai
Abstract:
High-performance photonic switches are essential for large-scale optical routing for large AI models and the Internet of Things. Realizing nonvolatility can further reduce power consumption and expand application scenarios. We propose a nonvolatile 2×2 silicon photonic micro-electromechanical system (MEMS) switch compatible with standard silicon photonic foundry processes. The switch employs an electrostatic comb actuator to change the air gap of the compact horizontal adiabatic coupler and achieves nonvolatility with centrally-clamped stepped bistable mechanical beams. The photonic switch features a switching speed on the scale of tens of µs and a simulated switching energy on the scale of tens of fJ within a 100×100 µm² footprint, with driving voltages ≤26 V. This 2×2 switch can be used in a variety of topologies for large-scale photonic switches, and its nonvolatility can potentially support future photonic FPGA designs.
Submitted 2 July, 2024; v1 submitted 19 June, 2024;
originally announced July 2024.
-
Simulating moiré quantum matter with neural network
Authors:
Di Luo,
David D. Dai,
Liang Fu
Abstract:
Moiré materials provide an ideal platform for exploring quantum phases of matter. However, solving the many-electron problem in moiré systems is challenging due to strong correlation effects. We introduce a powerful variational representation of quantum states, the many-body neural Bloch wavefunction, to solve many-electron problems in moiré materials accurately and efficiently. Applying our method to the semiconductor heterobilayer WSe$_2$/WS$_2$, we obtain a generalized Wigner crystal at filling factor n = 1/3, a Mott insulator at n = 1, and a correlated insulator with local magnetic moments and antiferromagnetic spin correlation at n = 2. Our neural network approach improves the simulation accuracy of strongly interacting moiré materials and paves the way for the discovery of new quantum phases with the variational learning principle in a unified framework.
Submitted 25 June, 2024;
originally announced June 2024.
-
DeepSeek-Coder-V2: Breaking the Barrier of Closed-Source Models in Code Intelligence
Authors:
DeepSeek-AI,
Qihao Zhu,
Daya Guo,
Zhihong Shao,
Dejian Yang,
Peiyi Wang,
Runxin Xu,
Y. Wu,
Yukun Li,
Huazuo Gao,
Shirong Ma,
Wangding Zeng,
Xiao Bi,
Zihui Gu,
Hanwei Xu,
Damai Dai,
Kai Dong,
Liyue Zhang,
Yishi Piao,
Zhibin Gou,
Zhenda Xie,
Zhewen Hao,
Bingxuan Wang,
Junxiao Song,
Deli Chen
, et al. (15 additional authors not shown)
Abstract:
We present DeepSeek-Coder-V2, an open-source Mixture-of-Experts (MoE) code language model that achieves performance comparable to GPT4-Turbo in code-specific tasks. Specifically, DeepSeek-Coder-V2 is further pre-trained from an intermediate checkpoint of DeepSeek-V2 with an additional 6 trillion tokens. Through this continued pre-training, DeepSeek-Coder-V2 substantially enhances the coding and mathematical reasoning capabilities of DeepSeek-V2, while maintaining comparable performance in general language tasks. Compared to DeepSeek-Coder-33B, DeepSeek-Coder-V2 demonstrates significant advancements in various aspects of code-related tasks, as well as reasoning and general capabilities. Additionally, DeepSeek-Coder-V2 expands its support for programming languages from 86 to 338, while extending the context length from 16K to 128K. In standard benchmark evaluations, DeepSeek-Coder-V2 achieves superior performance compared to closed-source models such as GPT4-Turbo, Claude 3 Opus, and Gemini 1.5 Pro in coding and math benchmarks.
Submitted 17 June, 2024;
originally announced June 2024.
-
Exploring Activation Patterns of Parameters in Language Models
Authors:
Yudong Wang,
Damai Dai,
Zhifang Sui
Abstract:
Most work treats large language models as black boxes without in-depth understanding of their internal working mechanism. In order to explain the internal representations of LLMs, we propose a gradient-based metric to assess the activation level of model parameters. Based on this metric, we obtain three preliminary findings. (1) When the inputs are in the same domain, parameters in the shallow layers will be activated densely, which means a larger portion of parameters will have great impacts on the outputs. In contrast, parameters in the deep layers are activated sparsely. (2) When the inputs are across different domains, parameters in shallow layers exhibit higher similarity in the activation behavior than deep layers. (3) In deep layers, the similarity of the distributions of activated parameters is positively correlated to the empirical data relevance. Further, we develop three validation experiments to solidify these findings. (1) Firstly, starting from the first finding, we attempt to configure different prune ratios for different layers, and find this method can benefit model pruning. (2) Secondly, we find that a pruned model based on one calibration set can better handle tasks related to the calibration task than those not related, which validates the second finding. (3) Thirdly, based on the STS-B and SICK benchmarks, we find that two sentences with consistent semantics tend to share similar parameter activation patterns in deep layers, which aligns with our third finding. Our work sheds light on the behavior of parameter activation in LLMs, and we hope these findings will have the potential to inspire more practical applications.
Submitted 27 May, 2024;
originally announced May 2024.
-
Studies on particle creation during the universe expansion with a laser system
Authors:
De-Chang Dai,
Changbo Fu
Abstract:
When two high-intensity laser beams collide, they create a region where the refractive index varies so quickly that photons are created. The variation of the refractive index is analogous to the variation of the universe's scale factor. Therefore, this laser system can serve as an analog of the expansion of the universe. We find that several hundred photons can be created under feasible conditions. This system can demonstrate particle creation during inflation or other similar periods.
Submitted 26 May, 2024;
originally announced May 2024.
-
On the superconducting gap structure of the miassite Rh17S15: Nodal or nodeless?
Authors:
J. Y. Nie,
C. C. Zhao,
C. Q. Xu,
B. Li,
C. P. Tu,
X. Zhang,
D. Z. Dai,
H. R. Wang,
S. Xu,
Wenhe Jiao,
B. M. Wang,
Zhu'an Xu,
Xiaofeng Xu,
S. Y. Li
Abstract:
A recent penetration depth measurement claimed the observation of unconventional superconductivity in miassite Rh$_{17}$S$_{15}$ single crystals, evidenced by the linear-in-temperature penetration depth at low temperatures, thereby arguing for the presence of line nodes in its superconducting gap structure. Here we measure the thermal conductivity of Rh$_{17}$S$_{15}$ single crystals down to 110 mK and up to a field of 8 T ($\simeq 0.4H_{\rm c2}$). In marked contrast to the penetration depth measurement, we observe a negligible residual linear term $κ_0/T$ in zero field, in line with a nodeless gap structure. The field dependence of $κ_0(H)/T$ shows a profile that is more consistent with either a highly anisotropic gap structure or multiple nodeless gaps with significantly different magnitudes. Moreover, first-principles calculations give two electronic bands with Fermi surfaces of complex shape. These results suggest multigap nodeless superconductivity in this multiband Rh$_{17}$S$_{15}$ superconductor.
Submitted 14 May, 2024;
originally announced May 2024.
-
Hybrid thin-film lithium niobate micro-ring acousto-optic modulator for microwave-to-optical conversion
Authors:
Lei Wan,
Jiying Huang,
Meixun Wen,
Huan Li,
Wenfeng Zhou,
Zhiqiang Yang,
Yuping Chen,
Huilong Liu,
Siqing Zeng,
Dong Liu,
Shuixian Yang,
Daoxin Dai,
Zhaohui Li
Abstract:
Highly efficient acousto-optic modulation plays a vital role in microwave-to-optical conversion. Herein, we demonstrate a hybrid thin-film lithium niobate (TFLN) racetrack micro-ring acousto-optic modulator (AOM) implemented with a low-loss chalcogenide (ChG) waveguide. By engineering the electrode configuration of the interdigital transducer, double-arm micro-ring acousto-optic modulation is experimentally confirmed in a nonsuspended ChG-loaded TFLN waveguide platform. By varying the position of the blue-detuned bias point, the half-wave-voltage-length product $V_πL$ of the hybrid TFLN micro-ring AOM is made as small as 9 mV·cm. Accordingly, the acousto-optic coupling strength is estimated to be 0.48 Hz·s$^{1/2}$ at an acoustic frequency of 0.84 GHz. By analyzing the number of phonons generated by the piezoelectric transducer, the microwave-to-optical conversion efficiency is calculated to be 0.05%, approximately one order of magnitude larger than that of the state-of-the-art suspended counterpart. Efficient microwave-to-optical conversion thus provides new opportunities for low-power-consumption quantum information transduction using TFLN-ChG hybrid piezo-optomechanical devices.
Submitted 10 May, 2024;
originally announced May 2024.
-
DeepSeek-V2: A Strong, Economical, and Efficient Mixture-of-Experts Language Model
Authors:
DeepSeek-AI,
Aixin Liu,
Bei Feng,
Bin Wang,
Bingxuan Wang,
Bo Liu,
Chenggang Zhao,
Chengqi Dengr,
Chong Ruan,
Damai Dai,
Daya Guo,
Dejian Yang,
Deli Chen,
Dongjie Ji,
Erhang Li,
Fangyun Lin,
Fuli Luo,
Guangbo Hao,
Guanting Chen,
Guowei Li,
H. Zhang,
Hanwei Xu,
Hao Yang,
Haowei Zhang,
Honghui Ding
, et al. (132 additional authors not shown)
Abstract:
We present DeepSeek-V2, a strong Mixture-of-Experts (MoE) language model characterized by economical training and efficient inference. It comprises 236B total parameters, of which 21B are activated for each token, and supports a context length of 128K tokens. DeepSeek-V2 adopts innovative architectures including Multi-head Latent Attention (MLA) and DeepSeekMoE. MLA guarantees efficient inference through significantly compressing the Key-Value (KV) cache into a latent vector, while DeepSeekMoE enables training strong models at an economical cost through sparse computation. Compared with DeepSeek 67B, DeepSeek-V2 achieves significantly stronger performance, and meanwhile saves 42.5% of training costs, reduces the KV cache by 93.3%, and boosts the maximum generation throughput to 5.76 times. We pretrain DeepSeek-V2 on a high-quality and multi-source corpus consisting of 8.1T tokens, and further perform Supervised Fine-Tuning (SFT) and Reinforcement Learning (RL) to fully unlock its potential. Evaluation results show that, even with only 21B activated parameters, DeepSeek-V2 and its chat versions still achieve top-tier performance among open-source models.
Submitted 19 June, 2024; v1 submitted 7 May, 2024;
originally announced May 2024.
-
A Reinforcement Learning Based Backfilling Strategy for HPC Batch Jobs
Authors:
Elliot Kolker-Hicks,
Di Zhang,
Dong Dai
Abstract:
High Performance Computing (HPC) systems are used across a wide range of disciplines for both large and complex computations. HPC systems often receive many thousands of computational tasks at a time, colloquially referred to as jobs. These jobs must then be scheduled as optimally as possible so they can be completed within a reasonable timeframe. HPC scheduling systems often employ a technique called backfilling, wherein low-priority jobs are scheduled earlier to use the available resources that are waiting for the pending high-priority jobs. To make it work, backfilling largely relies on job runtime to calculate the start time of the ready-to-schedule jobs and avoid delaying them. It is a common belief that better estimations of job runtime will lead to better backfilling and more effective scheduling. However, our experiments show a different conclusion: there is a missing trade-off between prediction accuracy and backfilling opportunities. To learn how to achieve the best trade-off, we believe reinforcement learning (RL) can be effectively leveraged. Reinforcement learning relies on an agent that makes decisions by observing the environment, and gains rewards or punishments based on the quality of its decision-making. Based on this idea, we designed RLBackfilling, a reinforcement learning-based backfilling algorithm. We show how RLBackfilling can learn effective backfilling strategies via trial-and-error on existing job traces. Our evaluation results show up to 59% better scheduling performance (based on average bounded job slowdown) compared to EASY backfilling using user-provided job runtime and 30% better performance compared with EASY using the ideal predicted job runtime (the actual job runtime).
Submitted 14 April, 2024;
originally announced April 2024.
-
Large Language Models Are Unconscious of Unreasonability in Math Problems
Authors:
Jingyuan Ma,
Damai Dai,
Lei Sha,
Zhifang Sui
Abstract:
Large language models (LLMs) demonstrate substantial capabilities in solving math problems. However, they tend to produce hallucinations when given questions containing unreasonable errors. In this paper, we study the behavior of LLMs when faced with unreasonable math problems and further explore their potential to address these problems. We construct the Unreasonable Math Problem (UMP) benchmark to examine the error detection ability of LLMs. Experiments show that LLMs are able to detect unreasonable errors, but still fail to generate non-hallucinatory content. In order to improve their error detection and correction ability, we further design a strategic prompt template called Critical Calculation and Conclusion (CCC). With CCC, LLMs can better self-evaluate and detect unreasonable errors in math questions, making them more reliable and safe in practical application scenarios.
Submitted 16 April, 2024; v1 submitted 28 March, 2024;
originally announced March 2024.
-
Convert laser light into single photons via interference
Authors:
Yanfeng Li,
Manman Wang,
Guoqi Huang,
Li Liu,
Wenyan Wang,
Weijie Ji,
Hanqing Liu,
Xiangbin Su,
Shulun Li,
Deyan Dai,
Xiangjun Shang,
Haiqiao Ni,
Zhichuan Niu,
Chengyong Hu
Abstract:
Laser light possesses perfect coherence, but cannot be attenuated to single photons via linear optics. An elegant route to convert laser light into single photons is based on photon blockade in a cavity with a single atom in the strong coupling regime. However, the single-photon purity achieved by this method remains relatively low. Here we propose an interference-based approach where laser light can be transformed into single photons by destructively interfering with a weak but super-bunched incoherent field emitted from a cavity coupled to a single quantum emitter. We demonstrate this idea by measuring the reflected light of a laser field which drives a double-sided optical microcavity containing a single artificial atom, a quantum dot (QD), in the Purcell regime. The reflected light consists of a superposition of the driving field with the cavity output field. We achieve a second-order autocorrelation $g^{(2)}(0)=0.030±0.002$ and a two-photon interference visibility of (94.3 ± 0.2)%. By separating the coherent and incoherent fields in the reflected light, we observe that the incoherent field from the cavity exhibits super-bunching with $g^{(2)}(0)=41±2$ while the coherent field retains Poissonian statistics. By controlling the relative amplitude of the coherent and incoherent fields, we verify that the photon statistics of the reflected light are tunable from perfect anti-bunching to super-bunching, in agreement with our predictions. Our results demonstrate that the photon statistics of light are a quantum interference phenomenon: a single QD can scatter two photons simultaneously at low driving fields, in contrast to the common picture that a single two-level quantum emitter can only scatter (or absorb and emit) single photons. This work opens the door to tailoring the photon statistics of laser light via cavity or waveguide quantum electrodynamics and interference.
Submitted 25 March, 2024;
originally announced March 2024.
-
Asymptotics of the confluent hypergeometric process with a varying external potential in the super-exponential region
Authors:
Dan Dai,
Luming Yao,
Yu Zhai
Abstract:
In this paper, we investigate a determinantal point process on the interval $(-s,s)$, associated with the confluent hypergeometric kernel. Let $\mathcal{K}^{(α,β)}_s$ denote the trace class integral operator acting on $L^2(-s, s)$ with the confluent hypergeometric kernel. Our focus is on deriving the asymptotics of the Fredholm determinant $\det(I-γ\mathcal{K}^{(α,β)}_s)$ as $s \to +\infty$, while simultaneously $γ\to 1^-$ in a super-exponential region. In this regime of double scaling limit, our asymptotic result also gives us asymptotics of the eigenvalues $λ^{(α, β)}_k(s)$ of the integral operator $\mathcal{K}^{(α,β)}_s$ as $s \to +\infty$. Based on the integrable structure of the confluent hypergeometric kernel, we derive our asymptotic results by applying the Deift-Zhou nonlinear steepest descent method to analyze the related Riemann-Hilbert problem.
Submitted 5 May, 2024; v1 submitted 25 March, 2024;
originally announced March 2024.
-
RFWave: Multi-band Rectified Flow for Audio Waveform Reconstruction
Authors:
Peng Liu,
Dongyang Dai,
Zhiyong Wu
Abstract:
Recent advancements in generative modeling have significantly enhanced the reconstruction of audio waveforms from various representations. While diffusion models are adept at this task, they are hindered by latency issues due to their operation at the individual sample point level and the need for numerous sampling steps. In this study, we introduce RFWave, a cutting-edge multi-band Rectified Flow approach designed to reconstruct high-fidelity audio waveforms from Mel-spectrograms or discrete tokens. RFWave uniquely generates complex spectrograms and operates at the frame level, processing all subbands simultaneously to boost efficiency. Leveraging Rectified Flow, which targets a flat transport trajectory, RFWave achieves reconstruction with just 10 sampling steps. Our empirical evaluations show that RFWave not only provides outstanding reconstruction quality but also offers vastly superior computational efficiency, enabling audio generation at speeds up to 97 times faster than real-time on a GPU. An online demonstration is available at: https://rfwave-demo.github.io/rfwave/.
Submitted 2 June, 2024; v1 submitted 7 March, 2024;
originally announced March 2024.
-
DIFNet: SAR RFI suppression based on domain invariant features
Authors:
Fuping Fang,
Wenhao Lv,
Dahai Dai
Abstract:
Synthetic aperture radar (SAR) is a high-resolution two-dimensional imaging radar. However, during the imaging process, SAR is susceptible to intentional and unintentional interference, with radio frequency interference (RFI) being the most common type, leading to a severe degradation in image quality. Although inpainting networks have achieved excellent results, their generalization is unclear, and whether they still work effectively in cross-sensor experiments needs further verification. Through time-frequency analysis of interference signals, we find that interference holds domain-invariant features across different sensors. Therefore, this paper reconstructs the loss function and extracts the domain-invariant features to improve generalization. Ultimately, this paper proposes a SAR RFI suppression method based on domain-invariant features, and embeds the RFI suppression into the SAR imaging process. Compared to traditional notch filtering methods, the proposed approach not only removes interference but also effectively preserves strong scattering targets. Compared to PISNet, our method can extract domain-invariant features and has better generalization ability; even in the cross-sensor experiment, it still achieves excellent results.
Submitted 5 March, 2024;
originally announced March 2024.
-
DGAP: Efficient Dynamic Graph Analysis on Persistent Memory
Authors:
Abdullah Al Raqibul Islam,
Dong Dai
Abstract:
Dynamic graphs, featuring continuously updated vertices and edges, have grown in importance for numerous real-world applications. To accommodate this, graph frameworks, particularly their internal data structures, must support both persistent graph updates and rapid graph analysis simultaneously, leading to complex designs to orchestrate 'fast but volatile' and 'persistent but slow' storage devices. Emerging persistent memory technologies, such as Optane DCPMM, offer a promising alternative to simplify the designs by providing data persistence, low latency, and high IOPS together. In light of this, we propose DGAP, a framework for efficient dynamic graph analysis on persistent memory. Unlike traditional dynamic graph frameworks, which combine multiple graph data structures (e.g., edge list or adjacency list) to achieve the required performance, DGAP utilizes a single mutable Compressed Sparse Row (CSR) graph structure with new designs for persistent memory to construct the framework. Specifically, DGAP introduces a per-section edge log to reduce write amplification on persistent memory; a per-thread undo log to enable high-performance, crash-consistent rebalancing operations; and a data placement schema to minimize in-place updates on persistent memory. Our extensive evaluation results demonstrate that DGAP can achieve up to $3.2\times$ better graph update performance and up to $3.77\times$ better graph analysis performance compared to state-of-the-art dynamic graph frameworks for persistent memory, such as XPGraph, LLAMA, and GraphOne.
Submitted 5 March, 2024;
originally announced March 2024.
-
PeriodicLoRA: Breaking the Low-Rank Bottleneck in LoRA Optimization
Authors:
Xiangdi Meng,
Damai Dai,
Weiyao Luo,
Zhe Yang,
Shaoxiang Wu,
Xiaochen Wang,
Peiyi Wang,
Qingxiu Dong,
Liang Chen,
Zhifang Sui
Abstract:
Supervised fine-tuning is the most common method to adapt large language models (LLMs) to downstream tasks, but fully fine-tuning LLMs requires massive computational resources. Recently, parameter-efficient fine-tuning (PEFT) methods have been widely studied due to their cost-effectiveness. LoRA is one of the most widely used methods, which assumes that the optimization process is essentially low-dimensional. Although LoRA fine-tuning is effective, there is still a performance gap compared to full fine-tuning, since its weight update is limited to low-rank matrices. In order to break the low-rank bottleneck in LoRA optimization, we propose PeriodicLoRA (PLoRA), which accumulates low-rank update matrices multiple times to achieve a higher update rank. PLoRA has multiple training stages. During each stage, we still update only the LoRA weights. However, at the end of each stage, we unload the LoRA weights into the backbone parameters and then reinitialize the LoRA states. Experimental results show that PLoRA has stronger learning ability, up to approximately 1.8 times that of LoRA, without increasing memory usage. Further, we introduce a momentum-based unloading strategy for PLoRA to mitigate training instability.
Submitted 25 February, 2024;
originally announced February 2024.
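The staged schedule described in the PeriodicLoRA abstract above (train only the LoRA factors, unload them into the backbone, reinitialize) can be illustrated with a small rank experiment. This is only a sketch under stated assumptions: random matrices stand in for trained LoRA factors, and the function name `periodic_lora_delta` and all sizes are hypothetical, not taken from the paper.

```python
import numpy as np

def periodic_lora_delta(dim, rank, stages, seed=0):
    """Accumulate `stages` low-rank updates, merging each one into the backbone."""
    rng = np.random.default_rng(seed)
    delta = np.zeros((dim, dim))  # accumulated change to the backbone weight
    for _ in range(stages):
        # Assumption: random factors stand in for one stage of LoRA training.
        lora_b = rng.normal(size=(dim, rank))
        lora_a = rng.normal(size=(rank, dim))
        delta += lora_b @ lora_a  # unload the LoRA product into the backbone
        # The factors are re-created on the next iteration (reinitialization).
    return delta

rank_single = np.linalg.matrix_rank(periodic_lora_delta(dim=32, rank=4, stages=1))
rank_merged = np.linalg.matrix_rank(periodic_lora_delta(dim=32, rank=4, stages=3))
print(rank_single, rank_merged)
```

A single stage can change the weight by a matrix of rank at most 4 here, while three merged stages can reach rank up to 12, which is the sense in which accumulation raises the update rank.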
-
Four-Channel WDM Graphene Optical Receiver
Authors:
Laiwen Yu,
Yurui Li,
Hengtai Xiang,
Yuanrong Li,
Hengzhen Cao,
Zhongyang Ji,
Liu Liu,
Xi Xiao,
Jianbo Yin,
Jingshu Guo,
Daoxin Dai
Abstract:
Silicon photonics, with its advantages of low power consumption, low cost, and high yield, is a crucial technology for facilitating high-capacity optical communications and interconnects. Graphene photodetectors (GPDs), featuring broadband operation, high speed, and low integration cost, can be good additions to conventional SiGe photodetectors, supporting silicon-integrated on-chip photodetection in new wavelength bands beyond 1.6 microns (e.g., U-band and 2 microns). Here we realize a silicon-integrated four-channel wavelength division multiplexing (WDM) optical receiver based on a micro-ring resonator (MRR) array and four p-n homojunction GPDs. These GPDs, based on the photo-thermoelectric (PTE) effect and operating under zero (current) bias, exhibit responsivities of about 1.1 V/W and flat frequency responses up to 67 GHz, which is setup-limited. The GPDs show good consistency, benefiting from the compact active region array (0.006 mm^2) covered by a single mechanically exfoliated hBN/graphene/hBN stack. Moreover, the WDM graphene optical receiver achieves 4 x 16 Gbps non-return-to-zero (NRZ) optical signal transmission. To the best of our knowledge, it is the first GPD-array-based optical receiver using high-quality mechanically exfoliated graphene and edge graphene-metal contacts with low resistance. Our design is also compatible with CVD-grown graphene, which can likewise yield good consistency among the GPDs. This work sheds light on the large-scale integration of GPDs with high consistency and uniformity, enabling the application of high-quality mechanically exfoliated graphene and promoting the development of graphene photonic integrated circuits.
Submitted 2 March, 2024; v1 submitted 25 February, 2024;
originally announced February 2024.
-
A Geometric VOF Method for Interface Flow Simulations
Authors:
Dezhi Dai,
Haomin Yuan,
Albert Y. Tong,
Adrian Tentner
Abstract:
A novel numerical technique for interface flow simulations using the Volume of Fluid (VOF) method on arbitrary unstructured meshes is introduced. The method, called SimPLIC, integrates Piecewise Linear Interface Calculation (PLIC) with Simpson's rule. The main focus of the proposed method is to compute the volume of the primary phase that moves across a mesh face within a single time step. This is achieved by reconstructing the interface and assessing how the submerged face area evolves over time. Simpson's rule is employed to integrate the time evolution of this submerged face area, ensuring an accurate estimation of the volume of the transported primary phase. The method's robustness was validated by solving a spherical interface advection problem in a non-uniform three-dimensional flow across unstructured meshes with diverse cell types and dimensions. Key metrics such as volume conservation, shape retention, fraction boundedness, and solving efficiency were monitored and compared. The numerical results confirm the precision and adequacy of the PLIC-VOF technique when complemented with Simpson's rule in advecting the interface. Furthermore, the SimPLIC method has been integrated into OpenFOAM v2312 as an unofficial extension and is now accessible to the community.
Submitted 7 February, 2024;
originally announced February 2024.
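The time integration step mentioned in the SimPLIC abstract above can be sketched with the three-point Simpson's rule. This is a simplified illustration, not the OpenFOAM extension itself: it assumes a constant face-normal velocity over the time step, and the helper names `simpson` and `transported_volume` and the quadratic area history are hypothetical.

```python
def simpson(f, t0, t1):
    """Three-point Simpson's rule for the integral of f over [t0, t1]."""
    tm = 0.5 * (t0 + t1)
    return (t1 - t0) / 6.0 * (f(t0) + 4.0 * f(tm) + f(t1))

def transported_volume(submerged_area, normal_velocity, t0, t1):
    """Volume of primary phase crossing a face during one time step.

    Assumption: the face-normal velocity is constant over the step, so the
    volume is the velocity times the time integral of the submerged area.
    """
    return normal_velocity * simpson(submerged_area, t0, t1)

# Hypothetical submerged-area history (quadratic in time) over one step.
area = lambda t: 1.0 + 2.0 * t + 3.0 * t * t
volume = transported_volume(area, normal_velocity=0.5, t0=0.0, t1=1.0)
print(volume)
```

Simpson's rule is exact for polynomials up to degree three, so for this quadratic area history the computed volume matches the analytic value of 1.5 up to rounding.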
-
Trainable Fixed-Point Quantization for Deep Learning Acceleration on FPGAs
Authors:
Dingyi Dai,
Yichi Zhang,
Jiahao Zhang,
Zhanqiu Hu,
Yaohui Cai,
Qi Sun,
Zhiru Zhang
Abstract:
Quantization is a crucial technique for deploying deep learning models on resource-constrained devices, such as embedded FPGAs. Prior efforts mostly focus on quantizing matrix multiplications, leaving other layers like BatchNorm or shortcuts in floating-point form, even though fixed-point arithmetic is more efficient on FPGAs. A common practice is to fine-tune a pre-trained model to fixed-point for FPGA deployment, but this can degrade accuracy.
This work presents QFX, a novel trainable fixed-point quantization approach that automatically learns the binary-point position during model training. Additionally, we introduce a multiplier-free quantization strategy within QFX to minimize DSP usage. QFX is implemented as a PyTorch-based library that efficiently emulates fixed-point arithmetic, supported by FPGA HLS, in a differentiable manner during backpropagation. With minimal effort, models trained with QFX can readily be deployed through HLS, producing the same numerical results as their software counterparts. Our evaluation shows that compared to post-training quantization, QFX can quantize models trained with element-wise layers quantized to fewer bits and achieve higher accuracy on both CIFAR-10 and ImageNet datasets. We further demonstrate the efficacy of multiplier-free quantization using a state-of-the-art binarized neural network accelerator designed for an embedded FPGA (AMD Xilinx Ultra96 v2). We plan to release QFX in open-source format.
Submitted 30 January, 2024;
originally announced January 2024.
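To make the role of the binary-point position in the QFX abstract above concrete, the following sketch emulates signed fixed-point rounding and saturation in plain Python. It is a generic illustration and not the QFX library API: the function `to_fixed_point` and its parameters `word_bits` and `frac_bits` are hypothetical names, and the learnable, differentiable part of the method is not shown.

```python
def to_fixed_point(x, word_bits, frac_bits):
    """Round x to a signed fixed-point value and return it as a float.

    word_bits: total number of bits, including the sign bit.
    frac_bits: number of bits to the right of the binary point (assumed >= 0).
    """
    scale = 2 ** frac_bits
    q_min = -(2 ** (word_bits - 1))
    q_max = 2 ** (word_bits - 1) - 1
    q = round(x * scale)            # nearest representable integer code
    q = max(q_min, min(q_max, q))   # saturate instead of wrapping around
    return q / scale

# Moving the binary point trades range for resolution at a fixed word length.
print(to_fixed_point(0.3, word_bits=8, frac_bits=4))
print(to_fixed_point(0.3, word_bits=8, frac_bits=6))
print(to_fixed_point(100.0, word_bits=8, frac_bits=4))
```

With 8 bits, choosing 4 fractional bits gives a step of 1/16 and a maximum of 7.9375, while 6 fractional bits gives a finer step of 1/64 but a smaller range, which is why the position of the binary point is worth tuning per layer.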
-
Forging Vision Foundation Models for Autonomous Driving: Challenges, Methodologies, and Opportunities
Authors:
Xu Yan,
Haiming Zhang,
Yingjie Cai,
Jingming Guo,
Weichao Qiu,
Bin Gao,
Kaiqiang Zhou,
Yue Zhao,
Huan Jin,
Jiantao Gao,
Zhen Li,
Lihui Jiang,
Wei Zhang,
Hongbo Zhang,
Dengxin Dai,
Bingbing Liu
Abstract:
The rise of large foundation models, trained on extensive datasets, is revolutionizing the field of AI. Models such as SAM, DALL-E2, and GPT-4 showcase their adaptability by extracting intricate patterns and performing effectively across diverse tasks, thereby serving as potent building blocks for a wide range of AI applications. Autonomous driving, a vibrant front in AI applications, remains challenged by the lack of dedicated vision foundation models (VFMs). The scarcity of comprehensive training data, the need for multi-sensor integration, and the diverse task-specific architectures pose significant obstacles to the development of VFMs in this field. This paper delves into the critical challenge of forging VFMs tailored specifically for autonomous driving, while also outlining future directions. Through a systematic analysis of over 250 papers, we dissect essential techniques for VFM development, including data preparation, pre-training strategies, and downstream task adaptation. Moreover, we explore key advancements such as NeRF, diffusion models, 3D Gaussian Splatting, and world models, presenting a comprehensive roadmap for future research. To empower researchers, we have built and maintained https://github.com/zhanghm1995/Forge_VFM4AD, an open-access repository constantly updated with the latest advancements in forging VFMs for autonomous driving.
Submitted 15 January, 2024;
originally announced January 2024.
-
DeepSeekMoE: Towards Ultimate Expert Specialization in Mixture-of-Experts Language Models
Authors:
Damai Dai,
Chengqi Deng,
Chenggang Zhao,
R. X. Xu,
Huazuo Gao,
Deli Chen,
Jiashi Li,
Wangding Zeng,
Xingkai Yu,
Y. Wu,
Zhenda Xie,
Y. K. Li,
Panpan Huang,
Fuli Luo,
Chong Ruan,
Zhifang Sui,
Wenfeng Liang
Abstract:
In the era of large language models, Mixture-of-Experts (MoE) is a promising architecture for managing computational costs when scaling up model parameters. However, conventional MoE architectures like GShard, which activate the top-$K$ out of $N$ experts, face challenges in ensuring expert specialization, i.e. each expert acquires non-overlapping and focused knowledge. In response, we propose the DeepSeekMoE architecture towards ultimate expert specialization. It involves two principal strategies: (1) finely segmenting the experts into $mN$ ones and activating $mK$ from them, allowing for a more flexible combination of activated experts; (2) isolating $K_s$ experts as shared ones, aiming at capturing common knowledge and mitigating redundancy in routed experts. Starting from a modest scale with 2B parameters, we demonstrate that DeepSeekMoE 2B achieves comparable performance with GShard 2.9B, which has 1.5 times the expert parameters and computation. In addition, DeepSeekMoE 2B nearly approaches the performance of its dense counterpart with the same number of total parameters, which set the upper bound of MoE models. Subsequently, we scale up DeepSeekMoE to 16B parameters and show that it achieves comparable performance with LLaMA2 7B, with only about 40% of computations. Further, our preliminary efforts to scale up DeepSeekMoE to 145B parameters consistently validate its substantial advantages over the GShard architecture, and show its performance comparable with DeepSeek 67B, using only 28.5% (maybe even 18.2%) of computations.
Submitted 11 January, 2024;
originally announced January 2024.
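The claim in the DeepSeekMoE abstract above that segmenting $N$ experts into $mN$ and activating $mK$ allows "a more flexible combination of activated experts" can be checked by counting combinations. The sketch below is only a counting exercise: the helper name `expert_combinations` and the values $N=16$, $K=2$, $m=4$ are chosen for illustration, and shared experts are not modeled.

```python
from math import comb

def expert_combinations(num_experts, num_active, split=1):
    """Number of distinct sets of activated experts.

    Each of the `num_experts` experts is split into `split` finer experts,
    and `split * num_active` of them are activated, so the number of
    activated parameters stays roughly the same.
    """
    return comb(split * num_experts, split * num_active)

coarse = expert_combinations(16, 2)          # choose 2 of 16 experts
fine = expert_combinations(16, 2, split=4)   # choose 8 of 64 finer experts
print(coarse, fine)
```

For these illustrative values the count grows from 120 to 4,426,165,368 possible expert combinations.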
-
Language Models Know the Value of Numbers
Authors:
Fangwei Zhu,
Damai Dai,
Zhifang Sui
Abstract:
Large language models (LLMs) have exhibited impressive competence in various tasks, but their internal mechanisms on mathematical problems are still under-explored. In this paper, we study a fundamental question: whether language models know the value of numbers, a basic element in math. To study the question, we construct a synthetic dataset comprising addition problems and utilize linear probes to read out input numbers from the hidden states. Experimental results support the existence of encoded number values in LLMs on different layers, and these values can be extracted via linear probes. Further experiments show that LLMs store their calculation results in a similar manner, and we can intervene on the output via simple vector additions, proving the causal connection between encoded numbers and language model outputs. Our research provides evidence that LLMs know the value of numbers, thus offering insights for better exploring, designing, and utilizing numeric information in LLMs.
Submitted 9 June, 2024; v1 submitted 8 January, 2024;
originally announced January 2024.
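The linear-probe readout described in the abstract above can be sketched as an ordinary least-squares fit from hidden states to number values. Everything here is synthetic and assumed for illustration: the "hidden states" are generated by placing each number along one random direction plus small noise, not extracted from an LLM, and the helper names `fit_linear_probe` and `apply_linear_probe` are hypothetical.

```python
import numpy as np

def _with_bias(hidden_states):
    """Append a constant column so the probe can learn an offset."""
    ones = np.ones((hidden_states.shape[0], 1))
    return np.hstack([hidden_states, ones])

def fit_linear_probe(hidden_states, targets):
    """Least-squares linear map from hidden states to scalar targets."""
    weights, *_ = np.linalg.lstsq(_with_bias(hidden_states), targets, rcond=None)
    return weights

def apply_linear_probe(weights, hidden_states):
    """Predict scalar values from hidden states with a fitted probe."""
    return _with_bias(hidden_states) @ weights

# Synthetic stand-in for LLM activations (assumption, not real model data).
rng = np.random.default_rng(0)
numbers = rng.integers(0, 100, size=200).astype(float)
direction = rng.normal(size=16)
hidden = np.outer(numbers, direction) + 0.01 * rng.normal(size=(200, 16))

probe = fit_linear_probe(hidden, numbers)
max_error = np.max(np.abs(apply_linear_probe(probe, hidden) - numbers))
print(max_error)
```

If the number value is linearly encoded, as it is by construction in this toy data, the probe recovers it with small error; on real activations the quality of the fit is the empirical question the paper studies.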
-
DeepSeek LLM: Scaling Open-Source Language Models with Longtermism
Authors:
DeepSeek-AI,
:,
Xiao Bi,
Deli Chen,
Guanting Chen,
Shanhuang Chen,
Damai Dai,
Chengqi Deng,
Honghui Ding,
Kai Dong,
Qiushi Du,
Zhe Fu,
Huazuo Gao,
Kaige Gao,
Wenjun Gao,
Ruiqi Ge,
Kang Guan,
Daya Guo,
Jianzhong Guo,
Guangbo Hao,
Zhewen Hao,
Ying He,
Wenjie Hu,
Panpan Huang,
Erhang Li
, et al. (63 additional authors not shown)
Abstract:
The rapid development of open-source large language models (LLMs) has been truly remarkable. However, the scaling law described in previous literature presents varying conclusions, which casts a dark cloud over scaling LLMs. We delve into the study of scaling laws and present our distinctive findings that facilitate scaling of large scale models in two commonly used open-source configurations, 7B and 67B. Guided by the scaling laws, we introduce DeepSeek LLM, a project dedicated to advancing open-source language models with a long-term perspective. To support the pre-training phase, we have developed a dataset that currently consists of 2 trillion tokens and is continuously expanding. We further conduct supervised fine-tuning (SFT) and Direct Preference Optimization (DPO) on DeepSeek LLM Base models, resulting in the creation of DeepSeek Chat models. Our evaluation results demonstrate that DeepSeek LLM 67B surpasses LLaMA-2 70B on various benchmarks, particularly in the domains of code, mathematics, and reasoning. Furthermore, open-ended evaluations reveal that DeepSeek LLM 67B Chat exhibits superior performance compared to GPT-3.5.
Submitted 5 January, 2024;
originally announced January 2024.
-
Multi-Granularity Representation Learning for Sketch-based Dynamic Face Image Retrieval
Authors:
Liang Wang,
Dawei Dai,
Shiyu Fu,
Guoyin Wang
Abstract:
In specific scenarios, a face sketch can be used to identify a person. However, drawing a face sketch often requires exceptional skill and is time-consuming, limiting its widespread application in actual scenarios. The new framework of sketch-less face image retrieval (SLFIR)[1] attempts to overcome these barriers by providing a means for humans and machines to interact during the drawing process. In the SLFIR problem, there is a large gap between a partial sketch with few strokes and any whole face photo, resulting in poor performance at the early stages. In this study, we propose a multigranularity (MG) representation learning (MGRL) method to address the SLFIR problem, in which we learn the representation of different granularity regions for a partial sketch, and then determine the final distance by combining all MG regions of the sketches and images. In the experiments, our method outperformed state-of-the-art baselines in terms of early retrieval on two accessible datasets. Codes are available at https://github.com/ddw2AIGROUP2CQUPT/MGRL.
Submitted 30 December, 2023;
originally announced January 2024.
-
Math-Shepherd: Verify and Reinforce LLMs Step-by-step without Human Annotations
Authors:
Peiyi Wang,
Lei Li,
Zhihong Shao,
R. X. Xu,
Damai Dai,
Yifei Li,
Deli Chen,
Y. Wu,
Zhifang Sui
Abstract:
In this paper, we present an innovative process-oriented math process reward model called \textbf{Math-Shepherd}, which assigns a reward score to each step of math problem solutions. The training of Math-Shepherd is achieved using automatically constructed process-wise supervision data, breaking the bottleneck of heavy reliance on manual annotation in existing work. We explore the effectiveness of Math-Shepherd in two scenarios: 1) \textit{Verification}: Math-Shepherd is utilized for reranking multiple outputs generated by Large Language Models (LLMs); 2) \textit{Reinforcement Learning}: Math-Shepherd is employed to reinforce LLMs with step-by-step Proximal Policy Optimization (PPO). With Math-Shepherd, a series of open-source LLMs demonstrates exceptional performance. For instance, the step-by-step PPO with Math-Shepherd significantly improves the accuracy of Mistral-7B (77.9\%$\to$84.1\% on GSM8K and 28.6\%$\to$33.0\% on MATH). The accuracy can be further enhanced to 89.1\% and 43.5\% on GSM8K and MATH with the verification of Math-Shepherd, respectively. We believe that automatic process supervision holds significant potential for the future evolution of LLMs.
Submitted 19 February, 2024; v1 submitted 14 December, 2023;
originally announced December 2023.
-
Electrically controlled interlayer trion fluid in electron-hole bilayers
Authors:
Ruishi Qi,
Qize Li,
Zuocheng Zhang,
Sudi Chen,
Jingxu Xie,
Yunbo Ou,
Zhiyuan Cui,
David D. Dai,
Andrew Y. Joe,
Takashi Taniguchi,
Kenji Watanabe,
Sefaattin Tongay,
Alex Zettl,
Liang Fu,
Feng Wang
Abstract:
The combination of repulsive and attractive Coulomb interactions in a quantum electron(e)-hole(h) fluid can give rise to novel correlated phases of multiparticle charge complexes such as excitons, trions and biexcitons. Here we report the first experimental realization of an electrically controlled interlayer trion fluid in two-dimensional van der Waals heterostructures. We demonstrate that in the strong coupling regime of electron-hole bilayers, electrons and holes in separate layers can spontaneously form three-particle trion bound states that resemble positronium ions in high energy physics. The interlayer trions can assume 1e-2h and 2e-1h configurations, where electrons and holes are confined in different transition metal dichalcogenide layers. We show that the two correlated holes in 1e-2h trions form a spin-singlet state with a spin gap of ~1 meV. By electrostatic gating, the equilibrium state of our system can be continuously tuned into an exciton fluid, a trion fluid, an exciton-trion mixture, a trion-charge mixture or an electron-hole plasma. Upon optical excitation, the system can host novel high-order multiparticle charge complexes, including interlayer four-particle complexes (tetrons) and five-particle complexes (pentons). Our work demonstrates a unique platform to study novel correlated phases of tunable Bose-Fermi mixtures and opens up new opportunities to realize artificial ions/molecules in electronic devices.
Submitted 5 December, 2023;
originally announced December 2023.
-
2D Feature Distillation for Weakly- and Semi-Supervised 3D Semantic Segmentation
Authors:
Ozan Unal,
Dengxin Dai,
Lukas Hoyer,
Yigit Baran Can,
Luc Van Gool
Abstract:
As 3D perception problems grow in popularity and the need for large-scale labeled datasets for LiDAR semantic segmentation increases, new methods arise that aim to reduce the necessity for dense annotations by employing weakly-supervised training. However, these methods continue to show weak boundary estimation and high false negative rates for small objects and distant sparse regions. We argue that such weaknesses can be compensated for by using RGB images, which provide a denser representation of the scene. We propose an image-guidance network (IGNet) which builds upon the idea of distilling high-level feature information from a domain-adapted, synthetically trained 2D semantic segmentation network. We further utilize a one-way contrastive learning scheme alongside a novel mixing strategy called FOVMix to combat the horizontal field-of-view mismatch between the two sensors and enhance the effects of image guidance. IGNet achieves state-of-the-art results for weakly-supervised LiDAR semantic segmentation on ScribbleKITTI, boasting up to 98% relative performance to fully supervised training with only 8% labeled points, while introducing no additional annotation burden or computational/memory cost during inference. Furthermore, we show that our contributions also prove effective for semi-supervised training, where IGNet claims state-of-the-art results on both ScribbleKITTI and SemanticKITTI.
Submitted 27 November, 2023;
originally announced November 2023.
-
SSB: Simple but Strong Baseline for Boosting Performance of Open-Set Semi-Supervised Learning
Authors:
Yue Fan,
Anna Kukleva,
Dengxin Dai,
Bernt Schiele
Abstract:
Semi-supervised learning (SSL) methods effectively leverage unlabeled data to improve model generalization. However, SSL models often underperform in open-set scenarios, where unlabeled data contain outliers from novel categories that do not appear in the labeled set. In this paper, we study the challenging and realistic open-set SSL setting, where the goal is to both correctly classify inliers and to detect outliers. Intuitively, the inlier classifier should be trained on inlier data only. However, we find that inlier classification performance can be largely improved by incorporating high-confidence pseudo-labeled data, regardless of whether they are inliers or outliers. Also, we propose to utilize non-linear transformations to separate the features used for inlier classification and outlier detection in the multi-task learning framework, preventing adverse effects between them. Additionally, we introduce pseudo-negative mining, which further boosts outlier detection performance. The three ingredients lead to what we call Simple but Strong Baseline (SSB) for open-set SSL. In experiments, SSB greatly improves both inlier classification and outlier detection performance, outperforming existing methods by a large margin. Our code will be released at https://github.com/YUE-FAN/SSB.
Submitted 17 November, 2023;
originally announced November 2023.
-
Object-centric Cross-modal Feature Distillation for Event-based Object Detection
Authors:
Lei Li,
Alexander Liniger,
Mario Millhaeusler,
Vagia Tsiminaki,
Yuanyou Li,
Dengxin Dai
Abstract:
Event cameras are gaining popularity due to their unique properties, such as their low latency and high dynamic range. One task where these benefits can be crucial is real-time object detection. However, RGB detectors still outperform event-based detectors due to the sparsity of the event data and missing visual details. In this paper, we develop a novel knowledge distillation approach to shrink the performance gap between these two modalities. To this end, we propose a cross-modality object detection distillation method that by design can focus on regions where the knowledge distillation works best. We achieve this by using an object-centric slot attention mechanism that can iteratively decouple feature maps into object-centric features and corresponding pixel features used for distillation. We evaluate our novel distillation approach on a synthetic and a real event dataset with aligned grayscale images as a teacher modality. We show that object-centric distillation significantly improves the performance of the event-based student object detector, nearly halving the performance gap with respect to the teacher.
Submitted 9 November, 2023;
originally announced November 2023.
-
PRED: Pre-training via Semantic Rendering on LiDAR Point Clouds
Authors:
Hao Yang,
Haiyang Wang,
Di Dai,
Liwei Wang
Abstract:
Pre-training is crucial in 3D-related fields such as autonomous driving where point cloud annotation is costly and challenging. Many recent studies on point cloud pre-training, however, have overlooked the issue of incompleteness, where only a fraction of the points are captured by LiDAR, leading to ambiguity during the training phase. On the other hand, images offer more comprehensive information and richer semantics that can bolster point cloud encoders in addressing the incompleteness issue inherent in point clouds. Yet, incorporating images into point cloud pre-training presents its own challenges due to occlusions, potentially causing misalignments between points and pixels. In this work, we propose PRED, a novel image-assisted pre-training framework for outdoor point clouds in an occlusion-aware manner. The main ingredient of our framework is a Birds-Eye-View (BEV) feature map conditioned semantic rendering, leveraging the semantics of images for supervision through neural rendering. We further enhance our model's performance by incorporating point-wise masking with a high mask ratio (95%). Extensive experiments demonstrate PRED's superiority over prior point cloud pre-training methods, providing significant improvements on various large-scale datasets for 3D perception tasks. Codes will be available at https://github.com/PRED4pc/PRED.
Submitted 8 November, 2023;
originally announced November 2023.
-
Pairing-based graph neural network for simulating quantum materials
Authors:
Di Luo,
David D. Dai,
Liang Fu
Abstract:
We develop a pairing-based graph neural network for simulating quantum many-body systems. Our architecture augments a BCS-type geminal wavefunction with a generalized pair amplitude parameterized by a graph neural network. Variational Monte Carlo with our neural network simultaneously provides an accurate, flexible, and scalable method for simulating many-electron systems. We apply this method to two-dimensional semiconductor electron-hole bilayers and obtain accurate results on a variety of interaction-induced phases, including the exciton Bose-Einstein condensate, electron-hole superconductor, and bilayer Wigner crystal. Our study demonstrates the potential of physically-motivated neural network wavefunctions for quantum materials simulations.
Submitted 21 November, 2023; v1 submitted 3 November, 2023;
originally announced November 2023.
-
U-BEV: Height-aware Bird's-Eye-View Segmentation and Neural Map-based Relocalization
Authors:
Andrea Boscolo Camiletto,
Alfredo Bochicchio,
Alexander Liniger,
Dengxin Dai,
Abel Gawel
Abstract:
Efficient relocalization is essential for intelligent vehicles when GPS reception is insufficient or sensor-based localization fails. Recent advances in Bird's-Eye-View (BEV) segmentation allow for accurate estimation of local scene appearance and in turn, can benefit the relocalization of the vehicle. However, one downside of BEV methods is the heavy computation required to leverage the geometric constraints. This paper presents U-BEV, a U-Net inspired architecture that extends the current state-of-the-art by allowing the BEV to reason about the scene on multiple height layers before flattening the BEV features. We show that this extension boosts the performance of the U-BEV by up to 4.11 IoU. Additionally, we combine the encoded neural BEV with a differentiable template matcher to perform relocalization on neural SD-map data. The model is fully end-to-end trainable and outperforms transformer-based BEV methods of similar computational complexity by 1.7 to 2.8 mIoU and BEV-based relocalization by over 26% Recall Accuracy on the nuScenes dataset.
Submitted 20 October, 2023;
originally announced October 2023.
-
Higher-dimensional symmetric informationally complete measurement via programmable photonic integrated optics
Authors:
Lan-Tian Feng,
Xiao-Min Hu,
Ming Zhang,
Yu-Jie Cheng,
Chao Zhang,
Yu Guo,
Yu-Yang Ding,
Zhibo Hou,
Fang-Wen Sun,
Guang-Can Guo,
Dao-Xin Dai,
Armin Tavakoli,
Xi-Feng Ren,
Bi-Heng Liu
Abstract:
Symmetric informationally complete measurements are both important building blocks in many quantum information protocols and the seminal example of a generalised, non-orthogonal, quantum measurement. In higher-dimensional systems, these measurements become both increasingly interesting and increasingly complex to implement. Here, we demonstrate an integrated quantum photonic platform to realize such a measurement on three-level quantum systems. The device operates at the high fidelities necessary for verifying a genuine many-outcome quantum measurement, performing near-optimal quantum state discrimination, and beating the projective limit in quantum random number generation. Moreover, it is programmable and can readily implement other quantum measurements at similarly high quality. Our work paves the way for the implementation of sophisticated higher-dimensional quantum measurements that go beyond the traditional orthogonal projections.
Submitted 16 October, 2023; v1 submitted 12 October, 2023;
originally announced October 2023.
-
Multigap nodeless superconductivity in the topological semimetal PdTe
Authors:
Chengcheng Zhao,
Xiangqi Liu,
Jinjin Wang,
Chunqiang Xu,
Baomin Wang,
Wei Xia,
Zhenhai Yu,
Xiaobo Jin,
Xu Zhang,
Jing Wang,
Dongzhe Dai,
Chengpeng Tu,
Jiaying Nie,
Hanru Wang,
Yihan Jiao,
Daniel Duong,
Silu Huang,
Rongying Jin,
Zhu'an Xu,
Yanfeng Guo,
Xiaofeng Xu,
Shiyan Li
Abstract:
Recently PdTe was identified as a spin-orbit coupled topological Dirac semimetal and was claimed to exhibit both bulk-nodal and surface-nodeless superconducting gaps. Here we report the ultralow-temperature thermal conductivity measurements on PdTe single crystals with $T_c$ = 4.5 K to investigate its superconducting gap structure. It is found that the residual linear term $κ_0/T$ is negligible in zero magnetic field. Furthermore, the field dependence of $κ_0(H)/T$ exhibits an $\sf S$-shaped curve. These results suggest that PdTe has multiple nodeless superconducting gaps, which is at odds with the claimed bulk-nodal gap. The reason for the discrepancy is likely that previous angle-resolved photoemission spectroscopy measurements were only performed down to 2 K and cannot observe the smaller nodeless gap. The fully gapped superconducting state in PdTe is compatible with it being a topological superconductor candidate.
Submitted 12 October, 2023;
originally announced October 2023.
-
Not All Demonstration Examples are Equally Beneficial: Reweighting Demonstration Examples for In-Context Learning
Authors:
Zhe Yang,
Damai Dai,
Peiyi Wang,
Zhifang Sui
Abstract:
Large Language Models (LLMs) have recently gained the In-Context Learning (ICL) ability as models scale up, allowing them to quickly adapt to downstream tasks with only a few demonstration examples prepended in the input sequence. Nonetheless, the current practice of ICL treats all demonstration examples equally, which still warrants improvement, as the quality of examples is usually uneven. In this paper, we investigate how to determine approximately optimal weights for demonstration examples and how to apply them during ICL. To assess the quality of weights in the absence of additional validation data, we design a masked self-prediction (MSP) score that exhibits a strong correlation with the final ICL performance. To expedite the weight-searching process, we discretize the continuous weight space and adopt beam search. With approximately optimal weights obtained, we further propose two strategies to apply them to demonstrations at different model positions. Experimental results on 8 text classification tasks show that our approach outperforms conventional ICL by a large margin. Our code is publicly available at https://github.com/Zhe-Young/WICL.
Submitted 12 October, 2023;
originally announced October 2023.
-
Energy Stable and Structure-Preserving Schemes for the Stochastic Galerkin Shallow Water Equations
Authors:
Dihan Dai,
Yekaterina Epshteyn,
Akil Narayan
Abstract:
The shallow water flow model is widely used to describe water flows in rivers, lakes, and coastal areas. Accounting for uncertainty in the corresponding transport-dominated nonlinear PDE models presents theoretical and numerical challenges that motivate the central advances of this paper. Starting with a spatially one-dimensional hyperbolicity-preserving, positivity-preserving stochastic Galerkin formulation of the parametric/uncertain shallow water equations, we derive an entropy-entropy flux pair for the system. We exploit this entropy-entropy flux pair to construct structure-preserving second-order energy conservative, and first- and second-order energy stable finite volume schemes for the stochastic Galerkin shallow water system. The performance of the methods is illustrated on several numerical experiments.
Submitted 9 October, 2023;
originally announced October 2023.
-
Assessing the alignment accuracy of state-of-the-art deterministic fabrication methods for single quantum dot devices
Authors:
Abdulmalik A. Madigawa,
Jan N. Donges,
Benedek Gaál,
Shulun Li,
Martin Arentoft Jacobsen,
Hanqing Liu,
Deyan Dai,
Xiangbin Su,
Xiangjun Shang,
Haiqiao Ni,
Johannes Schall,
Sven Rodt,
Zhichuan Niu,
Niels Gregersen,
Stephan Reitzenstein,
Battulga Munkhbat
Abstract:
The realization of efficient quantum light sources relies on the integration of self-assembled quantum dots (QDs) into photonic nanostructures with high spatial positioning accuracy. In this work, we present a comprehensive investigation of the QD position accuracy, obtained using two marker-based QD positioning techniques, photoluminescence (PL) and cathodoluminescence (CL) imaging, as well as using a marker-free in-situ electron beam lithography (in-situ EBL) technique. We employ four PL imaging configurations with three different image processing approaches and compare them with CL imaging. We fabricate circular mesa structures based on the obtained QD coordinates from both PL and CL image processing to evaluate the final positioning accuracy. This yields a final position offset of the QD relative to the mesa center of $μ_x$ = (-40$\pm$58) nm and $μ_y$ = (-39$\pm$85) nm with PL imaging and $μ_x$ = (-39$\pm$30) nm and $μ_y$ = (25$\pm$77) nm with CL imaging, which are comparable to the offset $μ_x$ = (20$\pm$40) nm and $μ_y$ = (-14$\pm$39) nm obtained using the in-situ EBL method. We discuss the possible causes of the observed offsets, which are significantly larger than the QD localization uncertainty obtained from simply imaging the QD light emission from an unstructured wafer. Our study highlights the influences of the image processing technique and the subsequent fabrication process on the final positioning accuracy for a QD placed inside a photonic nanostructure.
Submitted 29 January, 2024; v1 submitted 26 September, 2023;
originally announced September 2023.
-
Shedding new light on the absence of fermionic superradiance and maximal infalling rate of fermions into a black hole
Authors:
De-Chang Dai,
Dejan Stojkovic
Abstract:
Using the complete classification of the bases in the rotating black hole background, we separate superradiance from the Hawking effect. We first find that there is spontaneous particle creation for fermions by the potential outside the black hole horizon for the frequencies inside the superradiant regime, i.e. $ω<kΩ_H$. However, these particles do not enhance the total flux from the black hole. For the superradiance particle to become real, its negative energy counterpart has to be canceled by the positive energy Hawking radiation mode at the horizon. Since, due to the Pauli principle, this cancellation must be one-to-one, the superradiance effect cannot add anything to the total black hole flux. For an extremal black hole, the Hawking temperature is zero, the horizon is not populated with thermal modes, and fermions can be emitted through the superradiance mechanism. On the other hand, a macroscopic flux of fermions infalling to the black hole is the opposite process of Hawking radiation. A positive energy-infalling particle must cancel out a negative energy thermal mode at the horizon, which leaves a net positive energy mode that crosses the horizon. Since there is finite thermal particle density at the horizon, this implies that there is a maximal fermion infalling rate which is also controlled by the Hawking temperature.
Submitted 23 September, 2023;
originally announced September 2023.
-
Discwise Active Learning for LiDAR Semantic Segmentation
Authors:
Ozan Unal,
Dengxin Dai,
Ali Tamer Unal,
Luc Van Gool
Abstract:
While LiDAR data acquisition is easy, labeling for semantic segmentation remains highly time consuming and must therefore be done selectively. Active learning (AL) provides a solution that can iteratively and intelligently label a dataset while retaining high performance and a low budget. In this work we explore AL for LiDAR semantic segmentation. As a human expert is a component of the pipeline, a practical framework must consider common labeling techniques such as sequential labeling that drastically improve annotation times. We therefore propose a discwise approach (DiAL), where in each iteration, we query the region a single frame covers on global coordinates, labeling all frames simultaneously. We then tackle the two major challenges that emerge with discwise AL. Firstly we devise a new acquisition function that takes 3D point density changes into consideration which arise due to location changes or ego-vehicle motion. Next we solve a mixed-integer linear program that provides a general solution to the selection of multiple frames while taking into consideration the possibilities of disc intersections. Finally we propose a semi-supervised learning approach to utilize all frames within our dataset and improve performance.
Submitted 23 September, 2023;
originally announced September 2023.
-
Pressure-induced double-dome superconductivity in kagome metal CsTi3Bi5
Authors:
J. Y. Nie,
X. F. Yang,
X. Zhang,
X. Q. Liu,
W. Xia,
D. Z. Dai,
C. C. Zhao,
C. P. Tu,
X. M. Kong,
X. B. Jin,
Y. F. Guo,
S. Y. Li
Abstract:
We present high-pressure resistance measurements up to 40 GPa on recently discovered titanium-based kagome metal CsTi$_3$Bi$_5$. At ambient pressure, CsTi$_3$Bi$_5$ shows no evidence of superconductivity in resistivity and specific heat. By applying pressure, superconductivity emerges and the superconducting transition temperature ${\it T}_{\rm c}$ reaches its first maximum of 1.2 K at $\sim$5 GPa. Then the ${\it T}_{\rm c}$ is suppressed by pressure and cannot be detected around 10 GPa, manifesting as a superconducting dome. Remarkably, upon further increasing pressure above $\sim$13 GPa, another superconducting dome shows up, with the maximum ${\it T}_{\rm c}$ of 0.6 K and ending pressure at $\sim$36 GPa. The variation of ${\it T}_{\rm c}$ displays a clear double-dome shape in the superconducting phase diagram. Our work demonstrates the similarity between CsTi$_3$Bi$_5$ and CsV$_3$Sb$_5$, providing valuable insights into the rich physics of these novel kagome metals.
Submitted 19 August, 2023;
originally announced August 2023.
-
PROV-IO+: A Cross-Platform Provenance Framework for Scientific Data on HPC Systems
Authors:
Runzhou Han,
Mai Zheng,
Suren Byna,
Houjun Tang,
Bin Dong,
Dong Dai,
Yong Chen,
Dongkyun Kim,
Joseph Hassoun,
David Thorsley,
Matthew Wolf
Abstract:
Data provenance, or data lineage, describes the life cycle of data. In scientific workflows on HPC systems, scientists often seek diverse provenance (e.g., origins of data products, usage patterns of datasets). Unfortunately, existing provenance solutions cannot address the challenges due to their incompatible provenance models and/or system implementations. In this paper, we analyze four representative scientific workflows in collaboration with the domain scientists to identify concrete provenance needs. Based on the first-hand analysis, we propose a provenance framework called PROV-IO+, which includes an I/O-centric provenance model for describing scientific data and the associated I/O operations and environments precisely. Moreover, we build a prototype of PROV-IO+ to enable end-to-end provenance support on real HPC systems with little manual effort. The PROV-IO+ framework can support both containerized and non-containerized workflows on different HPC platforms with flexibility in selecting various classes of provenance. Our experiments with realistic workflows show that PROV-IO+ can address the provenance needs of the domain scientists effectively with reasonable performance (e.g., less than 3.5% tracking overhead for most experiments). Moreover, PROV-IO+ outperforms a state-of-the-art system (i.e., ProvLake) in our experiments.
Submitted 1 August, 2023;
originally announced August 2023.
-
Strong-coupling phases of trions and excitons in electron-hole bilayers at commensurate densities
Authors:
David D. Dai,
Liang Fu
Abstract:
We introduce density imbalanced electron-hole bilayers at a commensurate 2 : 1 density ratio as a platform for realizing novel phases involving electrons, excitons and trions. Three length scales are identified which characterize the interplay between kinetic energy, intralayer repulsion, and interlayer attraction. By a combination of theoretical analysis and numerical calculation, we find a variety of strong-coupling phases in different parameter regions, including quantum crystals of electrons, excitons, and trions. We also propose an "excitonic supersolid" phase that features electron crystallization and exciton superfluidity simultaneously. The material realization and experimental signature of these phases are discussed in the context of semiconductor transition metal dichalcogenide bilayers.
Submitted 26 May, 2024; v1 submitted 1 August, 2023;
originally announced August 2023.
-
LiDAR Meta Depth Completion
Authors:
Wolfgang Boettcher,
Lukas Hoyer,
Ozan Unal,
Ke Li,
Dengxin Dai
Abstract:
Depth estimation is one of the essential tasks to be addressed when creating mobile autonomous systems. While monocular depth estimation methods have improved in recent times, depth completion provides more accurate and reliable depth maps by additionally using sparse depth information from other sensors such as LiDAR. However, current methods are specifically trained for a single LiDAR sensor. As the scanning pattern differs between sensors, every new sensor would require re-training a specialized depth completion model, which is computationally inefficient and not flexible. Therefore, we propose to dynamically adapt the depth completion model to the used sensor type enabling LiDAR adaptive depth completion. Specifically, we propose a meta depth completion network that uses data patterns derived from the data to learn a task network to alter weights of the main depth completion network to solve a given depth completion task effectively. The method demonstrates a strong capability to work on multiple LiDAR scanning patterns and can also generalize to scanning patterns that are unseen during training. While using a single model, our method yields significantly better results than a non-adaptive baseline trained on different LiDAR patterns. It outperforms LiDAR-specific expert models for very sparse cases. These advantages allow flexible deployment of a single depth completion model on different sensors, which could also prove valuable to process the input of nascent LiDAR technology with adaptive instead of fixed scanning patterns.
Submitted 16 August, 2023; v1 submitted 24 July, 2023;
originally announced July 2023.
-
Enabling Real-time Neural Recovery for Cloud Gaming on Mobile Devices
Authors:
Zhaoyuan He,
Yifan Yang,
Shuozhe Li,
Diyuan Dai,
Lili Qiu,
Yuqing Yang
Abstract:
Cloud gaming is a multi-billion dollar industry. A client in cloud gaming sends its movement to the game server on the Internet, which renders and transmits the resulting video back. In order to provide a good gaming experience, a latency below 80 ms is required. This means that video rendering, encoding, transmission, decoding, and display have to finish within that time frame, which is especially challenging to achieve due to server overload, network congestion, and losses. In this paper, we propose a new method for recovering lost or corrupted video frames in cloud gaming. Unlike traditional video frame recovery, our approach uses game states to significantly enhance recovery accuracy and utilizes partially decoded frames to recover lost portions. We develop a holistic system that consists of (i) efficiently extracting game states, (ii) modifying H.264 video decoder to generate a mask to indicate which portions of video frames need recovery, and (iii) designing a novel neural network to recover either complete or partial video frames. Our approach is extensively evaluated using iPhone 12 and laptop implementations, and we demonstrate the utility of game states in the game video recovery and the effectiveness of our overall design.
Submitted 22 October, 2023; v1 submitted 15 July, 2023;
originally announced July 2023.
-
Pressure-induced superconductivity in the van der Waals semiconductor violet phosphorus
Authors:
Y. Y. Wu,
L. Mu,
X. Zhang,
D. Z. Dai,
L. Xin,
X. M. Kong,
S. Y. Huang,
K. Meng,
X. F. Yang,
C. P. Tu,
J. M. Ni,
H. G. Yan,
S. Y. Li
Abstract:
The van der Waals (vdW) semiconductor black phosphorus has been widely studied, especially after the discovery of phosphorene. In contrast, its sister compound violet phosphorus, also a vdW semiconductor, has rarely been studied. Here we report pressure-induced superconductivity in violet phosphorus up to $\sim$40 GPa. The superconductivity emerges at 2.75 GPa, which is well below the structural transition from monoclinic ($M$) to rhombohedral ($R$) structure at 8.5 GPa. The superconducting transition temperature ($T$$\rm_c$) shows a plateau of $\sim$7 K from 3.6 to 15 GPa, across the $M$ to $R$ structural transition, then jumps to another plateau of $\sim$10 K in the simple cubic ($C$) structure above 15 GPa. The temperature-pressure superconducting phase diagram of violet phosphorus is established, which is different from that of black phosphorus at low pressure. For black phosphorus, the superconductivity does not emerge until the structural transition from orthorhombic ($O$) to $R$ structure at $\sim$5 GPa, with a lower $T$$\rm_c$ than violet phosphorus. The pressure-induced superconductivity in violet phosphorus demonstrates its tunable electronic properties, and more electronics and optoelectronic applications are expected from this stable vdW semiconductor at ambient conditions.
Submitted 6 July, 2023;
originally announced July 2023.
-
MTR++: Multi-Agent Motion Prediction with Symmetric Scene Modeling and Guided Intention Querying
Authors:
Shaoshuai Shi,
Li Jiang,
Dengxin Dai,
Bernt Schiele
Abstract:
Motion prediction is crucial for autonomous driving systems to understand complex driving scenarios and make informed decisions. However, this task is challenging due to the diverse behaviors of traffic participants and complex environmental contexts. In this paper, we propose Motion TRansformer (MTR) frameworks to address these challenges. The initial MTR framework utilizes a transformer encoder-decoder structure with learnable intention queries, enabling efficient and accurate prediction of future trajectories. By customizing intention queries for distinct motion modalities, MTR improves multimodal motion prediction while reducing reliance on dense goal candidates. The framework comprises two essential processes: global intention localization, identifying the agent's intent to enhance overall efficiency, and local movement refinement, adaptively refining predicted trajectories for improved accuracy. Moreover, we introduce an advanced MTR++ framework, extending the capability of MTR to simultaneously predict multimodal motion for multiple agents. MTR++ incorporates symmetric context modeling and mutually-guided intention querying modules to facilitate future behavior interaction among multiple agents, resulting in scene-compliant future trajectories. Extensive experimental results demonstrate that the MTR framework achieves state-of-the-art performance on the highly-competitive motion prediction benchmarks, while the MTR++ framework surpasses its precursor, exhibiting enhanced performance and efficiency in predicting accurate multimodal future trajectories for multiple agents.
Submitted 9 March, 2024; v1 submitted 30 June, 2023;
originally announced June 2023.
-
Separating the superradiant emission from the Hawking radiation from a rotating black hole
Authors:
De-Chang Dai,
Dejan Stojkovic
Abstract:
Emission of particles created in the background of a rotating black hole can be greatly amplified, taking away rotational energy of the black hole. This amplification affects both particles created near the horizon (due to the Hawking effect), and particles created near the potential barrier far from the horizon. Only the latter effect is called the superradiance in the strict sense. We explicitly calculate the superradiant emission for scalar particles and compare it with the total scalar particle emission (Hawking radiation plus superradiance) to clarify some confusion in the literature. We clearly show that these two emissions are not the same. In particular, superradiance persists even for extremal black holes whose Hawking temperature is zero.
Submitted 15 July, 2023; v1 submitted 30 June, 2023;
originally announced June 2023.