-
MEND: Meta dEmonstratioN Distillation for Efficient and Effective In-Context Learning
Authors:
Yichuan Li,
Xiyao Ma,
Sixing Lu,
Kyumin Lee,
Xiaohu Liu,
Chenlei Guo
Abstract:
Large language models (LLMs) have demonstrated impressive in-context learning (ICL) capabilities, where an LLM makes predictions for a given test input together with a few input-output pairs (demonstrations). Nevertheless, the inclusion of demonstrations leads to a quadratic increase in the computational overhead of the self-attention mechanism. Existing solutions attempt to distill lengthy demonstrations into compact vectors. However, they often require task-specific retraining or compromise the LLM's in-context learning performance. To mitigate these challenges, we present Meta dEmonstratioN Distillation (MEND), where a language model learns to distill any lengthy demonstrations into vectors without retraining for a new downstream task. We exploit knowledge distillation to enhance alignment between MEND and the LLM, achieving both efficiency and effectiveness simultaneously. MEND is endowed with the meta-knowledge of distilling demonstrations through a two-stage training process, which includes meta-distillation pretraining and fine-tuning. Comprehensive evaluations across seven diverse ICL task partitions using decoder-only (GPT-2) and encoder-decoder (T5) models attest to MEND's prowess. It not only matches but often outperforms Vanilla ICL as well as other state-of-the-art distillation models, while significantly reducing the computational demands. This innovation promises enhanced scalability and efficiency for the practical deployment of large language models.
Submitted 12 March, 2024; v1 submitted 11 March, 2024;
originally announced March 2024.
-
PR-NET: Leveraging Pathway Refined Network Structures for Prostate Cancer Patient Condition Prediction
Authors:
R. Li,
J. Liu,
X. L. Deng,
X. Liu,
J. C. Guo,
W. Y. Wu,
L. Yang
Abstract:
The diagnosis and monitoring of Castrate Resistant Prostate Cancer (CRPC) are crucial for cancer patients, but the current models (such as P-NET) have limitations in terms of parameter count, generalization, and cost. To address the issue, we develop a more accurate and efficient Prostate Cancer patient condition prediction model, named PR-NET. By compressing and optimizing the network structure of P-NET, the model complexity is reduced while maintaining high accuracy and interpretability. The PR-NET demonstrated superior performance in predicting prostate cancer patient outcomes, outshining P-NET and six other traditional models with a significant margin. In our rigorous evaluation, PR-NET not only achieved impressive average AUC and Recall scores of 0.94 and 0.83, respectively, on known data but also maintained robust generalizability on five unknown datasets with a higher average AUC of 0.73 and Recall of 0.72, compared to P-NET's 0.68 and 0.5. PR-NET's efficiency was evidenced by its shorter average training and inference times, and its gene-level analysis revealed 46 key genes, demonstrating its enhanced predictive power and efficiency in identifying critical biomarkers for prostate cancer. Future research can further expand its application domains and optimize the model's performance and reliability.
Submitted 12 March, 2024; v1 submitted 9 March, 2024;
originally announced March 2024.
-
Algorithmic progress in language models
Authors:
Anson Ho,
Tamay Besiroglu,
Ege Erdil,
David Owen,
Robi Rahman,
Zifan Carl Guo,
David Atkinson,
Neil Thompson,
Jaime Sevilla
Abstract:
We investigate the rate at which algorithms for pre-training language models have improved since the advent of deep learning. Using a dataset of over 200 language model evaluations on Wikitext and Penn Treebank spanning 2012-2023, we find that the compute required to reach a set performance threshold has halved approximately every 8 months, with a 95% confidence interval of around 5 to 14 months, substantially faster than hardware gains per Moore's Law. We estimate augmented scaling laws, which enable us to quantify algorithmic progress and determine the relative contributions of scaling models versus innovations in training algorithms. Despite the rapid pace of algorithmic progress and the development of new architectures such as the transformer, our analysis reveals that the increase in compute made an even larger contribution to overall performance improvements over this time period. Though limited by noisy benchmark data, our analysis quantifies the rapid progress in language modeling, shedding light on the relative contributions from compute and algorithms.
Submitted 9 March, 2024;
originally announced March 2024.
-
Privacy Amplification for the Gaussian Mechanism via Bounded Support
Authors:
Shengyuan Hu,
Saeed Mahloujifar,
Virginia Smith,
Kamalika Chaudhuri,
Chuan Guo
Abstract:
Data-dependent privacy accounting frameworks such as per-instance differential privacy (pDP) and Fisher information loss (FIL) confer fine-grained privacy guarantees for individuals in a fixed training dataset. These guarantees can be desirable compared to vanilla DP in real world settings as they tightly upper-bound the privacy leakage for a $\textit{specific}$ individual in an $\textit{actual}$ dataset, rather than considering worst-case datasets. While these frameworks are beginning to gain popularity, to date, there is a lack of private mechanisms that can fully leverage advantages of data-dependent accounting. To bridge this gap, we propose simple modifications of the Gaussian mechanism with bounded support, showing that they amplify privacy guarantees under data-dependent accounting. Experiments on model training with DP-SGD show that using bounded support Gaussian mechanisms can provide a reduction of the pDP bound $ε$ by as much as 30% without negative effects on model utility.
Submitted 7 March, 2024;
originally announced March 2024.
-
Scalable and Versatile Linear Computation with Minimalistic Photonic Matrix Processor
Authors:
Zhaoang Deng,
Zhenhua Li,
Jie Liu,
Chuyao Bian,
Jiaqing Li,
Ranfeng Gan,
Zihao Chen,
Kaixuan Chen,
Changjian Guo,
Liu Liu,
Siyuan Yu
Abstract:
The advancement of artificial intelligence demands flexible multimodal data processing with high throughput and energy efficiency. Photonic integrated circuits (PICs) have demonstrated promising potential in terms of low latency and low power consumption per operation for linear operations such as matrix-vector multiplication. However, the existing schemes face challenges in their scalability due to the use of photonic circuits that expand with the scale of the operands, despite efforts to exploit multiple optical parameter dimensions such as time, wavelength and spatial parallelism. They also lack flexibility and efficiency in switching between different types of operations or tasks and adapting to multimodal data. In this article, we introduce an optical matrix processor (MP) with a minimalistic recursive structure for both multiplications and accumulations. The MP consists of an electro-optic ring-modulator implemented as a thin-film lithium niobate PIC that allows flexible configurability and time-division multiplexed scheduling. The MP supports not only versatile linear operations including vector/matrix-vector multiplication and single/multi-kernel convolution but also ultrafast task switching and adaptability to data of different sizes, by simply adjusting the data baud rate relative to the ring delay without structural modifications. We demonstrate its capabilities in an optic-electronic convolutional neural network with a computing throughput up to 73.4 billion operations per second. The MP further supports high scalability through appropriate allocation of wavelength and space resources, extending computing parallelism to handle higher data volumes with higher energy efficiency. This novel scheme paves the way for a new class of photonic processors capable of managing escalating data workloads with unprecedented flexibility, efficiency and scalability.
Submitted 21 May, 2024; v1 submitted 6 March, 2024;
originally announced March 2024.
-
Differentially Private Representation Learning via Image Captioning
Authors:
Tom Sander,
Yaodong Yu,
Maziar Sanjabi,
Alain Durmus,
Yi Ma,
Kamalika Chaudhuri,
Chuan Guo
Abstract:
Differentially private (DP) machine learning is considered the gold-standard solution for training a model from sensitive data while still preserving privacy. However, a major barrier to achieving this ideal is its sub-optimal privacy-accuracy trade-off, which is particularly visible in DP representation learning. Specifically, it has been shown that under modest privacy budgets, most models learn representations that are not significantly better than hand-crafted features. In this work, we show that effective DP representation learning can be done via image captioning and scaling up to internet-scale multimodal datasets. Through a series of engineering tricks, we successfully train a DP image captioner (DP-Cap) on a 233M subset of LAION-2B from scratch using a reasonable amount of computation, and obtaining unprecedented high-quality image features that can be used in a variety of downstream vision and vision-language tasks. For example, under a privacy budget of $\varepsilon=8$, a linear classifier trained on top of learned DP-Cap features attains 65.8% accuracy on ImageNet-1K, considerably improving the previous SOTA of 56.5%. Our work challenges the prevailing sentiment that high-utility DP representation learning cannot be achieved by training from scratch.
Submitted 4 March, 2024;
originally announced March 2024.
-
Dual-Context Aggregation for Universal Image Matting
Authors:
Qinglin Liu,
Xiaoqian Lv,
Wei Yu,
Changyong Guo,
Shengping Zhang
Abstract:
Natural image matting aims to estimate the alpha matte of the foreground from a given image. Various approaches have been explored to address this problem, such as interactive matting methods that use guidance such as click or trimap, and automatic matting methods tailored to specific objects. However, existing matting methods are designed for specific objects or guidance, neglecting the common requirement of aggregating global and local contexts in image matting. As a result, these methods often encounter challenges in accurately identifying the foreground and generating precise boundaries, which limits their effectiveness in unforeseen scenarios. In this paper, we propose a simple and universal matting framework, named Dual-Context Aggregation Matting (DCAM), which enables robust image matting with arbitrary guidance or without guidance. Specifically, DCAM first adopts a semantic backbone network to extract low-level features and context features from the input image and guidance. Then, we introduce a dual-context aggregation network that incorporates global object aggregators and local appearance aggregators to iteratively refine the extracted context features. By performing both global contour segmentation and local boundary refinement, DCAM exhibits robustness to diverse types of guidance and objects. Finally, we adopt a matting decoder network to fuse the low-level features and the refined context features for alpha matte estimation. Experimental results on five matting datasets demonstrate that the proposed DCAM outperforms state-of-the-art matting methods in both automatic matting and interactive matting tasks, which highlights the strong universality and high performance of DCAM. The source code is available at https://github.com/Windaway/DCAM.
Submitted 28 February, 2024;
originally announced February 2024.
-
How to Sustain a Scientific Open-Source Software Ecosystem: Learning from the Astropy Project
Authors:
Jiayi Sun,
Aarya Patil,
Youhai Li,
Jin L. C. Guo,
Shurui Zhou
Abstract:
Scientific open-source software (OSS) has greatly benefited research communities through its transparent and collaborative nature. Given its critical role in scientific research, ensuring the sustainability of such software has become vital. Earlier studies have proposed sustainability strategies for conventional scientific software and open-source communities. However, it remains unclear whether these solutions can be easily adapted to the integrated framework of scientific OSS and its larger ecosystem. This study examines the challenges and opportunities to enhance the sustainability of scientific OSS in the context of interdisciplinary collaboration, open-source community, and multi-project ecosystem. We conducted a case study on a widely-used software ecosystem in the astrophysics domain, the Astropy Project, using a mixed-methods design approach. This approach includes an interview with core contributors regarding their participation in an interdisciplinary team, a survey of disengaged contributors about their motivations for contribution, reasons for disengagement, and suggestions for sustaining the communities, and finally, an analysis of cross-referenced issues and pull requests to understand best practices for collaboration on the ecosystem level. Our study reveals the implications of major challenges for sustaining scientific OSS and proposes concrete suggestions for tackling these challenges.
Submitted 22 February, 2024;
originally announced February 2024.
-
Efficient construction of the Feynman-Vernon influence functional as matrix product states
Authors:
Chu Guo,
Ruofan Chen
Abstract:
The time-evolving matrix product operator (TEMPO) method has become a very competitive numerical method for studying the real-time dynamics of quantum impurity problems. For small impurities, the most challenging calculation in TEMPO is to construct the matrix product state representation of the Feynman-Vernon influence functional. In this work we propose an efficient method for this task, which exploits the time-translationally invariant property of the influence functional. The required number of matrix product state multiplication in our method is almost independent of the total evolution time, as compared to the method originally used in TEMPO which requires a linearly scaling number of multiplications. The accuracy and efficiency of this method are demonstrated for the Toulouse model and the single impurity Anderson model.
Submitted 21 June, 2024; v1 submitted 22 February, 2024;
originally announced February 2024.
-
Developing a $μ$Bq/m$^{3}$ level $^{226}$Ra concentration in water measurement system for the Jiangmen Underground Neutrino Observatory
Authors:
C. Li,
B. Wang,
Y. Liu,
C. Guo,
Y. P. Zhang,
J. C. Liu,
Q. Tang,
T. Y. Guan,
C. G. Yang
Abstract:
The Jiangmen Underground Neutrino Observatory (JUNO), a 20~kton multi-purpose low background Liquid Scintillator (LS) detector, was proposed primarily to determine the neutrino mass ordering. To suppress the radioactivity from the surrounding rocks and tag cosmic muons, the JUNO central detector is submerged in a Water Cherenkov Detector (WCD). In addition to being used in the WCD, ultrapure water is used in LS filling, for which the $^{226}$Ra concentration in water needs to be less than 50~$μ$Bq/m$^3$. To precisely measure the $^{226}$Ra concentration in water, a 6.0~$μ$Bq/m$^3$ $^{226}$Ra concentration in water measurement system has been developed. In this paper, the detail of the measurement system as well as the $^{226}$Ra concentration measurement result in regular EWII ultrapure water will be presented.
Submitted 21 February, 2024;
originally announced February 2024.
-
TransGOP: Transformer-Based Gaze Object Prediction
Authors:
Binglu Wang,
Chenxi Guo,
Yang Jin,
Haisheng Xia,
Nian Liu
Abstract:
Gaze object prediction aims to predict the location and category of the object that is watched by a human. Previous gaze object prediction works use CNN-based object detectors to predict the object's location. However, we find that Transformer-based object detectors can predict more accurate object location for dense objects in retail scenarios. Moreover, the long-distance modeling capability of the Transformer can help to build relationships between the human head and the gaze object, which is important for the GOP task. To this end, this paper introduces Transformer into the fields of gaze object prediction and proposes an end-to-end Transformer-based gaze object prediction method named TransGOP. Specifically, TransGOP uses an off-the-shelf Transformer-based object detector to detect the location of objects and designs a Transformer-based gaze autoencoder in the gaze regressor to establish long-distance gaze relationships. Moreover, to improve gaze heatmap regression, we propose an object-to-gaze cross-attention mechanism to let the queries of the gaze autoencoder learn the global-memory position knowledge from the object detector. Finally, to make the whole framework end-to-end trained, we propose a Gaze Box loss to jointly optimize the object detector and gaze regressor by enhancing the gaze heatmap energy in the box of the gaze object. Extensive experiments on the GOO-Synth and GOO-Real datasets demonstrate that our TransGOP achieves state-of-the-art performance on all tracks, i.e., object detection, gaze estimation, and gaze object prediction. Our code will be available at https://github.com/chenxi-Guo/TransGOP.git.
Submitted 21 February, 2024;
originally announced February 2024.
-
Accelerating Sparse DNNs Based on Tiled GEMM
Authors:
Cong Guo,
Fengchen Xue,
Jingwen Leng,
Yuxian Qiu,
Yue Guan,
Weihao Cui,
Quan Chen,
Minyi Guo
Abstract:
Network pruning can reduce the computation cost of deep neural network (DNN) models. However, sparse models often produce randomly-distributed weights to maintain accuracy, leading to irregular computations. Consequently, unstructured sparse models cannot achieve meaningful speedup on commodity hardware built for dense matrix computations. Accelerators are usually modified or designed with structured sparsity-optimized architectures for exploiting sparsity. For example, the Ampere architecture introduces a sparse tensor core, which adopts the 2:4 sparsity pattern.
We propose a pruning method that builds upon the insight that matrix multiplication generally breaks the large matrix into multiple smaller tiles for parallel execution. We present the tile-wise sparsity pattern, which maintains a structured sparsity pattern at the tile level for efficient execution but allows for irregular pruning at the global scale to maintain high accuracy. In addition, the tile-wise sparsity is implemented at the global memory level, and the 2:4 sparsity executes at the register level inside the sparse tensor core. We can combine these two patterns into a tile-vector-wise (TVW) sparsity pattern to explore more fine-grained sparsity and further accelerate the sparse DNN models. We evaluate the TVW on the GPU, achieving averages of $1.85\times$, $2.75\times$, and $22.18\times$ speedups over the dense model, block sparsity, and unstructured sparsity.
Submitted 16 February, 2024;
originally announced February 2024.
-
Uncertainty, Calibration, and Membership Inference Attacks: An Information-Theoretic Perspective
Authors:
Meiyi Zhu,
Caili Guo,
Chunyan Feng,
Osvaldo Simeone
Abstract:
In a membership inference attack (MIA), an attacker exploits the overconfidence exhibited by typical machine learning models to determine whether a specific data point was used to train a target model. In this paper, we analyze the performance of the state-of-the-art likelihood ratio attack (LiRA) within an information-theoretical framework that allows the investigation of the impact of the aleatoric uncertainty in the true data generation process, of the epistemic uncertainty caused by a limited training data set, and of the calibration level of the target model. We compare three different settings, in which the attacker receives decreasingly informative feedback from the target model: confidence vector (CV) disclosure, in which the output probability vector is released; true label confidence (TLC) disclosure, in which only the probability assigned to the true label is made available by the model; and decision set (DS) disclosure, in which an adaptive prediction set is produced as in conformal prediction. We derive bounds on the advantage of an MIA adversary with the aim of offering insights into the impact of uncertainty and calibration on the effectiveness of MIAs. Simulation results demonstrate that the derived analytical bounds predict well the effectiveness of MIAs.
Submitted 16 February, 2024;
originally announced February 2024.
-
Pathformer: Multi-scale Transformers with Adaptive Pathways for Time Series Forecasting
Authors:
Peng Chen,
Yingying Zhang,
Yunyao Cheng,
Yang Shu,
Yihang Wang,
Qingsong Wen,
Bin Yang,
Chenjuan Guo
Abstract:
Transformers for time series forecasting mainly model time series from limited or fixed scales, making it challenging to capture different characteristics spanning various scales. We propose Pathformer, a multi-scale Transformer with adaptive pathways. It integrates both temporal resolution and temporal distance for multi-scale modeling. Multi-scale division divides the time series into different temporal resolutions using patches of various sizes. Based on the division of each scale, dual attention is performed over these patches to capture global correlations and local details as temporal dependencies. We further enrich the multi-scale Transformer with adaptive pathways, which adaptively adjust the multi-scale modeling process based on the varying temporal dynamics of the input, improving the accuracy and generalization of Pathformer. Extensive experiments on eleven real-world datasets demonstrate that Pathformer not only achieves state-of-the-art performance by surpassing all current models but also exhibits stronger generalization abilities under various transfer scenarios. The code is made available at https://github.com/decisionintelligence/pathformer.
Submitted 6 March, 2024; v1 submitted 4 February, 2024;
originally announced February 2024.
-
Opening the AI black box: program synthesis via mechanistic interpretability
Authors:
Eric J. Michaud,
Isaac Liao,
Vedang Lad,
Ziming Liu,
Anish Mudide,
Chloe Loughridge,
Zifan Carl Guo,
Tara Rezaei Kheirkhah,
Mateja Vukelić,
Max Tegmark
Abstract:
We present MIPS, a novel method for program synthesis based on automated mechanistic interpretability of neural networks trained to perform the desired task, auto-distilling the learned algorithm into Python code. We test MIPS on a benchmark of 62 algorithmic tasks that can be learned by an RNN and find it highly complementary to GPT-4: MIPS solves 32 of them, including 13 that are not solved by GPT-4 (which also solves 30). MIPS uses an integer autoencoder to convert the RNN into a finite state machine, then applies Boolean or integer symbolic regression to capture the learned algorithm. As opposed to large language models, this program synthesis technique makes no use of (and is therefore not limited by) human training data such as algorithms and code from GitHub. We discuss opportunities and challenges for scaling up this approach to make machine-learned models more interpretable and trustworthy.
Submitted 7 February, 2024;
originally announced February 2024.
-
CoFiNet: Unveiling Camouflaged Objects with Multi-Scale Finesse
Authors:
Cunhan Guo,
Heyan Huang
Abstract:
Camouflaged Object Detection (COD) is a critical aspect of computer vision aimed at identifying concealed objects, with applications spanning military, industrial, medical and monitoring domains. To address the problem of poor detail segmentation, we introduce a novel method for camouflaged object detection, named CoFiNet. Our approach primarily focuses on multi-scale feature fusion and extraction, with special attention to the model's segmentation effectiveness for detailed features, enhancing its ability to detect camouflaged objects. CoFiNet adopts a coarse-to-fine strategy. A multi-scale feature integration module is leveraged to enhance the model's capability of fusing context features. A multi-activation selective kernel module is leveraged to grant the model the ability to autonomously alter its receptive field, enabling it to choose an appropriate receptive field for camouflaged objects of different sizes. During mask generation, we employ the dual-mask strategy for image segmentation, separating the reconstruction of coarse and fine masks, which significantly enhances the model's learning capacity for details. Comprehensive experiments were conducted on four different datasets, demonstrating that CoFiNet achieves state-of-the-art performance across all datasets. The experimental results of CoFiNet underscore its effectiveness in camouflaged object detection and highlight its potential in various practical application scenarios.
Submitted 3 February, 2024;
originally announced February 2024.
-
Déjà Vu Memorization in Vision-Language Models
Authors:
Bargav Jayaraman,
Chuan Guo,
Kamalika Chaudhuri
Abstract:
Vision-Language Models (VLMs) have emerged as the state-of-the-art representation learning solution, with myriads of downstream applications such as image classification, retrieval and generation. A natural question is whether these models memorize their training data, which also has implications for generalization. We propose a new method for measuring memorization in VLMs, which we call déjà vu memorization. For VLMs trained on image-caption pairs, we show that the model indeed retains information about individual objects in the training images beyond what can be inferred from correlations or the image caption. We evaluate déjà vu memorization at both sample and population level, and show that it is significant for OpenCLIP trained on as many as 50M image-caption pairs. Finally, we show that text randomization considerably mitigates memorization while only moderately impacting the model's downstream task performance.
Submitted 3 February, 2024;
originally announced February 2024.
-
Multiple intermediate phases in the interpolating Aubry-André-Fibonacci model
Authors:
Chenyue Guo
Abstract:
We investigate a generalized interpolating Aubry-André-Fibonacci (IAAF) model with p-wave superconducting pairing. In the Aubry-André limit, we demonstrate that the system experiences transitions from a pure phase, either extended or critical, to a variety of intermediate phases and ultimately enters a localized phase with increasing potential strength. These intermediate phases include those with coexisting extended and localized states, extended and critical states, localized and critical states and a mix of extended, critical and localized states. Each intermediate phase exhibits at least one type of mobility edge separating different states. As the system approaches the Fibonacci limit, both the extended and localized phases diminish, and the system tends towards a critical phase.
Submitted 7 April, 2024; v1 submitted 30 January, 2024;
originally announced January 2024.
-
A Survey on Data Augmentation in Large Model Era
Authors:
Yue Zhou,
Chenlu Guo,
Xu Wang,
Yi Chang,
Yuan Wu
Abstract:
Large models, encompassing large language and diffusion models, have shown exceptional promise in approximating human-level intelligence, garnering significant interest from both academic and industrial spheres. However, the training of these large models necessitates vast quantities of high-quality data, and with continuous updates to these models, the existing reservoir of high-quality data may soon be depleted. This challenge has catalyzed a surge in research focused on data augmentation methods. Leveraging large models, these data augmentation techniques have outperformed traditional approaches. This paper offers an exhaustive review of large model-driven data augmentation methods, adopting a comprehensive perspective. We begin by establishing a classification of relevant studies into three main categories: image augmentation, text augmentation, and paired data augmentation. Following this, we delve into various data post-processing techniques pertinent to large model-based data augmentation. Our discussion then expands to encompass the array of applications for these data augmentation methods within natural language processing, computer vision, and audio signal processing. We proceed to evaluate the successes and limitations of large model-based data augmentation across different scenarios. Concluding our review, we highlight prospective challenges and avenues for future exploration in the field of data augmentation. Our objective is to furnish researchers with critical insights, ultimately contributing to the advancement of more sophisticated large models. We consistently maintain the related open-source materials at: https://github.com/MLGroup-JLU/LLM-data-aug-survey.
Submitted 4 March, 2024; v1 submitted 27 January, 2024;
originally announced January 2024.
-
A Survey on Indoor Visible Light Positioning Systems: Fundamentals, Applications, and Challenges
Authors:
Zhiyu Zhu,
Yang Yang,
Mingzhe Chen,
Caili Guo,
Julian Cheng,
Shuguang Cui
Abstract:
The growing demand for location-based services in areas like virtual reality, robot control, and navigation has intensified the focus on indoor localization. Visible light positioning (VLP), leveraging visible light communications (VLC), has become a promising indoor positioning technology due to its high accuracy and low cost. This paper provides a comprehensive survey of VLP systems. In particular, since VLC lays the foundation for VLP, we first present a detailed overview of the principles of VLC. The performance of each positioning algorithm is also compared in terms of various metrics such as accuracy, coverage, and orientation limitation. Beyond the physical layer studies, the network design for a VLP system is also investigated, including multi-access technologies, resource allocation, and light-emitting diode (LED) placements. Next, the applications of the VLP systems are overviewed. Finally, this paper outlines open issues, challenges, and future research directions for the research field. In a nutshell, this paper constitutes the first holistic survey on VLP from state-of-the-art studies to practical uses.
Submitted 24 January, 2024;
originally announced January 2024.
-
Generative Human Motion Stylization in Latent Space
Authors:
Chuan Guo,
Yuxuan Mu,
Xinxin Zuo,
Peng Dai,
Youliang Yan,
Juwei Lu,
Li Cheng
Abstract:
Human motion stylization aims to revise the style of an input motion while keeping its content unaltered. Unlike existing works that operate directly in pose space, we leverage the latent space of pretrained autoencoders as a more expressive and robust representation for motion extraction and infusion. Building upon this, we present a novel generative model that produces diverse stylization results of a single motion (latent) code. During training, a motion code is decomposed into two coding components: a deterministic content code, and a probabilistic style code adhering to a prior distribution; then a generator massages the random combination of content and style codes to reconstruct the corresponding motion codes. Our approach is versatile, allowing the learning of probabilistic style space from either style labeled or unlabeled motions, providing notable flexibility in stylization as well. In inference, users can opt to stylize a motion using style cues from a reference motion or a label. Even in the absence of explicit style input, our model facilitates novel re-stylization by sampling from the unconditional style prior distribution. Experimental results show that our proposed stylization models, despite their lightweight design, outperform the state-of-the-art in style reenactment, content preservation, and generalization across various applications and settings. Project Page: https://murrol.github.io/GenMoStyle
Submitted 23 February, 2024; v1 submitted 24 January, 2024;
originally announced January 2024.
-
EL-VIT: Probing Vision Transformer with Interactive Visualization
Authors:
Hong Zhou,
Rui Zhang,
Peifeng Lai,
Chaoran Guo,
Yong Wang,
Zhida Sun,
Junjie Li
Abstract:
Nowadays, Vision Transformer (ViT) is widely utilized in various computer vision tasks, owing to its unique self-attention mechanism. However, the model architecture of ViT is complex and often challenging to comprehend, leading to a steep learning curve. ViT developers and users frequently encounter difficulties in interpreting its inner workings. Therefore, a visualization system is needed to assist ViT users in understanding its functionality. This paper introduces EL-VIT, an interactive visual analytics system designed to probe the Vision Transformer and facilitate a better understanding of its operations. The system consists of four layers of visualization views. The first three layers include model overview, knowledge background graph, and model detail view. These three layers elucidate the operation process of ViT from three perspectives: the overall model architecture, detailed explanation, and mathematical operations, enabling users to understand the underlying principles and the transition process between layers. The fourth interpretation view helps ViT users and experts gain a deeper understanding by calculating the cosine similarity between patches. Our two usage scenarios demonstrate the effectiveness and usability of EL-VIT in helping ViT users understand the working mechanism of ViT.
Submitted 23 January, 2024;
originally announced January 2024.
-
Comments on finite termination of the generalized Newton method for absolute value equations
Authors:
Chun-Hua Guo
Abstract:
We consider the generalized Newton method (GNM) for the absolute value equation (AVE) $Ax-|x|=b$. The method has finite termination property whenever it is convergent, no matter whether the AVE has a unique solution. We prove that GNM is convergent whenever $ρ(|A^{-1}|)<1/3$. We also present new results for the case where $A-I$ is a nonsingular $M$-matrix or an irreducible singular $M$-matrix. When $A-I$ is an irreducible singular $M$-matrix, the AVE may have infinitely many solutions. In this case, we show that GNM always terminates with a uniquely identifiable solution, as long as the initial guess has at least one nonpositive component.
Submitted 22 January, 2024;
originally announced January 2024.
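The generalized Newton iteration mentioned in the abstract above can be illustrated with a minimal sketch. It assumes the standard form of the method for $Ax-|x|=b$, namely $x^{k+1} = (A - D(x^k))^{-1} b$ with $D(x) = \mathrm{diag}(\mathrm{sign}(x))$, and stops as soon as the sign pattern repeats, at which point the iterate solves the AVE exactly. The helper names (`solve_linear`, `generalized_newton_ave`) and the 2x2 test problem are illustrative assumptions, not taken from the paper.

```python
def solve_linear(M, rhs):
    """Solve M y = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [[float(v) for v in M[i]] + [float(rhs[i])] for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-14:
            raise ValueError("matrix is numerically singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    y = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = aug[i][n] - sum(aug[i][j] * y[j] for j in range(i + 1, n))
        y[i] = s / aug[i][i]
    return y


def sign(v):
    """Return -1, 0 or 1 according to the sign of v."""
    return (v > 0) - (v < 0)


def generalized_newton_ave(A, b, x0, max_iter=50):
    """Iterate x <- (A - D(x))^{-1} b until the sign pattern stops changing."""
    n = len(b)
    x = list(x0)
    for k in range(max_iter):
        d = [sign(v) for v in x]
        # Generalized Jacobian of Ax - |x| at the current iterate.
        M = [[A[i][j] - (d[i] if i == j else 0) for j in range(n)]
             for i in range(n)]
        x_new = solve_linear(M, b)
        # Same sign pattern means D(x_new) x_new = |x_new|, so x_new solves the AVE.
        if [sign(v) for v in x_new] == d:
            return x_new, k + 1
        x = x_new
    raise RuntimeError("no termination within max_iter iterations")


# Illustrative problem whose solution is x = (1, -1).
A = [[4.0, 1.0], [1.0, 3.0]]
b = [2.0, -3.0]
x, iterations = generalized_newton_ave(A, b, x0=[1.0, 1.0])
residual = [sum(A[i][j] * x[j] for j in range(2)) - abs(x[i]) - b[i]
            for i in range(2)]
print(x, iterations, residual)
```

The stopping test checks the sign pattern rather than a tolerance on the step, which is one simple way to make the finite-termination behaviour visible in a small example.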
-
Effects of inlet and secondary flow conditions on the flow field of rotating detonation engines with film cooling
Authors:
Jingtian Yu,
Songbai Yao,
Jingzhe Li,
Yihui Huang,
Chunhai Guo,
Wenwu Zhang
Abstract:
A three-dimensional simulation of the rotating detonation engine (RDE) with film cooling is conducted. The aim of this study is to analyze the fluid dynamics and heat transfer of the detonation flow field under the influence of cooling flow from the film holes. Results suggest that when the rotating detonation wave sweeps the film holes, the shape of the wave structure will deform, and the detonation products will invade and block the outflow from the film holes; however, this only occurs temporarily. The structure of the detonation wave will quickly restore to its stable form and, meanwhile, the cooling flow also recovers rapidly and provides adequate protected area on the wall surface and effective thermal protection time in a full propagation cycle of the detonation wave. A parametric analysis indicates that the effective outflow time improves with the increase of the mass flow rate of the cooling flow; on the other hand, the cooling efficiency is more significant downstream from the inlet of the combustor to the outlet. In addition, the thrust and specific impulse of the RDE are also examined under the influence of film cooling.
Submitted 21 January, 2024;
originally announced January 2024.
-
Universal Neurons in GPT2 Language Models
Authors:
Wes Gurnee,
Theo Horsley,
Zifan Carl Guo,
Tara Rezaei Kheirkhah,
Qinyi Sun,
Will Hathaway,
Neel Nanda,
Dimitris Bertsimas
Abstract:
A basic question within the emerging field of mechanistic interpretability is the degree to which neural networks learn the same underlying mechanisms. In other words, are neural mechanisms universal across different models? In this work, we study the universality of individual neurons across GPT2 models trained from different initial random seeds, motivated by the hypothesis that universal neurons are likely to be interpretable. In particular, we compute pairwise correlations of neuron activations over 100 million tokens for every neuron pair across five different seeds and find that 1-5% of neurons are universal, that is, pairs of neurons which consistently activate on the same inputs. We then study these universal neurons in detail, finding that they usually have clear interpretations and taxonomize them into a small number of neuron families. We conclude by studying patterns in neuron weights to establish several universal functional roles of neurons in simple circuits: deactivating attention heads, changing the entropy of the next token distribution, and predicting the next token to (not) be within a particular set.
Submitted 22 January, 2024;
originally announced January 2024.
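The pairwise-correlation idea in the abstract above can be sketched on toy data. The sketch assumes that each neuron is represented by its activations over a shared token sequence and that a neuron counts as "universal" when its best Pearson correlation with any neuron of another model exceeds a threshold. The threshold value, the function names, and the toy activations are hypothetical and are not taken from the paper.

```python
import math


def pearson(u, v):
    """Pearson correlation of two equally long activation sequences."""
    n = len(u)
    mean_u, mean_v = sum(u) / n, sum(v) / n
    cov = sum((a - mean_u) * (b - mean_v) for a, b in zip(u, v))
    std_u = math.sqrt(sum((a - mean_u) ** 2 for a in u))
    std_v = math.sqrt(sum((b - mean_v) ** 2 for b in v))
    if std_u == 0 or std_v == 0:
        return 0.0  # a constant neuron carries no correlation signal
    return cov / (std_u * std_v)


def best_matches(acts_a, acts_b):
    """For each neuron of model A, the highest correlation with any neuron of model B."""
    return [max(pearson(u, v) for v in acts_b) for u in acts_a]


# Toy activations: rows are neurons, columns are tokens (hypothetical data).
model_a = [[0.0, 1.0, 2.0, 3.0], [1.0, 0.0, 1.0, 0.0]]
model_b = [[0.0, 2.0, 4.0, 6.0], [3.0, 1.0, 2.0, 0.0]]

THRESHOLD = 0.95  # hypothetical cut-off for calling a neuron universal
scores = best_matches(model_a, model_b)
universal = [i for i, s in enumerate(scores) if s > THRESHOLD]
print(scores, universal)
```

At the scale described in the abstract, the same computation would need a streaming, vectorized implementation; the nested loops here only make the definition explicit.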
-
Optimal higher regularity for biharmonic maps via quantitative stratification
Authors:
Chang-Yu Guo,
Gui-Chun Jiang,
Chang-Lin Xiang,
Gao-Feng Zheng
Abstract:
This little note is devoted to refining the almost optimal regularity results of Breiner and Lamm \cite{Breiner-Lamm-2015} on minimizing and stationary biharmonic maps via the powerful quantitative stratification method introduced by Cheeger and Naber \cite{Cheeger-Naber-2013} and further developed by Naber and Valtorta \cite{Naber-V-2017,Naber-V-2018} for harmonic maps. In particular, we obtain an optimal regularity result for minimizing biharmonic maps.
Submitted 20 January, 2024;
originally announced January 2024.
-
MotionMix: Weakly-Supervised Diffusion for Controllable Motion Generation
Authors:
Nhat M. Hoang,
Kehong Gong,
Chuan Guo,
Michael Bi Mi
Abstract:
Controllable generation of 3D human motions becomes an important topic as the world embraces digital transformation. Existing works, though making promising progress with the advent of diffusion models, heavily rely on meticulously captured and annotated (e.g., text) high-quality motion corpus, a resource-intensive endeavor in the real world. This motivates our proposed MotionMix, a simple yet effective weakly-supervised diffusion model that leverages both noisy and unannotated motion sequences. Specifically, we separate the denoising objectives of a diffusion model into two stages: obtaining conditional rough motion approximations in the initial $T-T^*$ steps by learning the noisy annotated motions, followed by the unconditional refinement of these preliminary motions during the last $T^*$ steps using unannotated motions. Notably, though learning from two sources of imperfect data, our model does not compromise motion generation quality compared to fully supervised approaches that access gold data. Extensive experiments on several benchmarks demonstrate that our MotionMix, as a versatile framework, consistently achieves state-of-the-art performances on text-to-motion, action-to-motion, and music-to-dance tasks. Project page: https://nhathoang2002.github.io/MotionMix-page/
Submitted 24 January, 2024; v1 submitted 19 January, 2024;
originally announced January 2024.
-
GMLake: Efficient and Transparent GPU Memory Defragmentation for Large-scale DNN Training with Virtual Memory Stitching
Authors:
Cong Guo,
Rui Zhang,
Jiale Xu,
Jingwen Leng,
Zihan Liu,
Ziyu Huang,
Minyi Guo,
Hao Wu,
Shouren Zhao,
Junping Zhao,
Ke Zhang
Abstract:
Large-scale deep neural networks (DNNs), such as large language models (LLMs), have revolutionized the artificial intelligence (AI) field and become increasingly popular. However, training or fine-tuning such models requires substantial computational power and resources, where the memory capacity of a single acceleration device like a GPU is one of the most important bottlenecks. Owing to the prohibitively large overhead (e.g., $10 \times$) of GPUs' native memory allocator, DNN frameworks like PyTorch and TensorFlow adopt a caching allocator that maintains a memory pool with a splitting mechanism for fast memory (de)allocation. Unfortunately, the caching allocator's efficiency degrades quickly for popular memory reduction techniques such as recomputation, offloading, distributed training, and low-rank adaptation. The primary reason is that those memory reduction techniques introduce frequent and irregular memory (de)allocation requests, leading to severe fragmentation problems for the splitting-based caching allocator. To mitigate this fragmentation problem, we propose a novel memory allocation framework based on low-level GPU virtual memory management called GPU memory lake (GMLake). GMLake employs a novel virtual memory stitching (VMS) mechanism, which can fuse or combine non-contiguous memory blocks with a virtual memory address mapping. GMLake can reduce an average of 9.2 GB (up to 25 GB) GPU memory usage and 15% (up to 33%) fragmentation among eight LLM models on GPU A100 with 80 GB memory. GMLake is completely transparent to the DNN models and memory reduction techniques and ensures the seamless execution of resource-intensive deep-learning tasks. We have open-sourced GMLake at https://github.com/intelligent-machine-learning/glake/tree/main/GMLake.
Submitted 16 January, 2024;
originally announced January 2024.
-
The Devil is in the Details: Boosting Guided Depth Super-Resolution via Rethinking Cross-Modal Alignment and Aggregation
Authors:
Xinni Jiang,
Zengsheng Kuang,
Chunle Guo,
Ruixun Zhang,
Lei Cai,
Xiao Fan,
Chongyi Li
Abstract:
Guided depth super-resolution (GDSR) involves restoring missing depth details using the high-resolution RGB image of the same scene. Previous approaches have struggled with the heterogeneity and complementarity of the multi-modal inputs, and neglected the issues of modal misalignment, geometrical misalignment, and feature selection. In this study, we rethink some essential components in GDSR networks and propose a simple yet effective Dynamic Dual Alignment and Aggregation network (D2A2). D2A2 mainly consists of 1) a dynamic dual alignment module that adapts to alleviate the modal misalignment via a learnable domain alignment block and geometrically align cross-modal features by learning the offset; and 2) a mask-to-pixel feature aggregate module that uses the gated mechanism and pixel attention to filter out irrelevant texture noise from RGB features and combine the useful features with depth features. By combining the strengths of RGB and depth features while minimizing disturbance introduced by the RGB image, our method with simple reuse and redesign of basic components achieves state-of-the-art performance on multiple benchmark datasets. The code is available at https://github.com/JiangXinni/D2A2.
Submitted 16 January, 2024;
originally announced January 2024.
-
Online Action Recognition for Human Risk Prediction with Anticipated Haptic Alert via Wearables
Authors:
Cheng Guo,
Lorenzo Rapetti,
Kourosh Darvish,
Riccardo Grieco,
Francesco Draicchio,
Daniele Pucci
Abstract:
This paper proposes a framework that combines online human state estimation, action recognition and motion prediction to enable early assessment and prevention of worker biomechanical risk during lifting tasks. The framework leverages the NIOSH index to perform online risk assessment, thus fitting real-time applications. In particular, the human state is retrieved via inverse kinematics/dynamics algorithms from wearable sensor data. Human action recognition and motion prediction are achieved by implementing an LSTM-based Guided Mixture of Experts architecture, which is trained offline and inferred online. With the recognized actions, a single lifting activity is divided into a series of continuous movements and the Revised NIOSH Lifting Equation can be applied for risk assessment. Moreover, the predicted motions enable anticipation of future risks. A haptic actuator, embedded in the wearable system, can alert the subject of potential risk, acting as an active prevention device. The performance of the proposed framework is validated by executing real lifting tasks, while the subject is equipped with the iFeel wearable system.
Submitted 14 December, 2023;
originally announced January 2024.
-
Real-time Impurity Solver Using Grassmann Time-Evolving Matrix Product Operators
Authors:
Ruofan Chen,
Xiansong Xu,
Chu Guo
Abstract:
An emergent and promising tensor-network-based impurity solver is to represent the path integral as a matrix product state, where the bath is analytically integrated out using the Feynman-Vernon influence functional. Here we present an approach to calculate the equilibrium impurity spectral function based on the recently proposed Grassmann time-evolving matrix product operators method. The central idea is to perform a quench from a separable impurity-bath initial state as in the non-equilibrium scenario. The retarded Green's function $G(t+t_0, t'+t_0)$ is then calculated after an equilibration time $t_0$ such that the impurity and bath are approximately in thermal equilibrium. There are two major advantages of this method. First, since we focus on real-time dynamics, we do not need to perform the numerically ill-posed analytic continuation in the continuous-time quantum Monte Carlo case that relies on imaginary-time evolution. Second, the entanglement growth of the matrix product states in real-time calculations is observed to be much slower than that in imaginary-time calculations, leading to a significant improvement in numerical efficiency. The accuracy of this method is demonstrated in the single-orbital Anderson impurity model and benchmarked against the continuous-time quantum Monte Carlo method.
Submitted 2 April, 2024; v1 submitted 9 January, 2024;
originally announced January 2024.
-
OFDM-Based Digital Semantic Communication with Importance Awareness
Authors:
Chuanhong Liu,
Caili Guo,
Yang Yang,
Wanli Ni,
Tony Q. S. Quek
Abstract:
Semantic communication (SemCom) has received considerable attention for its ability to reduce data transmission size while maintaining task performance. However, existing works mainly focus on analog SemCom with simple channel models, which may limit its practical application. To reduce this gap, we propose an orthogonal frequency division multiplexing (OFDM)-based SemCom system that is compatible with existing digital communication infrastructures. In the considered system, the extracted semantics is quantized by scalar quantizers, transformed into OFDM signal, and then transmitted over the frequency-selective channel. Moreover, we propose a semantic importance measurement method to build the relationship between target task and semantic features. Based on semantic importance, we formulate a sub-carrier and bit allocation problem to maximize communication performance. However, the optimization objective function cannot be accurately characterized using a mathematical expression due to the neural network-based semantic codec. Given the complex nature of the problem, we first propose a low-complexity sub-carrier allocation method that assigns sub-carriers with better channel conditions to more critical semantics. Then, we propose a deep reinforcement learning-based bit allocation algorithm with dynamic action space. Simulation results demonstrate that the proposed system achieves 9.7% and 28.7% performance gains compared to analog SemCom and conventional bit-based communication systems, respectively.
Submitted 4 January, 2024;
originally announced January 2024.
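The low-complexity sub-carrier allocation rule described in the abstract above (better channels go to more critical semantics) can be sketched as a rank-matching step. The sketch assumes one sub-carrier per semantic feature and that importance scores and channel gains are already available as plain numbers. The function name and the sample values are hypothetical, and the paper's actual procedure may differ.

```python
def allocate_subcarriers(importance, channel_gain):
    """Pair the k-th most important feature with the k-th best sub-carrier.

    importance[i]   -- importance score of semantic feature i (higher = more critical)
    channel_gain[k] -- channel quality of sub-carrier k (higher = better)
    Returns a dict mapping feature index -> sub-carrier index.
    """
    if len(importance) != len(channel_gain):
        raise ValueError("this sketch assumes one sub-carrier per feature")
    features = sorted(range(len(importance)),
                      key=lambda i: importance[i], reverse=True)
    subcarriers = sorted(range(len(channel_gain)),
                         key=lambda k: channel_gain[k], reverse=True)
    return dict(zip(features, subcarriers))


# Hypothetical scores: feature 1 is the most important, sub-carrier 0 is the best.
mapping = allocate_subcarriers([0.1, 0.7, 0.2], [0.9, 0.3, 0.5])
print(mapping)  # {1: 0, 2: 2, 0: 1}
```

Sorting both lists costs O(n log n), which is in keeping with the low-complexity aim stated in the abstract; the bit allocation step that follows it in the paper is not covered here.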
-
SLP-Net: An efficient lightweight network for segmentation of skin lesions
Authors:
Bo Yang,
Hong Peng,
Chenggang Guo,
Xiaohui Luo,
Jun Wang,
Xianzhong Long
Abstract:
Prompt treatment for melanoma is crucial. To assist physicians in identifying lesion areas quickly and precisely, we propose a novel skin lesion segmentation technique, namely SLP-Net, an ultra-lightweight segmentation network based on the spiking neural P (SNP) systems type mechanism. Most existing convolutional neural networks achieve high segmentation accuracy while neglecting the high hardware cost. SLP-Net, on the contrary, has a very small number of parameters and a high computation speed. We design a lightweight multi-scale feature extractor without the usual encoder-decoder structure. Rather than a decoder, a feature adaptation module is designed to replace it and implement multi-scale information decoding. Experiments on the ISIC2018 challenge demonstrate that the proposed model has the highest Acc and DSC among the state-of-the-art methods, while experiments on the PH2 dataset also demonstrate a favorable generalization ability. Finally, we compare the computational complexity as well as the computational speed of the models in experiments, where SLP-Net shows the best overall performance.
Submitted 4 January, 2024; v1 submitted 20 December, 2023;
originally announced December 2023.
-
Randomised benchmarking for characterizing and forecasting correlated processes
Authors:
Xinfang Zhang,
Zhihao Wu,
Gregory A. L. White,
Zhongcheng Xiang,
Shun Hu,
Zhihui Peng,
Yong Liu,
Dongning Zheng,
Xiang Fu,
Anqi Huang,
Dario Poletti,
Kavan Modi,
Junjie Wu,
Mingtang Deng,
Chu Guo
Abstract:
The development of fault-tolerant quantum processors relies on the ability to control noise. A particularly insidious form of noise is temporally correlated or non-Markovian noise. By combining randomized benchmarking with supervised machine learning algorithms, we develop a method to learn the details of temporally correlated noise. In particular, we can learn the time-independent evolution operator of system plus bath and this leads to (i) the ability to characterize the degree of non-Markovianity of the dynamics and (ii) the ability to predict the dynamics of the system even beyond the times we have used to train our model. We exemplify this by implementing our method on a superconducting quantum processor. Our experimental results show a drastic change between the Markovian and non-Markovian regimes for the learning accuracies.
Submitted 10 December, 2023;
originally announced December 2023.
-
A novel feature selection framework for incomplete data
Authors:
Cong Guo
Abstract:
Feature selection on incomplete datasets is an exceptionally challenging task. Existing methods address this challenge by first employing imputation methods to complete the incomplete data and then conducting feature selection based on the imputed data. Since imputation and feature selection are entirely independent steps, the importance of features cannot be considered during imputation. However, in real-world scenarios or datasets, different features have varying degrees of importance. To address this, we propose a novel incomplete data feature selection framework that considers feature importance. The framework mainly consists of two alternating iterative stages: the M-stage and the W-stage. In the M-stage, missing values are imputed based on a given feature importance vector and multiple initial imputation results. In the W-stage, an improved ReliefF algorithm is employed to learn the feature importance vector based on the imputed data. Specifically, the feature importance vector obtained in the current iteration of the W-stage serves as input for the next iteration of the M-stage. Experimental results on both artificially generated and real incomplete datasets demonstrate that the proposed method significantly outperforms other approaches.
Submitted 7 December, 2023;
originally announced December 2023.
-
JUNO: Optimizing High-Dimensional Approximate Nearest Neighbour Search with Sparsity-Aware Algorithm and Ray-Tracing Core Mapping
Authors:
Zihan Liu,
Wentao Ni,
Jingwen Leng,
Yu Feng,
Cong Guo,
Quan Chen,
Chao Li,
Minyi Guo,
Yuhao Zhu
Abstract:
Approximate nearest neighbor (ANN) search is a widely applied technique in modern intelligent applications, such as recommendation systems and vector databases. Therefore, efficient and high-throughput execution of ANN search has become increasingly important. In this paper, we first characterize the state-of-the-art product quantization-based method of ANN search and identify a significant source of inefficiency in the form of unnecessary pairwise distance calculations and accumulations. To improve efficiency, we propose JUNO, an end-to-end ANN search system that adopts a carefully designed sparsity- and locality-aware search algorithm. We also present an efficient hardware mapping that utilizes ray tracing cores in modern GPUs with pipelined execution on tensor cores to execute our sparsity-aware ANN search algorithm. Our evaluations on four datasets ranging in size from 1 to 100 million search points demonstrate 2.2x-8.5x improvements in search throughput. Moreover, our algorithmic enhancements alone achieve a maximal 2.6x improvement on the hardware without the acceleration of the RT core.
Submitted 4 December, 2023;
originally announced December 2023.
-
VideoRF: Rendering Dynamic Radiance Fields as 2D Feature Video Streams
Authors:
Liao Wang,
Kaixin Yao,
Chengcheng Guo,
Zhirui Zhang,
Qiang Hu,
Jingyi Yu,
Lan Xu,
Minye Wu
Abstract:
Neural Radiance Fields (NeRFs) excel in photorealistically rendering static scenes. However, rendering dynamic, long-duration radiance fields on ubiquitous devices remains challenging, due to data storage and computational constraints. In this paper, we introduce VideoRF, the first approach to enable real-time streaming and rendering of dynamic radiance fields on mobile platforms. At the core is a serialized 2D feature image stream representing the 4D radiance field all in one. We introduce a tailored training scheme directly applied to this 2D domain to impose the temporal and spatial redundancy of the feature image stream. By leveraging the redundancy, we show that the feature image stream can be efficiently compressed by 2D video codecs, which allows us to exploit video hardware accelerators to achieve real-time decoding. On the other hand, based on the feature image stream, we propose a novel rendering pipeline for VideoRF, which has specialized space mappings to query radiance properties efficiently. Paired with a deferred shading model, VideoRF has the capability of real-time rendering on mobile devices thanks to its efficiency. We have developed a real-time interactive player that enables online streaming and rendering of dynamic scenes, offering a seamless and immersive free-viewpoint experience across a range of devices, from desktops to mobile phones.
Submitted 3 December, 2023;
originally announced December 2023.
-
Evidence for a compact stellar merger origin for GRB 230307A from Fermi-LAT and multi-wavelength afterglow observations
Authors:
Cui-Yuan Dai,
Chen-Lei Guo,
Hai-Ming Zhang,
Ruo-Yu Liu,
Xiang-Yu Wang
Abstract:
GRB 230307A is the second brightest gamma-ray burst (GRB) detected in more than 50 years of observations and has a long duration in the prompt emission. Two galaxies are found to be close to the position of GRB 230307A: 1) a distant ($z \sim 3.87$) star-forming galaxy, located at an offset of $\sim 0.2\operatorname{-}0.3$ arcsec from the GRB position (with a projected distance of $\sim 1\operatorname{-}2 \, \rm kpc$); 2) a nearby ($z= 0.065$) spiral galaxy, located at an offset of 30 arcsec (with a projected distance of $\sim 40 \, \rm kpc$). Though it has been found that the brightest GRBs are readily detected in GeV emission by the Fermi Large Area Telescope (LAT), we find no GeV afterglow emission from GRB 230307A. Combining this with the optical and X-ray afterglow data, we find that a circum-burst density as low as $\sim 10^{-5} \operatorname{-} 10^{-4}~{\rm cm^{-3}}$ is needed to explain the non-detection of GeV emission and the multi-wavelength afterglow data, regardless of the redshift of this GRB. Such a low density disfavors the association of GRB 230307A with the high-redshift star-forming galaxy, since the proximity of the GRB position to this galaxy would imply a higher-density environment. Instead, the low-density medium is consistent with the circumgalactic medium, which agrees with the large offset between GRB 230307A and the low-redshift galaxy. This points to a compact stellar merger origin for GRB 230307A, consistent with the detection of an associated kilonova.
Submitted 18 February, 2024; v1 submitted 2 December, 2023;
originally announced December 2023.
-
MoMask: Generative Masked Modeling of 3D Human Motions
Authors:
Chuan Guo,
Yuxuan Mu,
Muhammad Gohar Javed,
Sen Wang,
Li Cheng
Abstract:
We introduce MoMask, a novel masked modeling framework for text-driven 3D human motion generation. In MoMask, a hierarchical quantization scheme is employed to represent human motion as multi-layer discrete motion tokens with high-fidelity details. Starting at the base layer, with a sequence of motion tokens obtained by vector quantization, the residual tokens of increasing orders are derived and stored at the subsequent layers of the hierarchy. This is followed by two distinct bidirectional transformers. For the base-layer motion tokens, a Masked Transformer is designated to predict randomly masked motion tokens conditioned on text input at the training stage. During the generation (i.e., inference) stage, starting from an empty sequence, our Masked Transformer iteratively fills in the missing tokens. Subsequently, a Residual Transformer learns to progressively predict the next-layer tokens based on the results from the current layer. Extensive experiments demonstrate that MoMask outperforms the state-of-the-art methods on the text-to-motion generation task, with an FID of 0.045 (vs. e.g. 0.141 for T2M-GPT) on the HumanML3D dataset and 0.228 (vs. 0.514) on KIT-ML. MoMask can also be seamlessly applied to related tasks, such as text-guided temporal inpainting, without further model fine-tuning.
Submitted 29 November, 2023;
originally announced December 2023.
-
Risk-Aware Security-Constrained Unit Commitment: Taming the Curse of Real-Time Volatility and Consumer Exposure
Authors:
Daniel Bienstock,
Yury Dvorkin,
Cheng Guo,
Robert Mieth,
Jiayi Wang
Abstract:
We propose an enhancement to wholesale electricity markets that limits the exposure of consumers to increasingly large and volatile payments, which arise as a byproduct of volatile real-time net loads (i.e., loads minus renewable outputs) and prices, both compared to day-ahead cleared values. We incorporate a robust estimate of such excess payments into the day-ahead computation and specifically seek to account for volatility in real-time net loads and renewable generation. Our model features a data-driven uncertainty set based on principal component analysis, which accommodates both load and wind production volatility and captures locational correlation of uncertain data. To solve the model more efficiently, we develop a decomposition algorithm that can handle nonconvex subproblems. Our extensive experiments on a realistic NYISO data set show that the risk-aware model protects consumers from potentially high costs caused by adverse circumstances.
Submitted 17 April, 2024; v1 submitted 28 November, 2023;
originally announced November 2023.
-
Digital Transformation of High Voltage Isolation Control and Monitoring System for HVE-400 Ion Implanter
Authors:
Chengbo Li,
Xuepeng Sun,
Zhiguo Liu,
Chungang Guo,
Xiaoming Li
Abstract:
The HVE-400 ion implanter is specialized ion implantation equipment for boron and phosphorus doping of semiconductor materials. The ion source and extraction deflection system sit on a high-voltage platform, while the corresponding control system is at ground potential. The control signals and measurement signals of various parameters at the high-voltage end need to be transmitted between ground voltage and high voltage through optical fibers to isolate the high voltage. The upgrade was carried out because the aging optical fiber transmission control and monitoring system could no longer work stably. The transformation replaces the original distributed single-point control method with an advanced distributed centralized control method, and integrates all control and monitoring functions into an industrial control computer for digital operation and display. In the computer software, two methods for automatic calculation of the ion mass number are designed. After the upgrade, the implanter's high-voltage platform control and monitoring system features digitalization, centralized control, high reliability, strong anti-interference, fast communication speed, and easy operation.
Submitted 12 November, 2023;
originally announced November 2023.
-
Phonon collapse and anharmonic melting of the 3D charge-density wave in kagome metals
Authors:
Martin Gutierrez-Amigo,
Ðorđe Dangić,
Chunyu Guo,
Claudia Felser,
Philip J. W. Moll,
Maia G. Vergniory,
Ion Errea
Abstract:
The charge-density wave (CDW) mechanism and resulting structure of the AV3Sb5 family of kagome metals have posed a puzzling challenge since their discovery four years ago. In fact, the lack of consensus on the origin and structure of the CDW hinders the understanding of the emerging phenomena. Here, by employing a non-perturbative treatment of anharmonicity from first-principles calculations, we reveal that the charge-density transition in CsV3Sb5 is driven by the large electron-phonon coupling of the material and that the melting of the CDW state is attributed to ionic entropy and lattice anharmonicity. The calculated transition temperature is in very good agreement with experiments, implying that soft mode physics is at the core of the charge-density wave transition. Contrary to the standard assumption associated with a pure kagome lattice, the CDW is essentially three-dimensional as it is triggered by an unstable phonon at the L point. The absence of involvement of phonons at the M point enables us to constrain the resulting symmetries to six possible space groups. The unusually large electron-phonon linewidth of the soft mode explains why inelastic scattering experiments did not observe any softened phonon. We foresee that large anharmonic effects are ubiquitous and could be fundamental to understanding the observed phenomena in other kagome families as well.
Submitted 23 November, 2023;
originally announced November 2023.
-
Traffic Sign Interpretation in Real Road Scene
Authors:
Chuang Yang,
Kai Zhuang,
Mulin Chen,
Haozhao Ma,
Xu Han,
Tao Han,
Changxing Guo,
Han Han,
Bingxuan Zhao,
Qi Wang
Abstract:
Most existing traffic sign-related works are dedicated to detecting and recognizing individual traffic signs in isolation, which fails to analyze the global semantic logic among signs and may convey inaccurate traffic instructions. To address these issues, we propose a traffic sign interpretation (TSI) task, which aims to interpret globally semantically interrelated traffic signs (e.g., driving instruction-related texts, symbols, and guide panels) into natural language to provide accurate instruction support for autonomous or assisted driving. We also design a multi-task learning architecture for TSI, which is responsible for detecting and recognizing various traffic signs and interpreting them into natural language as a human would. Furthermore, the absence of a publicly available TSI dataset prompts us to build a traffic sign interpretation dataset, namely TSI-CN. The dataset consists of real road scene images captured on highways and urban roads in China from a driver's perspective. It contains rich location labels of texts, symbols, and guide panels, and the corresponding natural language description labels. Experiments on TSI-CN demonstrate that the TSI task is achievable and that the TSI architecture can interpret traffic signs from scenes successfully even when there is complex semantic logic among signs. The TSI-CN dataset and the source code of the TSI architecture will be publicly available after the revision process.
Submitted 28 November, 2023; v1 submitted 16 November, 2023;
originally announced November 2023.
-
Few-shot Message-Enhanced Contrastive Learning for Graph Anomaly Detection
Authors:
Fan Xu,
Nan Wang,
Xuezhi Wen,
Meiqi Gao,
Chaoqun Guo,
Xibin Zhao
Abstract:
Graph anomaly detection plays a crucial role in identifying exceptional instances in graph data that deviate significantly from the majority. It has gained substantial attention in various domains of information security, including network intrusion, financial fraud, and malicious comments. Existing methods are primarily developed in an unsupervised manner due to the challenge of obtaining labeled data. Without guidance from prior knowledge in the unsupervised setting, the identified anomalies may prove to be data noise or individual data instances. In real-world scenarios, a limited batch of labeled anomalies can be captured, making it crucial to investigate the few-shot problem in graph anomaly detection. Taking advantage of this potential, we propose a novel few-shot Graph Anomaly Detection model called FMGAD (Few-shot Message-Enhanced Contrastive-based Graph Anomaly Detector). FMGAD leverages a self-supervised contrastive learning strategy within and across views to capture intrinsic and transferable structural representations. Furthermore, we propose the Deep-GNN message-enhanced reconstruction module, which extensively exploits the few-shot label information and enables long-range propagation to disseminate supervision signals to deeper unlabeled nodes. This module in turn assists in the training of self-supervised contrastive learning. Comprehensive experimental results on six real-world datasets demonstrate that FMGAD can achieve better performance than other state-of-the-art methods, on both artificially injected anomalies and domain-organic anomalies.
Submitted 17 November, 2023;
originally announced November 2023.
-
Role of the isospin diffusion on cluster transfer in $^{12,14}$C + $^{209}$Bi reactions
Authors:
Zepeng Gao,
Yinu Zhang,
Long Zhu,
Zehong Liao,
Yu Yang,
Chenchen Guo,
Jun Su
Abstract:
Heavy-ion collisions at near-barrier energies provide a crucial pathway for investigating nucleon correlations and clustering structures. Recent experimental results showed that the valence neutrons in light projectiles clearly enhance the $α$ transfer. This finding is both puzzling and fascinating, because it unexpectedly violates the ground-state $Q$ value systematics. In this work, the time-dependent Hartree-Fock approach is utilized to investigate the cluster transfer. By comparing the reactions $^{12,14}$C + $^{209}$Bi, we discover that the above puzzling behavior is due to the strong correlation between isospin diffusion and clustering. Our calculations clearly show that the equilibration of the neutron-to-proton ratio strongly inhibits the clustering. This work opens a prospect for investigating clustering in open quantum systems.
Submitted 16 November, 2023; v1 submitted 15 November, 2023;
originally announced November 2023.
-
Iterative missing value imputation based on feature importance
Authors:
Cong Guo,
Chun Liu,
Wei Yang
Abstract:
Many datasets suffer from missing values due to various reasons, which not only increases the processing difficulty of related tasks but also reduces the accuracy of classification. To address this problem, the mainstream approach is to use missing value imputation to complete the dataset. Existing imputation methods estimate the missing parts based on the observed values in the original feature space, and they treat all features as equally important during data completion, while in fact different features have different importance. Therefore, we have designed an imputation method that considers feature importance. This algorithm iteratively performs matrix completion and feature importance learning; specifically, matrix completion is based on a filling loss that incorporates feature importance. Our experimental analysis involves three types of datasets: synthetic datasets with different noisy features and missing values, real-world datasets with artificially generated missing values, and real-world datasets originally containing missing values. The results on these datasets consistently show that the proposed method outperforms five existing imputation algorithms. To the best of our knowledge, this is the first work that considers feature importance in the imputation model.
Submitted 14 November, 2023;
originally announced November 2023.
-
Accurate estimates of dynamical statistics using memory
Authors:
Chatipat Lorpaiboon,
Spencer C. Guo,
John Strahan,
Jonathan Weare,
Aaron R. Dinner
Abstract:
Many chemical reactions and molecular processes occur on timescales that are significantly longer than those accessible by direct simulation. One successful approach to estimating dynamical statistics for such processes is to use many short time series observations of the system to construct a Markov state model (MSM), which approximates the dynamics of the system as memoryless transitions between a set of discrete states. The dynamical Galerkin approximation (DGA) generalizes MSMs for the problem of calculating dynamical statistics, such as committors and mean first passage times, by replacing the set of discrete states with a projection onto a basis. Because the projected dynamics are generally not memoryless, the Markov approximation can result in significant systematic error. Inspired by quasi-Markov state models, which employ the generalized master equation to encode memory resulting from the projection, we reformulate DGA to account for memory and analyze its performance on two systems: a two-dimensional triple well and helix-to-helix transitions of the AIB$_9$ peptide. We demonstrate that our method is robust to the choice of basis and can decrease the time series length required to obtain accurate kinetics by an order of magnitude.
Submitted 13 November, 2023;
originally announced November 2023.
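As a point of reference for the abstract above, the following is a minimal sketch of the memoryless baseline it describes: a Markov state model estimated from short discrete trajectories, followed by a committor computed from the estimated transition matrix. It is not the authors' memory-corrected DGA method. The function names, the lag parameter, and the toy trajectory are all assumptions made for illustration.

```python
import numpy as np


def estimate_transition_matrix(trajs, n_states, lag=1):
    """Row-normalized transition counts at the given lag (hypothetical helper)."""
    counts = np.zeros((n_states, n_states))
    for traj in trajs:
        # Pair each frame with the frame `lag` steps later.
        for i, j in zip(traj[:-lag], traj[lag:]):
            counts[i, j] += 1
    row_sums = counts.sum(axis=1, keepdims=True)
    if np.any(row_sums == 0):
        raise ValueError("every state needs at least one outgoing transition")
    return counts / row_sums


def committor(T, A, B):
    """Probability of reaching set B before set A under transition matrix T."""
    n = T.shape[0]
    A, B = sorted(A), sorted(B)
    q = np.zeros(n)
    q[B] = 1.0  # boundary conditions: q = 0 on A, q = 1 on B
    interior = [i for i in range(n) if i not in A and i not in B]
    if interior:
        # Interior states satisfy q_i = sum_j T_ij q_j, i.e.
        # (I - T_II) q_I = T_IB 1.
        T_ii = T[np.ix_(interior, interior)]
        rhs = T[np.ix_(interior, B)].sum(axis=1)
        q[interior] = np.linalg.solve(np.eye(len(interior)) - T_ii, rhs)
    return q


# Toy data: one short trajectory over three states (illustrative only).
trajs = [[0, 1, 0, 1, 2, 1, 2]]
T = estimate_transition_matrix(trajs, n_states=3)
q = committor(T, A={0}, B={2})
```

Because the sketch treats the projected dynamics as Markovian at the chosen lag, it is exposed to the systematic error that the abstract attributes to neglecting memory.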
-
Incentivizing Investment and Reliability: A Study on Electricity Capacity Markets
Authors:
Cheng Guo,
Christian Kroer,
Yury Dvorkin,
Daniel Bienstock
Abstract:
The capacity market, a marketplace to exchange available generation capacity for electricity production, provides a major revenue stream for generators and is adopted in several U.S. regions. A subject of ongoing debate, the capacity market is viewed by its proponents as a crucial mechanism to ensure system reliability, while critics highlight its drawbacks such as market distortion. Under a novel analytical framework, we rigorously evaluate the impact of the capacity market on generators' revenue and system reliability. More specifically, based on market designs at New York Independent System Operator (NYISO), we propose market equilibrium-based models to capture salient aspects of the capacity market and its interaction with the energy market. We also develop a leader-follower model to study market power. We show that the capacity market incentivizes the investment of generators with lower net cost of new entry. It also facilitates reliability by preventing significant physical withholding when the demand is relatively high. Nevertheless, the capacity market may not provide enough revenue for peaking plants. Moreover, it is susceptible to market power, which necessitates tailored market power mitigation measures depending on market dynamics. We provide further insights via large-scale experiments on data from NYISO markets.
Submitted 10 November, 2023;
originally announced November 2023.
-
Winding number criterion for the origin to belong to the numerical range of a matrix on a loop of matrices
Authors:
Cheng Guo,
Shanhui Fan
Abstract:
Let $A:[0,1]\to GL(n,\mathbb{C})$ be continuous with $A(0)=A(1)$, so that the winding number of $\det A$ is well-defined. If the winding number is not divisible by $n$, then the origin belongs to the numerical range of $A(φ)$ for some $φ\in [0,1]$.
Submitted 1 November, 2023;
originally announced November 2023.
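The statement in the abstract above can be illustrated with a small worked example for $n=2$. The two loops below are chosen here for illustration and are not taken from the paper; the first satisfies the hypothesis, while the second shows a case where the winding number is divisible by $n$ and the conclusion fails.

```latex
% Loop 1: winding number 1, not divisible by n = 2.
A_1(\varphi) = \begin{pmatrix} e^{2\pi i \varphi} & 0 \\ 0 & 1 \end{pmatrix},
\qquad \det A_1(\varphi) = e^{2\pi i \varphi} .
% For a unit vector x = (x_1, x_2):
x^{*} A_1(\varphi)\, x = |x_1|^{2} e^{2\pi i \varphi} + |x_2|^{2},
% so the numerical range is the segment joining e^{2\pi i\varphi} and 1.
% It contains 0 exactly when e^{2\pi i\varphi} = -1, i.e. at \varphi = 1/2,
% attained by x = (1/\sqrt{2},\, 1/\sqrt{2}).

% Loop 2: winding number 2, divisible by n = 2.
A_2(\varphi) = e^{2\pi i \varphi} I_2,
\qquad \det A_2(\varphi) = e^{4\pi i \varphi},
\qquad x^{*} A_2(\varphi)\, x = e^{2\pi i \varphi} \neq 0 .
```

In the second loop the numerical range is the single point $e^{2\pi i φ}$ on the unit circle, so the origin is never attained, which shows that the divisibility condition cannot simply be dropped.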
-
The limit theory of the energy-critical complex Ginzburg-Landau equation
Authors:
Xing. Cheng,
Chang-yu Guo,
Yunrui. Zheng
Abstract:
We study the limit behavior of solutions to the energy-critical complex Ginzburg-Landau equation. We give a rigorous theory of the zero-dispersion limit from the energy-critical complex Ginzburg-Landau equation to the energy-critical nonlinear heat equation for dimensions 3 and 4, in both the defocusing and focusing cases, by the energy method. Furthermore, we also show the inviscid limit of the energy-critical complex Ginzburg-Landau equation to the energy-critical nonlinear Schrödinger equation for dimension 4 in the focusing case.
Submitted 22 April, 2024; v1 submitted 1 November, 2023;
originally announced November 2023.