StatuScale: Status-aware and Elastic Scaling Strategy for Microservice Applications

Authors: Linfeng Wen, Minxian Xu, Sukhpal Singh Gill, Muhammad Hafizhuddin Hilman, Satish Narayana Srirama, Kejiang Ye, Chengzhong Xu

Abstract: Microservice architecture has transformed traditional monolithic applications into lightweight components. Scaling these lightweight microservices is more efficient than scaling servers. However, scaling microservices still faces the challenges resulting from unexpected spikes or bursts of requests, which are difficult to detect and can degrade performance instantaneously. To address this challenge and ensure the performance of microservice-based applications, we propose a status-aware and elastic scaling framework called StatuScale, which is based on a load status detector that can select appropriate elastic scaling strategies for differentiated resource scheduling in vertical scaling. Additionally, StatuScale employs a horizontal scaling controller that utilizes comprehensive evaluation and resource reduction to manage the number of replicas for each microservice. We also present a novel metric named correlation factor to evaluate the resource usage efficiency. Finally, we use Kubernetes, an open-source container orchestration and management platform, and realistic traces from Alibaba to validate our approach. The experimental results have demonstrated that the proposed framework can reduce the average response time in the Sock-Shop application by 8.59% to 12.34%, and in the Hotel-Reservation application by 7.30% to 11.97%, decrease service level objective violations, and offer better performance in resource usage compared to baselines.

Submitted 14 July, 2024; originally announced July 2024.

Comments: 26 pages

Journal ref: ACM Transactions on Autonomous and Adaptive Systems, 2024

arXiv:2407.10169 [pdf, other]

DRPC: Distributed Reinforcement Learning Approach for Scalable Resource Provisioning in Container-based Clusters

Authors: Haoyu Bai, Minxian Xu, Kejiang Ye, Rajkumar Buyya, Chengzhong Xu

Abstract: Microservices have transformed monolithic applications into lightweight, self-contained, and isolated application components, establishing themselves as a dominant paradigm for application development and deployment in public clouds such as Google and Alibaba. Autoscaling emerges as an efficient strategy for managing resources allocated to microservices' replicas. However, the dynamic and intricate dependencies within microservice chains present challenges to the effective management of scaled microservices. Additionally, the centralized autoscaling approach can encounter scalability issues, especially in the management of large-scale microservice-based clusters. To address these challenges and enhance scalability, we propose an innovative distributed resource provisioning approach for microservices based on the Twin Delayed Deep Deterministic Policy Gradient algorithm. This approach enables effective autoscaling decisions and decentralizes responsibilities from a central node to distributed nodes. Comparative results with state-of-the-art approaches, obtained from a realistic testbed and traces, indicate that our approach reduces the average response time by 15% and the number of failed requests by 24%, validating improved scalability as the number of requests increases.

Submitted 14 July, 2024; originally announced July 2024.

Comments: 12 pages

Journal ref: IEEE Transactions on Services Computing, 2024

arXiv:2407.04053 [pdf, other]

Edge AI: A Taxonomy, Systematic Review and Future Directions

Authors: Sukhpal Singh Gill, Muhammed Golec, Jianmin Hu, Minxian Xu, Junhui Du, Huaming Wu, Guneet Kaur Walia, Subramaniam Subramanian Murugesan, Babar Ali, Mohit Kumar, Kejiang Ye, Prabal Verma, Surendra Kumar, Felix Cuadrado, Steve Uhlig

Abstract: Edge Artificial Intelligence (AI) incorporates a network of interconnected systems and devices that receive, cache, process, and analyse data in close communication with the location where the data is captured with AI technology. Recent advancements in AI efficiency, the widespread use of Internet of Things (IoT) devices, and the emergence of edge computing have unlocked the enormous scope of Edge AI. The goal of Edge AI is to optimize data processing efficiency and velocity while ensuring data confidentiality and integrity. Despite being a relatively new field of research, spanning from 2014 to the present, it has shown significant and rapid development over the last five years. In this article, we present a systematic literature review for Edge AI to discuss the existing research, recent advancements, and future research directions. We created a collaborative edge AI learning system for cloud and edge computing analysis, including an in-depth study of the architectures that facilitate this mechanism. The taxonomy for Edge AI facilitates the classification and configuration of Edge AI systems while also examining its potential influence across many fields through encompassing infrastructure, cloud computing, fog computing, services, use cases, ML and deep learning, and resource management. This study highlights the significance of Edge AI in processing real-time data at the edge of the network. Additionally, it emphasizes the research challenges encountered by Edge AI systems, including constraints on resources, vulnerabilities to security threats, and problems with scalability. Finally, this study highlights the potential future research directions that aim to address the current limitations of Edge AI by providing innovative solutions.

Submitted 4 July, 2024; originally announced July 2024.

Comments: Preprint Version, 18 Figures

arXiv:2407.03765 [pdf, ps, other]

Design and Central Pattern Generator Control of a New Transformable Wheel-Legged Robot

Authors: Tyler Bishop, Keran Ye, Konstantinos Karydis

Abstract: This paper introduces a new wheel-legged robot and develops motion controllers based on central pattern generators (CPGs) for the robot to navigate over a range of terrains. A transformable leg-wheel design is considered and characterized in terms of key locomotion characteristics as a function of the design. Kinematic analysis is conducted based on a generalized four-bar mechanism driven by a coaxial hub arrangement. The analysis is used to inform the design of a central pattern generator to control the robot by mapping oscillator states to wheel-leg trajectories and implementing differential steering within the oscillator network. Three oscillator models are used as the basis of the CPGs, and their performance is compared over a range of inputs. The CPG-based controller is used to drive the developed robot prototype on level ground and over obstacles. Additional simulated tests are performed for uneven terrain negotiation and obstacle climbing. Results demonstrate the effectiveness of CPG control in transformable wheel-legged robots.

Submitted 4 July, 2024; originally announced July 2024.

Comments: ICRA 2024 in print

arXiv:2406.19377 [pdf, ps, other]

Grassmannian optimization is NP-hard

Authors: Zehua Lai, Lek-Heng Lim, Ke Ye

Abstract: We show that unconstrained quadratic optimization over a Grassmannian $\operatorname{Gr}(k,n)$ is NP-hard. Our results cover all scenarios: (i) when $k$ and $n$ are both allowed to grow; (ii) when $k$ is arbitrary but fixed; (iii) when $k$ is fixed at its lowest possible value $1$. We then deduce the NP-hardness of unconstrained cubic optimization over the Stiefel manifold $\operatorname{V}(k,n)$ and the orthogonal group $\operatorname{O}(n)$. As an addendum we demonstrate the NP-hardness of unconstrained quadratic optimization over the Cartan manifold, i.e., the positive definite cone $\mathbb{S}^n_{\scriptscriptstyle++}$ regarded as a Riemannian manifold, another popular example in manifold optimization. We will also establish the nonexistence of $\mathrm{FPTAS}$ in all cases.

Submitted 27 June, 2024; originally announced June 2024.

Comments: 19 pages

MSC Class: 03D15; 90C26; 90C23; 65K10; 68Q25; 90C60

arXiv:2406.17911 [pdf, other]

X-ray Made Simple: Radiology Report Generation and Evaluation with Layman's Terms

Authors: Kun Zhao, Chenghao Xiao, Chen Tang, Bohao Yang, Kai Ye, Noura Al Moubayed, Liang Zhan, Chenghua Lin

Abstract: Radiology Report Generation (RRG) has achieved significant progress with the advancements of multimodal generative models. However, the evaluation in the domain suffers from a lack of fair and robust metrics. We reveal that high performance on RRG with existing lexical-based metrics (e.g. BLEU) might be more of a mirage - a model can get a high BLEU only by learning the template of reports. This has become an urgent problem for RRG due to the highly patternized nature of these reports. In this work, we un-intuitively approach this problem by proposing the Layman's RRG framework, a layman's terms-based dataset, evaluation and training framework that systematically improves RRG with day-to-day language. We first contribute the translated Layman's terms dataset. Building upon the dataset, we then propose a semantics-based evaluation method, which is proved to mitigate the inflated numbers of BLEU and provides fairer evaluation. Last, we show that training on the layman's terms dataset encourages models to focus on the semantics of the reports, as opposed to overfitting to learning the report templates. We reveal a promising scaling law between the number of training examples and semantics gain provided by our dataset, compared to the inverse pattern brought by the original formats. Our code is available at https://github.com/hegehongcha/LaymanRRG.

Submitted 30 June, 2024; v1 submitted 25 June, 2024; originally announced June 2024.

arXiv:2406.11821 [pdf, ps, other]

Simple matrix expressions for the curvatures of Grassmannian

Authors: Zehua Lai, Lek-Heng Lim, Ke Ye

Abstract: We show that modeling a Grassmannian as symmetric orthogonal matrices $\operatorname{Gr}(k,\mathbb{R}^n) \cong\{Q \in \mathbb{R}^{n \times n} : Q^{\scriptscriptstyle\mathsf{T}} Q = I, \; Q^{\scriptscriptstyle\mathsf{T}} = Q,\; \operatorname{tr}(Q)=2k - n\}$ yields exceedingly simple matrix formulas for various curvatures and curvature-related quantities, both intrinsic and extrinsic. These include Riemann, Ricci, Jacobi, sectional, scalar, mean, principal, and Gaussian curvatures; Schouten, Weyl, Cotton, Bach, Plebański, cocurvature, nonmetricity, and torsion tensors; first, second, and third fundamental forms; Gauss and Weingarten maps; and upper and lower delta invariants. We will derive explicit, simple expressions for the aforementioned quantities in terms of standard matrix operations that are stably computable with numerical linear algebra. Many of these aforementioned quantities have never before been presented for the Grassmannian.

Submitted 17 June, 2024; originally announced June 2024.

Comments: 25 pages

MSC Class: 15A75; 14M15

arXiv:2406.02479 [pdf]

Applying Fine-Tuned LLMs for Reducing Data Needs in Load Profile Analysis

Authors: Yi Hu, Hyeonjin Kim, Kai Ye, Ning Lu

Abstract: This paper presents a novel method for utilizing fine-tuned Large Language Models (LLMs) to minimize data requirements in load profile analysis, demonstrated through the restoration of missing data in power system load profiles. A two-stage fine-tuning strategy is proposed to adapt a pre-trained LLM, i.e., GPT-3.5, for missing data restoration tasks. Through empirical evaluation, we demonstrate the effectiveness of the fine-tuned model in accurately restoring missing data, achieving comparable performance to state-of-the-art specifically designed models such as BERT-PIN. Key findings include the importance of prompt engineering and the optimal utilization of fine-tuning samples, highlighting the efficiency of few-shot learning in transferring knowledge from general user cases to specific target users. Furthermore, the proposed approach demonstrates notable cost-effectiveness and time efficiency compared to training models from scratch, making it a practical solution for scenarios with limited data availability and computing resources. This research has significant potential for application to other power system load profile analysis tasks. Consequently, it advances the use of LLMs in power system analytics, offering promising implications for enhancing the resilience and efficiency of power distribution systems.

Submitted 2 June, 2024; originally announced June 2024.

arXiv:2405.20560 [pdf, other]

Collaborative Resource Management and Workloads Scheduling in Cloud-Assisted Mobile Edge Computing across Timescales

Authors: Lujie Tang, Minxian Xu, Chengzhong Xu, Kejiang Ye

Abstract: Due to the limited resource capacity of edge servers and the high purchase costs of edge resources, service providers are facing the new challenge of how to take full advantage of the constrained edge resources for Internet of Things (IoT) service hosting and task scheduling to maximize system performance. In this paper, we study the joint optimization problem on service placement, resource provisioning, and workloads scheduling under resource and budget constraints, which is formulated as a mixed integer non-linear programming problem. Given that the frequent service placement and resource provisioning will significantly increase system configuration costs and instability, we propose a two-timescale framework for resource management and workloads scheduling, named RMWS. RMWS consists of a Gibbs sampling algorithm and an alternating minimization algorithm to determine the service placement and resource provisioning on large timescales. A sub-gradient descent method has been designed to solve the workload scheduling challenge on small timescales. We conduct comprehensive experiments under different parameter settings. The RMWS consistently ensures a minimum 10% performance enhancement compared to other algorithms, showcasing its superiority. Theoretical proofs are also provided accordingly.

Submitted 30 May, 2024; originally announced May 2024.

Comments: 11 pages, 10 figures

Journal ref: IEEE ICWS 2024

arXiv:2405.17241 [pdf, other]

NeurTV: Total Variation on the Neural Domain

Authors: Yisi Luo, Xile Zhao, Kai Ye, Deyu Meng

Abstract: Recently, we have witnessed the success of total variation (TV) for many imaging applications. However, traditional TV is defined on the original pixel domain, which limits its potential. In this work, we suggest a new TV regularization defined on the neural domain. Concretely, the discrete data is continuously and implicitly represented by a deep neural network (DNN), and we use the derivatives of DNN outputs w.r.t. input coordinates to capture local correlations of data. As compared with classical TV on the original domain, the proposed TV on the neural domain (termed NeurTV) enjoys two advantages. First, NeurTV is not limited to meshgrid but is suitable for both meshgrid and non-meshgrid data. Second, NeurTV can more exactly capture local correlations across data for any direction and any order of derivatives attributed to the implicit and continuous nature of neural domain. We theoretically reinterpret NeurTV under the variational approximation framework, which allows us to build the connection between classical TV and NeurTV and inspires us to develop variants (e.g., NeurTV with arbitrary resolution and space-variant NeurTV). Extensive numerical experiments with meshgrid data (e.g., color and hyperspectral images) and non-meshgrid data (e.g., point clouds and spatial transcriptomics) showcase the effectiveness of the proposed methods.

Submitted 27 May, 2024; originally announced May 2024.

MSC Class: 94A08; 68U10; 68T45

arXiv:2405.14206 [pdf, other]

LG-VQ: Language-Guided Codebook Learning

Authors: Guotao Liang, Baoquan Zhang, Yaowei Wang, Xutao Li, Yunming Ye, Huaibin Wang, Chuyao Luo, Kola Ye, Linfeng Luo

Abstract: Vector quantization (VQ) is a key technique in high-resolution and high-fidelity image synthesis, which aims to learn a codebook to encode an image with a sequence of discrete codes and then generate an image in an auto-regression manner. Although existing methods have shown superior performance, most methods prefer to learn a single-modal codebook (e.g., image), resulting in suboptimal performance when the codebook is applied to multi-modal downstream tasks (e.g., text-to-image, image captioning) due to the existence of modal gaps. In this paper, we propose a novel language-guided codebook learning framework, called LG-VQ, which aims to learn a codebook that can be aligned with the text to improve the performance of multi-modal downstream tasks. Specifically, we first introduce pre-trained text semantics as prior knowledge, then design two novel alignment modules (i.e., Semantic Alignment Module, and Relationship Alignment Module) to transfer such prior knowledge into codes for achieving codebook text alignment. In particular, our LG-VQ method is model-agnostic, which can be easily integrated into existing VQ models. Experimental results show that our method achieves superior performance on reconstruction and various multi-modal downstream tasks.

Submitted 23 May, 2024; originally announced May 2024.

Comments: None

arXiv:2405.13190 [pdf, other]

Interpretable Spatio-Temporal Embedding for Brain Structural-Effective Network with Ordinary Differential Equation

Authors: Haoteng Tang, Guodong Liu, Siyuan Dai, Kai Ye, Kun Zhao, Wenlu Wang, Carl Yang, Lifang He, Alex Leow, Paul Thompson, Heng Huang, Liang Zhan

Abstract: The MRI-derived brain network serves as a pivotal instrument in elucidating both the structural and functional aspects of the brain, encompassing the ramifications of diseases and developmental processes. However, prevailing methodologies, often focusing on synchronous BOLD signals from functional MRI (fMRI), may not capture directional influences among brain regions and rarely tackle temporal functional dynamics. In this study, we first construct the brain-effective network via the dynamic causal model. Subsequently, we introduce an interpretable graph learning framework termed Spatio-Temporal Embedding ODE (STE-ODE). This framework incorporates specifically designed directed node embedding layers, aiming at capturing the dynamic interplay between structural and effective networks via an ordinary differential equation (ODE) model, which characterizes spatial-temporal brain dynamics. Our framework is validated on several clinical phenotype prediction tasks using two independent publicly available datasets (HCP and OASIS). The experimental results clearly demonstrate the advantages of our model compared to several state-of-the-art methods.

Submitted 21 May, 2024; originally announced May 2024.

arXiv:2405.12635 [pdf, other]

TempoScale: A Cloud Workloads Prediction Approach Integrating Short-Term and Long-Term Information

Authors: Linfeng Wen, Minxian Xu, Adel N. Toosi, Kejiang Ye

Abstract: Cloud native solutions are widely applied in various fields, placing higher demands on the efficient management and utilization of resource platforms. To achieve the efficiency, load forecasting and elastic scaling have become crucial technologies for dynamically adjusting cloud resources to meet user demands and minimizing resource waste. However, existing prediction-based methods lack comprehensive analysis and integration of load characteristics across different time scales. For instance, long-term trend analysis helps reveal long-term changes in load and resource demand, thereby supporting proactive resource allocation over longer periods, while short-term volatility analysis can examine short-term fluctuations in load and resource demand, providing support for real-time scheduling and rapid response. In response to this, our research introduces TempoScale, which aims to enhance the comprehensive understanding of temporal variations in cloud workloads, enabling more intelligent and adaptive decision-making for elastic scaling. TempoScale utilizes the Complete Ensemble Empirical Mode Decomposition with Adaptive Noise algorithm to decompose time-series load data into multiple Intrinsic Mode Functions (IMF) and a Residual Component (RC). First, we integrate the IMF, which represents both long-term trends and short-term fluctuations, into the time series prediction model to obtain intermediate results. Then, these intermediate results, along with the RC, are transferred into a fully connected layer to obtain the final result. Finally, this result is fed into the resource management system based on Kubernetes for resource scaling. Our proposed approach can reduce the Mean Square Error by 5.80% to 30.43% compared to the baselines, and reduce the average response time by 5.58% to 31.15%.

Submitted 21 May, 2024; originally announced May 2024.

Comments: 11 pages, 11 figures, 4 tables

Journal ref: In proceedings of IEEE CLOUD 2024

arXiv:2405.09554 [pdf, ps, other]

Underdetermined DOA Estimation of Off-Grid Sources Based on the Generalized Double Pareto Prior

Authors: Yongfeng Huang, Zhendong Chen, Kun Ye, Lang Zhou, Haixin Sun

Abstract: In this letter, we investigate a new generalized double Pareto based on off-grid sparse Bayesian learning (GDPOGSBL) approach to improve the performance of direction of arrival (DOA) estimation in underdetermined scenarios. The method aims to enhance the sparsity of source signal by utilizing the generalized double Pareto (GDP) prior. Firstly, we employ a first-order linear Taylor expansion to model the real array manifold matrix, and Bayesian inference is utilized to calculate the off-grid error, which mitigates the grid dictionary mismatch problem in underdetermined scenarios. Secondly, an innovative grid refinement method is introduced, treating grid points as iterative parameters to minimize the modeling error between the source and grid points. The numerical simulation results verify the superiority of the proposed strategy, especially when dealing with a coarse grid and few snapshots.

Submitted 17 May, 2024; v1 submitted 18 April, 2024; originally announced May 2024.
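
Note on the first-order Taylor step named in the abstract above: the block below sketches the linearization commonly used in off-grid sparse Bayesian learning. It is an illustration under assumed notation, not necessarily the letter's exact formulation. The symbols $\tilde\theta_n$ (nearest grid point), $\beta_n$ (off-grid offset), $\mathbf{a}(\cdot)$ (steering vector), and the matrices $\mathbf{A}$, $\mathbf{B}$ are assumptions introduced here.

```latex
% Steering vector of a source at \theta_n, linearized around the nearest grid point \tilde\theta_n
\mathbf{a}(\theta_n) \approx \mathbf{a}(\tilde\theta_n) + \mathbf{b}(\tilde\theta_n)\,\beta_n,
\qquad
\mathbf{b}(\tilde\theta_n) = \left.\frac{\partial \mathbf{a}(\theta)}{\partial \theta}\right|_{\theta=\tilde\theta_n},
\qquad
\beta_n = \theta_n - \tilde\theta_n .

% Stacking all grid points gives an approximate manifold (dictionary) matrix
\mathbf{A}(\boldsymbol{\theta}) \approx \mathbf{A}(\tilde{\boldsymbol{\theta}}) + \mathbf{B}(\tilde{\boldsymbol{\theta}})\operatorname{diag}(\boldsymbol{\beta}),
\qquad
\mathbf{A} = [\mathbf{a}(\tilde\theta_1),\dots,\mathbf{a}(\tilde\theta_N)],\quad
\mathbf{B} = [\mathbf{b}(\tilde\theta_1),\dots,\mathbf{b}(\tilde\theta_N)] .
```

Under this reading, the off-grid error is the vector $\boldsymbol{\beta}$, and it is the quantity that the Bayesian inference step would estimate alongside the sparse source signal.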

arXiv:2405.09470 [pdf, other]

Towards Evaluating the Robustness of Automatic Speech Recognition Systems via Audio Style Transfer

Authors: Weifei Jin, Yuxin Cao, Junjie Su, Qi Shen, Kai Ye, Derui Wang, Jie Hao, Ziyao Liu

Abstract: In light of the widespread application of Automatic Speech Recognition (ASR) systems, their security concerns have received much more attention than ever before, primarily due to the susceptibility of Deep Neural Networks. Previous studies have illustrated that surreptitiously crafting adversarial perturbations enables the manipulation of speech recognition systems, resulting in the production of malicious commands. These attack methods mostly require adding noise perturbations under $\ell_p$ norm constraints, inevitably leaving behind artifacts of manual modifications. Recent research has alleviated this limitation by manipulating style vectors to synthesize adversarial examples based on Text-to-Speech (TTS) synthesis audio. However, style modifications based on optimization objectives significantly reduce the controllability and editability of audio styles. In this paper, we propose an attack on ASR systems based on user-customized style transfer. We first test the effect of Style Transfer Attack (STA) which combines style transfer and adversarial attack in sequential order. Then, as an improvement, we propose an iterative Style Code Attack (SCA) to maintain audio quality. Experimental results show that our method can meet the need for user-customized styles and achieve a success rate of 82% in attacks, while keeping sound naturalness according to our user study.

Submitted 15 May, 2024; originally announced May 2024.

Comments: Accepted to SecTL (AsiaCCS Workshop) 2024

arXiv:2405.05128 [pdf, ps, other]

Degree of the Grassmannian as an affine variety

Authors: Lek-Heng Lim, Ke Ye

Abstract: The degree of the Grassmannian with respect to the Plücker embedding is well-known. However, the Plücker embedding, while ubiquitous in pure mathematics, is almost never used in applied mathematics. In applied mathematics, the Grassmannian is usually embedded as projection matrices $\operatorname{Gr}(k,\mathbb{R}^n) \cong \{P \in \mathbb{R}^{n \times n} : P^{\scriptscriptstyle\mathsf{T}} = P = P^2,\; \operatorname{tr}(P) = k\}$ or as involution matrices $\operatorname{Gr}(k,\mathbb{R}^n) \cong \{X \in \mathbb{R}^{n \times n} : X^{\scriptscriptstyle\mathsf{T}} = X,\; X^2 = I,\; \operatorname{tr}(X)=2k - n\}$. We will determine an explicit expression for the degree of the Grassmannian with respect to these embeddings. In so doing, we resolved a conjecture of Devriendt--Friedman--Sturmfels about the degree of $\operatorname{Gr}(2, \mathbb{R}^n)$ and in fact generalized it to $\operatorname{Gr}(k, \mathbb{R}^n)$. We also proved a set theoretic variant of another conjecture of Devriendt--Friedman--Sturmfels about the limit of $\operatorname{Gr}(k,\mathbb{R}^n)$ in the sense of Gröbner degeneration.

Submitted 8 May, 2024; originally announced May 2024.

Comments: 15 pages

MSC Class: 14E25; 14F45
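
Note on the two matrix models quoted in the abstract above: they are related by the affine map $X = 2P - I$, which the following minimal NumPy sketch checks numerically for one random subspace. The function names `projection_model` and `involution_model`, the dimensions, and the random seed are illustrative assumptions and do not come from the paper.

```python
import numpy as np


def projection_model(basis: np.ndarray) -> np.ndarray:
    """Orthogonal projector P onto the column span of `basis` (n x k, full column rank)."""
    q, _ = np.linalg.qr(basis)  # reduced QR: q is n x k with orthonormal columns
    return q @ q.T


def involution_model(p: np.ndarray) -> np.ndarray:
    """Map a projector P to the symmetric orthogonal matrix X = 2P - I."""
    return 2.0 * p - np.eye(p.shape[0])


# Hypothetical example: a random 2-dimensional subspace of R^5.
n, k = 5, 2
rng = np.random.default_rng(0)
P = projection_model(rng.standard_normal((n, k)))
Q = involution_model(P)

# Projection model: P^T = P = P^2 and tr(P) = k.
assert np.allclose(P, P.T) and np.allclose(P @ P, P)
assert np.isclose(np.trace(P), k)

# Involution model: X^T = X, X^2 = I and tr(X) = 2k - n.
assert np.allclose(Q, Q.T) and np.allclose(Q @ Q, np.eye(n))
assert np.isclose(np.trace(Q), 2 * k - n)
```

The trace condition follows directly from the map: $\operatorname{tr}(2P - I) = 2k - n$, so a point in one model determines a unique point in the other.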

arXiv:2404.18454 [pdf, other]

doi 10.1145/3641519.3657456

3D Gaussian Splatting with Deferred Reflection

Authors: Keyang Ye, Qiming Hou, Kun Zhou

Abstract: The advent of neural and Gaussian-based radiance field methods has brought great success to the field of novel view synthesis. However, specular reflection remains non-trivial, as the high frequency radiance field is notoriously difficult to fit stably and accurately. We present a deferred shading method to effectively render specular reflection with Gaussian splatting. The key challenge comes from the environment map reflection model, which requires accurate surface normals while simultaneously bottlenecking normal estimation with discontinuous gradients. We leverage the per-pixel reflection gradients generated by deferred shading to bridge the optimization process of neighboring Gaussians, allowing nearly correct normal estimations to gradually propagate and eventually spread over all reflective objects. Our method significantly outperforms state-of-the-art techniques and concurrent work in synthesizing high-quality specular reflection effects, demonstrating a consistent improvement of peak signal-to-noise ratio (PSNR) for both synthetic and real-world scenes, while running at a frame rate almost identical to vanilla Gaussian splatting.

Submitted 4 June, 2024; v1 submitted 29 April, 2024; originally announced April 2024.

arXiv:2404.10541 [pdf, other]

MPCOM: Robotic Data Gathering with Radio Mapping and Model Predictive Communication

Authors: Zhiyou Ji, Guoliang Li, Ruihua Han, Shuai Wang, Bing Bai, Wei Xu, Kejiang Ye, Chengzhong Xu

Abstract: Robotic data gathering (RDG) is an emerging paradigm that navigates a robot to harvest data from remote sensors. However, motion planning in this paradigm needs to maximize the RDG efficiency instead of the navigation efficiency, for which the existing motion planning methods become inefficient, as they plan robot trajectories merely according to motion factors. This paper proposes radio map guided model predictive communication (MPCOM), which navigates the robot with both grid and radio maps for shape-aware collision avoidance and communication-aware trajectory generation in a dynamic environment. The proposed MPCOM is able to trade off the time spent on reaching the goal, avoiding collision, and improving communication. MPCOM captures high-order signal propagation characteristics using radio maps and incorporates the map-guided communication regularizer into the motion planning block. Experiments in IRSIM and CARLA simulators show that the proposed MPCOM outperforms other benchmarks in both LOS and NLOS cases. Real-world testing based on car-like robots is also provided to demonstrate the effectiveness of MPCOM in indoor environments.

Submitted 16 April, 2024; originally announced April 2024.

Comments: submit to IROS

arXiv:2404.08175 [pdf, ps, other]

A Novel Vision Transformer based Load Profile Analysis using Load Images as Inputs

Authors: Hyeonjin Kim, Yi Hu, Kai Ye, Ning Lu

Abstract: This paper introduces ViT4LPA, an innovative Vision Transformer (ViT) based approach for Load Profile Analysis (LPA). We transform time-series load profiles into load images. This allows us to leverage the ViT architecture, originally designed for image processing, as a pre-trained image encoder to uncover latent patterns within load data. ViT is pre-trained using an extensive load image dataset, comprising 1M load images derived from smart meter data collected over a two-year period from 2,000 residential users. The training methodology is self-supervised, masked image modeling, wherein masked load images are restored to reveal hidden relationships among image patches. The pre-trained ViT encoder is then applied to various downstream tasks, including the identification of electric vehicle (EV) charging loads and behind-the-meter solar photovoltaic (PV) systems and load disaggregation. Simulation results illustrate ViT4LPA's superior performance compared to existing neural network models in downstream tasks. Additionally, we conduct an in-depth analysis of the attention weights within the ViT4LPA model to gain insights into its information flow mechanisms.

Submitted 11 April, 2024; originally announced April 2024.

arXiv:2403.20031 [pdf, other]

A Unified Framework for Human-centric Point Cloud Video Understanding

Authors: Yiteng Xu, Kecheng Ye, Xiao Han, Yiming Ren, Xinge Zhu, Yuexin Ma

Abstract: Human-centric Point Cloud Video Understanding (PVU) is an emerging field focused on extracting and interpreting human-related features from sequences of human point clouds, further advancing downstream human-centric tasks and applications. Previous works usually focus on tackling one specific task and rely on large amounts of labeled data, which leads to poor generalization capability. Considering that humans have specific characteristics, including the structural semantics of the human body and the dynamics of human motions, we propose a unified framework to make full use of the prior knowledge and explore the inherent features in the data itself for generalized human-centric point cloud video understanding. Extensive experiments demonstrate that our method achieves state-of-the-art performance on various human-related tasks, including action recognition and 3D pose estimation. All datasets and code will be released soon.

Submitted 29 March, 2024; originally announced March 2024.

Comments: Accepted by CVPR 2024

arXiv:2403.09016 [pdf]

A Processing Route to Chalcogenide Perovskites Alloys with Tunable Band Gap via Anion Exchange

Authors: Kevin Ye, Ida Sadeghi, Michael Xu, Jack Van Sambeek, Tao Cai, Jessica Dong, Rishabh Kothari, James M. LeBeau, R. Jaramillo

Abstract: We demonstrate synthesis of BaZr(S,Se)3 chalcogenide perovskite alloys by selenization of BaZrS3 thin films. The anion-exchange process produces films with tunable composition and band gap without changing the orthorhombic perovskite crystal structure or the film microstructure. The direct band gap is tunable between 1.5 and 1.9 eV. The alloy films made in this way feature 100x stronger photoconductive response and a lower density of extended defects, compared to alloy films made by direct growth. The perovskite structure is stable in high-selenium-content thin films with and without epitaxy. The manufacturing-compatible process of selenization in H2Se gas may spur the development of chalcogenide perovskite solar cell technology.

Submitted 13 March, 2024; originally announced March 2024.

arXiv:2403.08136 [pdf, other]

RoboCertProb: Property Specification for Probabilistic RoboChart Models

Authors: Kangfeng Ye, Jim Woodcock

Abstract: RoboChart is a core notation in the RoboStar framework which brings modern modelling and formal verification technologies into software engineering for robotics. It is a timed and probabilistic domain-specific language for robotics and provides a UML-like architectural and state machine modelling. This work presents RoboCertProb for specifying quantitative properties of probabilistic robotic systems modelled in RoboChart. RoboCertProb's semantics is based on PCTL*. To interpret RoboCertProb over RoboChart models, we give a Markov semantics (DTMCs and MDPs) to RoboChart, derived from its existing transformation semantics to the PRISM language. In addition to property specification, RoboCertProb also entitles us to configure loose constants and unspecified functions and operations in RoboChart models. It allows us to set up environmental inputs to verify reactive probabilistic systems not directly supported in probabilistic model checkers like PRISM because they employ a closed-world assumption. We implement RoboCertProb in an accompanying tool of RoboChart, RoboTool, for specifying properties and automatically generating PRISM properties from them to formally verify RoboChart models using PRISM. We have used it to analyse the behaviour of software controllers for two real robots: an industrial painting robot and an agricultural robot for treating plants with UV lights.

Submitted 12 March, 2024; originally announced March 2024.

Comments: 24 pages, 10 figures, 4 tables, submitted to the International Journal on Software and Systems Modeling (SoSyM)

arXiv:2403.00169 [pdf, other]

Quantitative Assurance and Synthesis of Controllers from Activity Diagrams

Authors: Kangfeng Ye, Fang Yan, Simos Gerasimou

Abstract: Probabilistic model checking is a widely used formal verification technique to automatically verify qualitative and quantitative properties for probabilistic models. However, capturing such systems, writing corresponding properties, and verifying them require domain knowledge. This makes it inaccessible to researchers and engineers who may not have the required knowledge. Previous studies have extended UML activity diagrams (ADs), developed transformations, and implemented accompanying tools for automation. The research, however, is not comprehensive and not fully open, which makes it hard to evaluate, extend, adapt, and access. In this paper, we propose a comprehensive verification framework for ADs, including a new profile for probability, time, and quality annotations, a semantics interpretation of ADs in three Markov models, and a set of transformation rules from activity diagrams to the PRISM language, supported by PRISM and Storm. Most importantly, we developed algorithms for transformation and implemented them in a tool, called QASCAD, using model-based techniques, for fully automated verification. We evaluated one case study where multiple robots are used for delivery in a hospital and further evaluated six other examples from the literature. With all these together, this work makes noteworthy contributions to the verification of ADs by improving evaluation, extensibility, adaptability, and accessibility.

Submitted 29 February, 2024; originally announced March 2024.

Comments: 43 pages, 29 figures, 5 tables, submitted to Journal of Systems and Software (JSS)

ACM Class: D.2.4; F.3.1; F.3.2; F.4.3

arXiv:2402.18957 [pdf, other]

Vibrational properties differ between halide and chalcogenide perovskite semiconductors, and it matters for optoelectronic performance

Authors: K. Ye, M. Menahem, T. Salzillo, F. Knoop, B. Zhao, S. Niu, O. Hellman, J. Ravichandran, R. Jaramillo, O. Yaffe

Abstract: We report a comparative study of temperature-dependent photoluminescence and structural dynamics of two perovskite semiconductors, the chalcogenide BaZrS$_3$ (BZS) and the halide CsPbBr$_3$ (CPB). These materials have similar crystal structures and direct band gaps, but we find that they have quite distinct optoelectronic and vibrational properties. Both materials exhibit thermally-activated non-radiative recombination, but the non-radiative recombination rate in BZS is between two and four orders of magnitude faster than in CPB. Raman spectroscopy reveals that the effects of phonon anharmonicity are far more pronounced in CPB than in BZS. Further, although both materials feature a large dielectric response due to low-energy polar optical phonons, the phonons in CPB are substantially lower in energy than in BZS. Our results suggest that electron-phonon coupling in BZS is more effective at non-radiative recombination than in CPB, and that BZS may also have a substantially higher concentration of non-radiative recombination centers than CPB. The low defect concentration in CPB may be related to the ease of lattice reconfiguration, typified by anharmonic bonding. It remains to be seen to what extent these differences are inherent to the chalcogenide and halide perovskites and to what extent they can be affected by materials processing; comparing BZS single-crystals and thin films provides reason for optimism.

Submitted 14 April, 2024; v1 submitted 29 February, 2024; originally announced February 2024.

Comments: Main text - 12 pages with 5 figures and 1 table. Supplemental text - 16 pages with 6 figures and 5 tables

arXiv:2402.14255 [pdf]

Observation of temporal topological boundary states of light in a momentum bandgap

Authors: Yudong Ren, Kangpeng Ye, Qiaolu Chen, Fujia Chen, Li Zhang, Yuang Pan, Wenhao Li, Xinrui Li, Lu Zhang, Hongsheng Chen, Yihao Yang

Abstract: Topological phases have prevailed across diverse disciplines, spanning electronics, photonics, and acoustics. Hitherto, the understanding of these phases has centred on energy (frequency) bandstructures, showcasing topological boundary states at spatial interfaces. Recent strides have uncovered a unique category of bandstructures characterized by gaps in momentum, referred to as momentum bandgaps or k gaps, notably driven by breakthroughs in photonic time crystals. This discovery hints at abundant topological phases defined within momentum bands, alongside a wealth of topological boundary states in the time domain. Here, we report the first experimental observation of k-gap topology in a large-scale optical temporal synthetic lattice, manifesting as temporal topological boundary states. These boundary states are uniquely situated at temporal interfaces between two subsystems with distinct k-gap topology. Counterintuitively, despite the exponential amplification of k-gap modes within both subsystems, these topological boundary states exhibit decay in both temporal directions. Our findings mark a significant pathway for delving into k gaps, temporal topological states, and time-varying physics.

Submitted 21 February, 2024; originally announced February 2024.

arXiv:2402.08917 [pdf, other]

An Interference-aware Approach for Co-located Container Orchestration with Novel Metric

Authors: Xiang Li, Linfeng Wen, Minxian Xu, Kejiang Ye

Abstract: Container orchestration technologies are widely employed in cloud computing, facilitating the co-location of online and offline services on the same infrastructure. Online services demand rapid responsiveness and high availability, whereas offline services require extensive computational resources. However, this mixed deployment can lead to resource contention, adversely affecting the performance of online services, yet the metrics used by existing methods cannot accurately reflect the extent of interference. In this paper, we introduce scheduling latency as a novel metric for quantifying interference and compare it with existing metrics. Empirical evidence demonstrates that scheduling latency more accurately reflects the performance degradation of online services. We also utilize various machine learning techniques to predict potential interference on specific hosts for online services, providing reference information for subsequent scheduling decisions. Simultaneously, we propose a method for quantifying node interference based on scheduling latency. To enhance resource utilization, we train a model for online services that predicts CPU and MEM (memory) resource allocation based on workload type and QPS. Finally, we present a scheduling algorithm based on predictive modeling, aiming to reduce interference in online services while balancing node resource utilization. Through experiments and comparisons with three other baseline methods, we demonstrate the effectiveness of our approach. Compared with three baselines, our approach can reduce the average response time, 90th percentile response time, and 99th percentile response time of online services by 29.4%, 31.4%, and 14.5%, respectively.

Submitted 13 February, 2024; originally announced February 2024.

Comments: 8 pages

Journal ref: In the Proceedings of IEEE SmartData 2023

arXiv:2402.04134 [pdf, other]

A quasi-optimal lower bound for skew polynomial multiplication

Authors: Qiyuan Chen, Ke Ye

Abstract: We establish a lower bound for the complexity of multiplying two skew polynomials. The lower bound coincides with the upper bound conjectured by Caruso and Borgne in 2017, up to a log factor. We present algorithms for three special cases, indicating that the aforementioned lower bound is quasi-optimal. In fact, our lower bound is quasi-optimal in the sense of bilinear complexity. In addition, we discuss the average bilinear complexity of simultaneous multiplication of skew polynomials and the complexity of skew polynomial multiplication in the case of towers of extensions.

Submitted 6 February, 2024; originally announced February 2024.

arXiv:2402.03456 [pdf, other]

Constrained Multiview Representation for Self-supervised Contrastive Learning

Authors: Siyuan Dai, Kai Ye, Kun Zhao, Ge Cui, Haoteng Tang, Liang Zhan

Abstract: Representation learning constitutes a pivotal cornerstone in contemporary deep learning paradigms, offering a conduit to elucidate distinctive features within the latent space and interpret the deep models. Nevertheless, the inherent complexity of anatomical patterns and the random nature of lesion distribution in medical image segmentation pose significant challenges to the disentanglement of representations and the understanding of salient features. Methods guided by the maximization of mutual information, particularly within the framework of contrastive learning, have demonstrated remarkable success and superiority in decoupling densely intertwined representations. However, the effectiveness of contrastive learning highly depends on the quality of the positive and negative sample pairs, i.e. the unselected average mutual information among multi-views would obstruct the learning strategy so the selection of the views is vital. In this work, we introduce a novel approach predicated on representation distance-based mutual information (MI) maximization for measuring the significance of different views, aiming at conducting more efficient contrastive learning and representation disentanglement. Additionally, we introduce an MI re-ranking strategy for representation selection, benefiting both the continuous MI estimating and representation significance distance measuring. Specifically, we harness multi-view representations extracted from the frequency domain, re-evaluating their significance based on mutual information across varying frequencies, thereby facilitating a multifaceted contrastive learning approach to bolster semantic comprehension. The statistical results under the five metrics demonstrate that our proposed framework proficiently constrains the MI maximization-driven representation selection and steers the multi-view contrastive learning process.

Submitted 5 February, 2024; originally announced February 2024.

Comments: 11 pages, 9 figures, 2 algorithms

arXiv:2401.13160 [pdf, other]

SpacTor-T5: Pre-training T5 Models with Span Corruption and Replaced Token Detection

Authors: Ke Ye, Heinrich Jiang, Afshin Rostamizadeh, Ayan Chakrabarti, Giulia DeSalvo, Jean-François Kagy, Lazaros Karydas, Gui Citovsky, Sanjiv Kumar

Abstract: Pre-training large language models is known to be extremely resource intensive and often times inefficient, under-utilizing the information encapsulated in the training text sequences. In this paper, we present SpacTor, a new training procedure consisting of (1) a hybrid objective combining span corruption (SC) and token replacement detection (RTD), and (2) a two-stage curriculum that optimizes the hybrid objective over the initial $τ$ iterations, then transitions to standard SC loss. We show empirically that the effectiveness of the hybrid objective is tied to the two-stage pre-training schedule, and provide extensive analysis on why this is the case. In our experiments with encoder-decoder architectures (T5) on a variety of NLP tasks, SpacTor-T5 yields the same downstream performance as standard SC pre-training, while enabling a 50% reduction in pre-training iterations and 40% reduction in total FLOPs. Alternatively, given the same amount of computing budget, we find that SpacTor results in significantly improved downstream benchmark performance.

Submitted 23 January, 2024; originally announced January 2024.

Comments: 9+13 pages, 5 figures

arXiv:2401.12651 [pdf, other]

Brillouin nonlinearity characterizations of a high refractive index silicon oxynitride platform

Authors: Kaixuan Ye, Akshay Keloth, Yvan Klaver, Alessio Baldazzi, Gioele Piccoli, Matteo Sanna, Lorenzo Pavesi, Mher Ghulinyan, David Marpaung

Abstract: Silicon oxynitride (SiON) is a low-loss and versatile material for linear and nonlinear photonics applications. Controlling the oxygen-to-nitrogen (O/N) ratio in SiON provides an effective way to engineer its optical and mechanical properties, making it a great platform for the investigation of on-chip optomechanical interactions, especially the stimulated Brillouin scattering (SBS). Here we report the Brillouin nonlinearity characterization of a SiON platform with a specific O/N ratio (characterized by a refractive index of $n=1.65$). First, we introduce this particular SiON platform with fabrication details. Subsequently, we discuss various techniques for the on-chip Brillouin nonlinearity characterizations. In particular, we focus on the intensity-modulated pump-probe lock-in amplifier technique, which enables ultra-sensitive characterization. Finally, we analyze the Brillouin nonlinearities of this SiON platform and compare them with other SiON platforms. This work underscores the potential of SiON for on-chip Brillouin-based applications. Moreover, it paves the way for Brillouin nonlinearity characterization across various material platforms.

Submitted 29 January, 2024; v1 submitted 23 January, 2024; originally announced January 2024.

arXiv:2401.01484 [pdf, other]

Uncertainty Regularized Evidential Regression

Authors: Kai Ye, Tiejin Chen, Hua Wei, Liang Zhan

Abstract: The Evidential Regression Network (ERN) represents a novel approach that integrates deep learning with Dempster-Shafer's theory to predict a target and quantify the associated uncertainty. Guided by the underlying theory, specific activation functions must be employed to enforce non-negative values, which is a constraint that compromises model performance by limiting its ability to learn from all samples. This paper provides a theoretical analysis of this limitation and introduces an improvement to overcome it. Initially, we define the region where the models can't effectively learn from the samples. Following this, we thoroughly analyze the ERN and investigate this constraint. Leveraging the insights from our analysis, we address the limitation by introducing a novel regularization term that empowers the ERN to learn from the whole training set. Our extensive experiments substantiate our theoretical findings and demonstrate the effectiveness of the proposed solution.

Submitted 2 January, 2024; originally announced January 2024.

Comments: Accepted to AAAI 2024 main track

arXiv:2312.13721 [pdf, ps, other]

Bundle-based similarity measurement for positive semidefinite matrices

Authors: Peng Liu, Ke Ye

Abstract: Positive semidefinite (PSD) matrices are indispensable in many fields of science. A similarity measurement for such matrices is usually an essential ingredient in the mathematical modelling of a scientific problem. This paper proposes a unified framework to construct similarity measurements for PSD matrices. The framework is obtained by exploring the fiber bundle structure of the cone of PSD matrices and generalizing the idea of the point-set distance previously developed for linear subspaces and positive definite (PD) matrices. The framework demonstrates both theoretical advantages and computational convenience: (1) We prove that the similarity measurement constructed by the framework can be recognized either as the cost of a parallel transport or as the length of a quasi-geodesic curve. (2) We extend commonly used divergences for equidimensional PD matrices to the non-equidimensional case. Examples include Kullback-Leibler divergence, Bhattacharyya divergence and Rényi divergence. We prove that these extensions enjoy the same consistency property as their counterpart for geodesic distance. (3) We apply our geometric framework to further extend those in (2) to similarity measurements for arbitrary PSD matrices. We also provide simple formulae to compute these similarity measurements in most situations.

Submitted 21 December, 2023; originally announced December 2023.

arXiv:2312.11805 [pdf, other]

Gemini: A Family of Highly Capable Multimodal Models

Authors: Gemini Team, Rohan Anil, Sebastian Borgeaud, Jean-Baptiste Alayrac, Jiahui Yu, Radu Soricut, Johan Schalkwyk, Andrew M. Dai, Anja Hauth, Katie Millican, David Silver, Melvin Johnson, Ioannis Antonoglou, Julian Schrittwieser, Amelia Glaese, Jilin Chen, Emily Pitler, Timothy Lillicrap, Angeliki Lazaridou, Orhan Firat, James Molloy, Michael Isard, Paul R. Barham, Tom Hennigan, Benjamin Lee, et al. (1325 additional authors not shown)

Abstract: This report introduces a new family of multimodal models, Gemini, that exhibit remarkable capabilities across image, audio, video, and text understanding. The Gemini family consists of Ultra, Pro, and Nano sizes, suitable for applications ranging from complex reasoning tasks to on-device memory-constrained use-cases. Evaluation on a broad range of benchmarks shows that our most-capable Gemini Ultra model advances the state of the art in 30 of 32 of these benchmarks - notably being the first model to achieve human-expert performance on the well-studied exam benchmark MMLU, and improving the state of the art in every one of the 20 multimodal benchmarks we examined. We believe that the new capabilities of the Gemini family in cross-modal reasoning and language understanding will enable a wide variety of use cases. We discuss our approach toward post-training and deploying Gemini models responsibly to users through services including Gemini, Gemini Advanced, Google AI Studio, and Cloud Vertex AI.

Submitted 17 June, 2024; v1 submitted 18 December, 2023; originally announced December 2023.

arXiv:2312.11595 [pdf, other]

SPIRE: Semantic Prompt-Driven Image Restoration

Authors: Chenyang Qi, Zhengzhong Tu, Keren Ye, Mauricio Delbracio, Peyman Milanfar, Qifeng Chen, Hossein Talebi

Abstract: Text-driven diffusion models have become increasingly popular for various image editing tasks, including inpainting, stylization, and object replacement. However, it still remains an open research problem to adopt this language-vision paradigm for more fine-level image processing tasks, such as denoising, super-resolution, deblurring, and compression artifact removal. In this paper, we develop SPIRE, a Semantic and restoration Prompt-driven Image Restoration framework that leverages natural language as a user-friendly interface to control the image restoration process. We consider the capacity of prompt information in two dimensions. First, we use content-related prompts to enhance the semantic alignment, effectively alleviating identity ambiguity in the restoration outcomes. Second, our approach is the first framework that supports fine-level instruction through language-based quantitative specification of the restoration strength, without the need for explicit task-specific design. In addition, we introduce a novel fusion mechanism that augments the existing ControlNet architecture by learning to rescale the generative prior, thereby achieving better restoration fidelity. Our extensive experiments demonstrate the superior restoration performance of SPIRE compared to the state of the art, alongside offering the flexibility of text-based control over the restoration effects.

Submitted 16 July, 2024; v1 submitted 18 December, 2023; originally announced December 2023.

Comments: Accepted by ECCV 2024; Webpage: https://chenyangqiqi.github.io/tip

arXiv:2311.14697 [pdf, other]

Surface acoustic wave stimulated Brillouin scattering in thin-film lithium niobate waveguides

Authors: Kaixuan Ye, Hanke Feng, Yvan Klaver, Akshay Keloth, Akhileshwar Mishra, Cheng Wang, David Marpaung

Abstract: We report the first-ever experimental observation of backward stimulated Brillouin scattering (SBS) in thin-film lithium niobate (TFLN) waveguides. The peak Brillouin gain coefficient of the z-cut LN waveguide with a crystal rotation angle of 20$^{\circ}$ is as high as 84.9m$^{-1}$W$^{-1}$, facilitated by surface acoustic waves (SAW) at 8.06GHz.

Submitted 1 December, 2023; v1 submitted 9 November, 2023; originally announced November 2023.

arXiv:2311.13404

Animatable 3D Gaussians for High-fidelity Synthesis of Human Motions

Authors: Keyang Ye, Tianjia Shao, Kun Zhou

Abstract: We present a novel animatable 3D Gaussian model for rendering high-fidelity free-view human motions in real time. Compared to existing NeRF-based methods, the model has better capability in synthesizing high-frequency details without the jittering problem across video frames. The core of our model is a novel augmented 3D Gaussian representation, which attaches each Gaussian with a learnable code. The learnable code serves as a pose-dependent appearance embedding for refining the erroneous appearance caused by geometric transformation of Gaussians, based on which an appearance refinement model is learned to produce residual Gaussian properties to match the appearance in target pose. To force the Gaussians to learn the foreground human only without background interference, we further design a novel alpha loss to explicitly constrain the Gaussians within the human body. We also propose to jointly optimize the human joint parameters to improve the appearance accuracy. The animatable 3D Gaussian model can be learned with shallow MLPs, so new human motions can be synthesized in real time (66 fps on average). Experiments show that our model has superior performance over NeRF-based methods.

Submitted 26 November, 2023; v1 submitted 22 November, 2023; originally announced November 2023.

Comments: Some experiment data is wrong. The expression of the paper in introduction and abstract is incorrect. Some graphs have inappropriate descriptions

arXiv:2311.04512 [pdf, other]

FFINet: Future Feedback Interaction Network for Motion Forecasting

Authors: Miao Kang, Shengqi Wang, Sanping Zhou, Ke Ye, Jingjing Jiang, Nanning Zheng

Abstract: Motion forecasting plays a crucial role in autonomous driving, with the aim of predicting the future reasonable motions of traffic agents. Most existing methods mainly model the historical interactions between agents and the environment, and predict multi-modal trajectories in a feedforward process, ignoring potential trajectory changes caused by future interactions between agents. In this paper, we propose a novel Future Feedback Interaction Network (FFINet) to aggregate features of the current observations and potential future interactions for trajectory prediction. Firstly, we employ different spatial-temporal encoders to embed the decomposed position vectors and the current position of each scene, providing rich features for the subsequent cross-temporal aggregation. Secondly, the relative interaction and cross-temporal aggregation strategies are sequentially adopted to integrate features in the current fusion module, observation interaction module, future feedback module and global fusion module, in which the future feedback module can enable the understanding of pre-action by feeding the influence of preview information to feedforward prediction. Thirdly, the comprehensive interaction features are further fed into the final predictor to generate the joint predicted trajectories of multiple agents. Extensive experimental results show that our FFINet achieves state-of-the-art performance on the Argoverse 1 and Argoverse 2 motion forecasting benchmarks.

Submitted 8 November, 2023; originally announced November 2023.

Comments: 11 pages, 8 figures, 12 tables

arXiv:2310.17742 [pdf]

BERT-PIN: A BERT-based Framework for Recovering Missing Data Segments in Time-series Load Profiles

Authors: Yi Hu, Kai Ye, Hyeonjin Kim, Ning Lu

Abstract: Inspired by the success of the Transformer model in natural language processing and computer vision, this paper introduces BERT-PIN, a Bidirectional Encoder Representations from Transformers (BERT) powered Profile Inpainting Network. BERT-PIN recovers multiple missing data segments (MDSs) using load and temperature time-series profiles as inputs. To adopt a standard Transformer model structure for profile inpainting, we segment the load and temperature profiles into line segments, treating each segment as a word and the entire profile as a sentence. We incorporate a top candidates selection process in BERT-PIN, enabling it to produce a sequence of probability distributions, based on which users can generate multiple plausible imputed data sets, each reflecting different confidence levels. We develop and evaluate BERT-PIN using a real-world dataset for two applications: multiple MDSs recovery and demand response baseline estimation. Simulation results show that BERT-PIN outperforms the existing methods in accuracy while being capable of restoring multiple MDSs within a longer window. BERT-PIN, serving as a pre-trained model, can be fine-tuned for conducting many downstream tasks, such as classification and super resolution.

Submitted 26 October, 2023; originally announced October 2023.

arXiv:2310.12627 [pdf, ps, other]

Energy dependence of $J/ψ$ production in pp collisions with the PACIAE model

Authors: Kai-Fan Ye, Qiang Wang, Jia-Hao Shi, Zhi-Ying Qin, Wen-Chao Zhang, An-Ke Lei, Zhi-Lei She, Yu-Liang Yan, Ben-Hao Sa

Abstract: In this work we investigate the $J/ψ$ production in proton-proton collisions at the center-of-mass energy ($\sqrt{s}$) equal to 2.76, 5.02, 7, 8 and 13 TeV with a parton and hadron cascade model PACIAE 2.2a. It is based on PYTHIA but extended considering the partonic and hadronic rescatterings before and after hadronization, respectively. In the PYTHIA sector the $J/ψ$ production quantum chromodynamics processes are selected specially and a bias factor is proposed correspondingly. The calculated total cross sections, the differential cross sections as a function of the transverse momentum and the rapidity of $J/ψ$ in the forward rapidity region reproduce the corresponding experimental measurements reasonably well. In the mid-rapidity region, the double differential cross sections at $\sqrt{s}=$ 5.02, 7 and 13 TeV are also in good agreement with the experimental data. Moreover, we interpolate the double differential cross section as well as the total cross section of $J/ψ$ in the mid-rapidity region at $\sqrt{s}=$ 8 TeV, which could be validated if the experimental data become available.

Submitted 8 February, 2024; v1 submitted 19 October, 2023; originally announced October 2023.

Comments: 6 pages, 8 figures, 3 tables, accepted by Phys. Rev. C

arXiv:2310.05837 [pdf, other]

A Real-time Method for Inserting Virtual Objects into Neural Radiance Fields

Authors: Keyang Ye, Hongzhi Wu, Xin Tong, Kun Zhou

Abstract: We present the first real-time method for inserting a rigid virtual object into a neural radiance field, which produces realistic lighting and shadowing effects, as well as allows interactive manipulation of the object. By exploiting the rich information about lighting and geometry in a NeRF, our method overcomes several challenges of object insertion in augmented reality. For lighting estimation, we produce accurate, robust and 3D spatially-varying incident lighting that combines the near-field lighting from NeRF and an environment lighting to account for sources not covered by the NeRF. For occlusion, we blend the rendered virtual object with the background scene using an opacity map integrated from the NeRF. For shadows, with a precomputed field of spherical signed distance field, we query the visibility term for any point around the virtual object, and cast soft, detailed shadows onto 3D surfaces. Compared with state-of-the-art techniques, our approach can insert virtual objects into scenes with superior fidelity, and has great potential to be further applied to augmented reality systems.

Submitted 9 October, 2023; originally announced October 2023.

arXiv:2309.13432 [pdf, other]

Objective Bayesian analysis for the generalized exponential distribution

Authors: Aojun Li, Keying Ye, Min Wang

Abstract: In this paper, we consider objective Bayesian inference of the generalized exponential distribution using the independence Jeffreys prior and validate the propriety of the posterior distribution under a family of structured priors. We propose an efficient sampling algorithm via the generalized ratio-of-uniforms method to draw samples for making posterior inference. We carry out simulation studies to assess the finite-sample performance of the proposed Bayesian approach. Finally, a real-data application is provided for illustrative purposes.

Submitted 23 September, 2023; originally announced September 2023.

Comments: 13 pages, 5 figures, 2 tables

arXiv:2309.12592 [pdf, other]

ChainsFormer: A Chain Latency-aware Resource Provisioning Approach for Microservices Cluster

Authors: Chenghao Song, Minxian Xu, Kejiang Ye, Huaming Wu, Sukhpal Singh Gill, Rajkumar Buyya, Chengzhong Xu

Abstract: The trend towards transitioning from monolithic applications to microservices has been widely embraced in modern distributed systems and applications. This shift has resulted in the creation of lightweight, fine-grained, and self-contained microservices. Multiple microservices can be linked together via calls and inter-dependencies to form complex functions. One of the challenges in managing microservices is provisioning the optimal amount of resources for microservices in the chain to ensure application performance while improving resource usage efficiency. This paper presents ChainsFormer, a framework that analyzes microservice inter-dependencies to identify critical chains and nodes, and provision resources based on reinforcement learning. To analyze chains, ChainsFormer utilizes light-weight machine learning techniques to address the dynamic nature of microservice chains and workloads. For resource provisioning, a reinforcement learning approach is used that combines vertical and horizontal scaling to determine the amount of allocated resources and the number of replicates. We evaluate the effectiveness of ChainsFormer using realistic applications and traces on a real testbed based on Kubernetes. Our experimental results demonstrate that ChainsFormer can reduce response time by up to 26% and improve processed requests per second by 8% compared with state-of-the-art techniques.

Submitted 7 October, 2023; v1 submitted 21 September, 2023; originally announced September 2023.

Comments: 15 pages

Journal ref: In the Proceedings of International Conference on Service Oriented Computing (ICSOC 2023)

arXiv:2308.15315 [pdf]

Practice of Alibaba Cloud on Elastic Resource Provisioning for Large-scale Microservices Cluster

Authors: Minxian Xu, Lei Yang, Yang Wang, Chengxi Gao, Linfeng Wen, Guoyao Xu, Liping Zhang, Kejiang Ye, Chengzhong Xu

Abstract: Cloud-native architecture is becoming increasingly crucial for today's cloud computing environments due to the need for speed and flexibility in developing applications. It utilizes microservice technology to break down traditional monolithic applications into light-weight and self-contained microservice components. However, as microservices grow in scale and have dynamic inter-dependencies, they also pose new challenges in resource provisioning that cannot be fully addressed by traditional resource scheduling approaches. The various microservices with different resource needs and latency requirements can create complex calling chains, making it difficult to provide fine-grained and accurate resource allocation to each component while maintaining the overall quality of service in the chain. In this work, we aim to address the research problem of how to efficiently provision resources for the growing scale of microservice platforms and ensure the performance of latency-critical microservices. To address the problem, we present in-depth analyses of Alibaba's microservice cluster and propose optimized resource provisioning algorithms to enhance resource utilization while ensuring the latency requirement. First, we analyze the distinct features of microservices in Alibaba's cluster compared to traditional applications. Then we present Alibaba's resource capacity provisioning workflow and framework to address challenges in resource provisioning for large-scale and latency-critical microservice clusters. Finally, we propose enhanced resource provisioning algorithms over Alibaba's current practice by making both proactive and reactive scheduling decisions based on different workload patterns, which can improve resource usage by 10-15% in Alibaba's clusters, while maintaining the necessary latency for microservices.

Submitted 29 August, 2023; originally announced August 2023.

Comments: 19 pages

Journal ref: Software: Practice and Experience, 2023

arXiv:2308.09814 [pdf, other]

doi 10.1063/5.0178804

Observation of a Brillouin dynamic grating in silicon nitride waveguides

Authors: Roel Botter, Jasper van den Hoogen, Akhileshwar Mishra, Kaixuan Ye, Albert van Rees, Marcel Hoekman, Klaus Boller, David Marpaung

Abstract: Brillouin enhanced four wave mixing in the form of a Brillouin dynamic grating (BDG) enables a uniquely tunable filter, whose properties can be tuned by purely optical means. This makes the BDG a valuable tool in microwave photonics (MWP). BDGs have been studied extensively in fibers, but the only observation in an integrated platform required exotic materials. Unlocking BDG in a standard and mature platform will enable its integration into large-scale circuits. Here we demonstrate the first observation of a BDG in a silicon nitride (Si$_3$N$_4$) waveguide. We also present a new, optimized design, which will enhance the BDG response of the waveguide, unlocking a path to large-scale integration into MWP circuits.

Submitted 18 August, 2023; originally announced August 2023.

Journal ref: APL Photonics, Vol. 9, Issue 4, 046105 (2024)

arXiv:2308.03258 [pdf, other]

APBench: A Unified Benchmark for Availability Poisoning Attacks and Defenses

Authors: Tianrui Qin, Xitong Gao, Juanjuan Zhao, Kejiang Ye, Cheng-Zhong Xu

Abstract: The efficacy of availability poisoning, a method of poisoning data by injecting imperceptible perturbations to prevent its use in model training, has been a hot subject of investigation. Previous research suggested that it was difficult to effectively counteract such poisoning attacks. However, the introduction of various defense methods has challenged this notion. Given the rapid progress in this field, the performance of different novel methods cannot be accurately validated due to variations in experimental setups. To further evaluate the attack and defense capabilities of these poisoning methods, we have developed a benchmark -- APBench for assessing the efficacy of adversarial poisoning. APBench consists of 9 state-of-the-art availability poisoning attacks, 8 defense algorithms, and 4 conventional data augmentation techniques. We also have set up experiments with varying poisoning ratios, and evaluated the attacks on multiple datasets and their transferability across model architectures. We further conducted a comprehensive evaluation of 2 additional attacks specifically targeting unsupervised models. Our results reveal the glaring inadequacy of existing attacks in safeguarding individual privacy. APBench is open source and available to the deep learning community: https://github.com/lafeat/apbench.

Submitted 6 August, 2023; originally announced August 2023.

arXiv:2308.02970 [pdf, other]

Resource Management for GPT-based Model Deployed on Clouds: Challenges, Solutions, and Future Directions

Authors: Yongkang Dang, Minxian Xu, Kejiang Ye

Abstract: The widespread adoption of large language models (LLMs), e.g. Generative Pre-trained Transformer (GPT), deployed on cloud computing environments (e.g. Azure) has led to a huge increase in demand for resources. This surge in demand poses significant challenges to resource management in clouds. This paper aims to highlight these challenges by first identifying the unique characteristics of resource management for the GPT-based model. Building upon this understanding, we analyze the specific challenges faced by resource management in the context of GPT-based models deployed on clouds, and propose corresponding potential solutions. To facilitate effective resource management, we introduce a comprehensive resource management framework and present resource scheduling algorithms specifically designed for the GPT-based model. Furthermore, we delve into the future directions for resource management in the GPT-based model, highlighting potential areas for further exploration and improvement. Through this study, we aim to provide valuable insights into resource management for GPT-based models deployed in clouds and promote the sustainable development of GPT-based models and applications.

Submitted 5 August, 2023; originally announced August 2023.

Comments: 21 pages

arXiv:2308.00923 [pdf, other]

A Novel Lockable Spring-loaded Prismatic Spine to Support Agile Quadrupedal Locomotion

Authors: Keran Ye, Kenneth Chung, Konstantinos Karydis

Abstract: This paper introduces a way to systematically investigate the effect of compliant prismatic spines in quadrupedal robot locomotion. We develop a novel spring-loaded lockable spine module, together with a new Spinal Compliance-Integrated Quadruped (SCIQ) platform for both empirical and numerical research. Individual spine tests reveal beneficial spinal characteristics like a degressive spring, and validate the efficacy of a proposed compact locking/unlocking mechanism for the spine. Benchmark vertical jumping and landing tests with our robot show comparable jumping performance between the rigid and compliant spines. An observed advantage of the compliant spine module is that it can alleviate more challenging landing conditions by absorbing impact energy and dissipating the remainder via feet slipping, much in a cat-like stretching fashion.

Submitted 1 August, 2023; originally announced August 2023.

Comments: To appear in 2023 IEEE IROS

arXiv:2307.14819 [pdf, other]

High-temperature superconductivity with zero-resistance and strange metal behavior in La$_{3}$Ni$_{2}$O$_{7-δ}$

Authors: Yanan Zhang, Dajun Su, Yanen Huang, Zhaoyang Shan, Hualei Sun, Mengwu Huo, Kaixin Ye, Jiawen Zhang, Zihan Yang, Yongkang Xu, Yi Su, Rui Li, Michael Smidman, Meng Wang, Lin Jiao, Huiqiu Yuan

Abstract: Recently signatures of superconductivity were observed close to 80 K in La$_{3}$Ni$_{2}$O$_{7}$ under pressure. This discovery positions La$_{3}$Ni$_{2}$O$_{7}$ as the first bulk nickelate with high-temperature superconductivity, but the lack of zero resistance presents a significant drawback for validating the findings. Here we report pressure measurements up to over 30 GPa using a liquid pressure medium and show that single crystals of La$_{3}$Ni$_{2}$O$_{7-δ}$ do exhibit zero resistance. We find that La$_{3}$Ni$_{2}$O$_{7-δ}$ remains metallic under applied pressures, suggesting the absence of a metal-insulator transition proximate to the superconductivity. Analysis of the normal state $T$-linear resistance suggests an intricate link between this strange metal behaviour and superconductivity, whereby at high pressures both the linear resistance coefficient and superconducting transition are slowly suppressed by pressure, while at intermediate pressures both the superconductivity and strange metal behaviour appear disrupted, possibly due to a nearby structural instability. The association between strange metal behaviour and high-temperature superconductivity is very much in line with diverse classes of unconventional superconductors, including the cuprates and Fe-based superconductors. Understanding the superconductivity of La$_{3}$Ni$_{2}$O$_{7-δ}$ evidently requires further revealing the interplay of strange metal behaviour, superconductivity, as well as possible competing electronic or structural phases.

Submitted 18 April, 2024; v1 submitted 27 July, 2023; originally announced July 2023.

Comments: 28 pages, 4+8 figures, including Extended Data Files

arXiv:2307.12814 [pdf, other]

Stimulated Brillouin scattering in tellurite-covered silicon nitride waveguides

Authors: Roel A. Botter, Yvan Klaver, Randy te Morsche, Bruno L. Segat Frare, Batoul Hashemi, Kaixuan Ye, Akhileshwar Mishra, Redlef B. G. Braamhaar, Jonathan D. B. Bradley, David Marpaung

Abstract: Stimulated Brillouin scattering (SBS), a coherent nonlinear effect coupling acoustics and optics, can be used in a wide range of applications such as Brillouin lasers and tunable narrowband RF filtering. Wide adoption of such technologies, however, would need a balance of strong Brillouin interaction and low optical loss in a structure compatible with large scale fabrication. Achieving these characteristics in scalable platforms such as silicon and silicon nitride remains a challenge. Here, we investigate a scalable Brillouin platform combining low loss Si$_3$N$_4$ and tellurium oxide (TeO$_2$) exhibiting strong Brillouin response and enhanced acoustic confinement. In this platform we measure a Brillouin gain coefficient of 8.5~m$^{-1}$W$^{-1}$, exhibiting a twenty fold improvement over the largest previously reported Brillouin gain in a Si$_3$N$_4$ platform. Further, we demonstrate cladding engineering to control the strength of the Brillouin interaction. We utilized the Brillouin gain and loss resonances in this waveguide for an RF photonic filter with more than 15 dB rejection and 250 MHz linewidth. Finally, we present a pathway by geometric optimization and cladding engineering to a further enhancement of the gain coefficient to 155~m$^{-1}$W$^{-1}$, a potential 400 times increase in the Brillouin gain coefficient.

Submitted 24 July, 2023; originally announced July 2023.

arXiv:2305.13777 [pdf, other]

VisorGPT: Learning Visual Prior via Generative Pre-Training

Authors: Jinheng Xie, Kai Ye, Yudong Li, Yuexiang Li, Kevin Qinghong Lin, Yefeng Zheng, Linlin Shen, Mike Zheng Shou

Abstract: Various stuff and things in visual data possess specific traits, which can be learned by deep neural networks and are implicitly represented as the visual prior, e.g., object location and shape, in the model. Such prior potentially impacts many vision tasks. For example, in conditional image synthesis, spatial conditions failing to adhere to the prior can result in visually inaccurate synthetic results. This work aims to explicitly learn the visual prior and enable the customization of sampling. Inspired by advances in language modeling, we propose to learn Visual prior via Generative Pre-Training, dubbed VisorGPT. By discretizing visual locations of objects, e.g., bounding boxes, human pose, and instance masks, into sequences, VisorGPT can model visual prior through likelihood maximization. Besides, prompt engineering is investigated to unify various visual locations and enable customized sampling of sequential outputs from the learned prior. Experimental results demonstrate that VisorGPT can effectively model the visual prior, which can be employed for many vision tasks, such as customizing accurate human pose for conditional image synthesis models like ControlNet. Code will be released at https://github.com/Sierkinhane/VisorGPT.

Submitted 30 May, 2023; v1 submitted 23 May, 2023; originally announced May 2023.

Comments: Project web-page: https://sierkinhane.github.io/visor-gpt/

Showing 1–50 of 170 results for author: Ye, K