subscribe to arXiv mailings

CILP: Co-simulation based Imitation Learner for Dynamic Resource Provisioning in Cloud Computing Environments

Authors: Shreshth Tuli, Giuliano Casale, Nicholas R. Jennings

Abstract: Intelligent Virtual Machine (VM) provisioning is central to cost and resource efficient computation in cloud computing environments. As bootstrapping VMs is time-consuming, a key challenge for latency-critical tasks is to predict future workload demands to provision VMs proactively. However, existing AI-based solutions tend to not holistically consider all crucial aspects such as provisioning over… ▽ More Intelligent Virtual Machine (VM) provisioning is central to cost and resource efficient computation in cloud computing environments. As bootstrapping VMs is time-consuming, a key challenge for latency-critical tasks is to predict future workload demands to provision VMs proactively. However, existing AI-based solutions tend to not holistically consider all crucial aspects such as provisioning overheads, heterogeneous VM costs and Quality of Service (QoS) of the cloud system. To address this, we propose a novel method, called CILP, that formulates the VM provisioning problem as two sub-problems of prediction and optimization, where the provisioning plan is optimized based on predicted workload demands. CILP leverages a neural network as a surrogate model to predict future workload demands with a co-simulated digital-twin of the infrastructure to compute QoS scores. We extend the neural network to also act as an imitation learner that dynamically decides the optimal VM provisioning plan. A transformer based neural model reduces training and inference overheads while our novel two-phase decision making loop facilitates in making informed provisioning decisions. Crucially, we address limitations of prior work by including resource utilization, deployment costs and provisioning overheads to inform the provisioning decisions in our imitation learning framework. Experiments with three public benchmarks demonstrate that CILP gives up to 22% higher resource utilization, 14% higher QoS scores and 44% lower execution costs compared to the current online and offline optimization based state-of-the-art methods. △ Less

Submitted 16 April, 2023; v1 submitted 11 February, 2023; originally announced February 2023.

Comments: Accepted in IEEE Transactions on Network and Service Management

arXiv:2212.01302 [pdf, other]

DeepFT: Fault-Tolerant Edge Computing using a Self-Supervised Deep Surrogate Model

Authors: Shreshth Tuli, Giuliano Casale, Ludmila Cherkasova, Nicholas R. Jennings

Abstract: The emergence of latency-critical AI applications has been supported by the evolution of the edge computing paradigm. However, edge solutions are typically resource-constrained, posing reliability challenges due to heightened contention for compute and communication capacities and faulty application behavior in the presence of overload conditions. Although a large amount of generated log data can… ▽ More The emergence of latency-critical AI applications has been supported by the evolution of the edge computing paradigm. However, edge solutions are typically resource-constrained, posing reliability challenges due to heightened contention for compute and communication capacities and faulty application behavior in the presence of overload conditions. Although a large amount of generated log data can be mined for fault prediction, labeling this data for training is a manual process and thus a limiting factor for automation. Due to this, many companies resort to unsupervised fault-tolerance models. Yet, failure models of this kind can incur a loss of accuracy when they need to adapt to non-stationary workloads and diverse host characteristics. To cope with this, we propose a novel modeling approach, called DeepFT, to proactively avoid system overloads and their adverse effects by optimizing the task scheduling and migration decisions. DeepFT uses a deep surrogate model to accurately predict and diagnose faults in the system and co-simulation based self-supervised learning to dynamically adapt the model in volatile settings. It offers a highly scalable solution as the model size scales by only 3 and 1 percent per unit increase in the number of active tasks and hosts. Extensive experimentation on a Raspberry-Pi based edge cluster with DeFog benchmarks shows that DeepFT can outperform state-of-the-art baseline methods in fault-detection and QoS metrics. Specifically, DeepFT gives the highest F1 scores for fault-detection, reducing service deadline violations by up to 37\% while also improving response time by up to 9%. △ Less

Submitted 2 December, 2022; originally announced December 2022.

Comments: Accepted in IEEE INFOCOM 2023

arXiv:2210.04595 [pdf, other]

SampleHST: Efficient On-the-Fly Selection of Distributed Traces

Authors: Alim Ul Gias, Yicheng Gao, Matthew Sheldon, José A. Perusquía, Owen O'Brien, Giuliano Casale

Abstract: Since only a small number of traces generated from distributed tracing helps in troubleshooting, its storage requirement can be significantly reduced by biasing the selection towards anomalous traces. To aid in this scenario, we propose SampleHST, a novel approach to sample on-the-fly from a stream of traces in an unsupervised manner. SampleHST adjusts the storage quota of normal and anomalous tra… ▽ More Since only a small number of traces generated from distributed tracing helps in troubleshooting, its storage requirement can be significantly reduced by biasing the selection towards anomalous traces. To aid in this scenario, we propose SampleHST, a novel approach to sample on-the-fly from a stream of traces in an unsupervised manner. SampleHST adjusts the storage quota of normal and anomalous traces depending on the size of its budget. Initially, it utilizes a forest of Half Space Trees (HSTs) for trace scoring. This is based on the distribution of the mass scores across the trees, which characterizes the probability of observing different traces. The mass distribution from HSTs is subsequently used to cluster the traces online leveraging a variant of the mean-shift algorithm. This trace-cluster association eventually drives the sampling decision. We have compared the performance of SampleHST with a recently suggested method using data from a cloud data center and demonstrated that SampleHST improves sampling performance up to by 9.5x. △ Less

Submitted 9 September, 2022; originally announced October 2022.

Comments: 10 pages, 5 figures

arXiv:2208.07658 [pdf, other]

DRAGON: Decentralized Fault Tolerance in Edge Federations

Authors: Shreshth Tuli, Giuliano Casale, Nicholas R. Jennings

Abstract: Edge Federation is a new computing paradigm that seamlessly interconnects the resources of multiple edge service providers. A key challenge in such systems is the deployment of latency-critical and AI based resource-intensive applications in constrained devices. To address this challenge, we propose a novel memory-efficient deep learning based model, namely generative optimization networks (GON).… ▽ More Edge Federation is a new computing paradigm that seamlessly interconnects the resources of multiple edge service providers. A key challenge in such systems is the deployment of latency-critical and AI based resource-intensive applications in constrained devices. To address this challenge, we propose a novel memory-efficient deep learning based model, namely generative optimization networks (GON). Unlike GANs, GONs use a single network to both discriminate input and generate samples, significantly reducing their memory footprint. Leveraging the low memory footprint of GONs, we propose a decentralized fault-tolerance method called DRAGON that runs simulations (as per a digital modeling twin) to quickly predict and optimize the performance of the edge federation. Extensive experiments with real-world edge computing benchmarks on multiple Raspberry-Pi based federated edge configurations show that DRAGON can outperform the baseline methods in fault-detection and Quality of Service (QoS) metrics. Specifically, the proposed method gives higher F1 scores for fault-detection than the best deep learning (DL) method, while consuming lower memory than the heuristic methods. This allows for improvement in energy consumption, response time and service level agreement violations by up to 74, 63 and 82 percent, respectively. △ Less

Submitted 16 August, 2022; originally announced August 2022.

Comments: Accepted in IEEE Transactions on Network and Service Management (TNSM)

arXiv:2208.00761 [pdf, other]

AI Augmented Edge and Fog Computing: Trends and Challenges

Authors: Shreshth Tuli, Fatemeh Mirhakimi, Samodha Pallewatta, Syed Zawad, Giuliano Casale, Bahman Javadi, Feng Yan, Rajkumar Buyya, Nicholas R. Jennings

Abstract: In recent years, the landscape of computing paradigms has witnessed a gradual yet remarkable shift from monolithic computing to distributed and decentralized paradigms such as Internet of Things (IoT), Edge, Fog, Cloud, and Serverless. The frontiers of these computing technologies have been boosted by shift from manually encoded algorithms to Artificial Intelligence (AI)-driven autonomous systems… ▽ More In recent years, the landscape of computing paradigms has witnessed a gradual yet remarkable shift from monolithic computing to distributed and decentralized paradigms such as Internet of Things (IoT), Edge, Fog, Cloud, and Serverless. The frontiers of these computing technologies have been boosted by shift from manually encoded algorithms to Artificial Intelligence (AI)-driven autonomous systems for optimum and reliable management of distributed computing resources. Prior work focuses on improving existing systems using AI across a wide range of domains, such as efficient resource provisioning, application deployment, task placement, and service management. This survey reviews the evolution of data-driven AI-augmented technologies and their impact on computing systems. We demystify new techniques and draw key insights in Edge, Fog and Cloud resource management-related uses of AI methods and also look at how AI can innovate traditional applications for enhanced Quality of Service (QoS) in the presence of a continuum of resources. We present the latest trends and impact areas such as optimizing AI models that are deployed on or for computing systems. We layout a roadmap for future research directions in areas such as resource management for QoS optimization and service reliability. Finally, we discuss blue-sky ideas and envision this work as an anchor point for future research on AI-driven computing systems. △ Less

Submitted 14 April, 2023; v1 submitted 1 August, 2022; originally announced August 2022.

Comments: Accepted in Elsevier Journal of Network and Computer Applications

arXiv:2205.10642 [pdf, other]

MetaNet: Automated Dynamic Selection of Scheduling Policies in Cloud Environments

Authors: Shreshth Tuli, Giuliano Casale, Nicholas R. Jennings

Abstract: Task scheduling is a well-studied problem in the context of optimizing the Quality of Service (QoS) of cloud computing environments. In order to sustain the rapid growth of computational demands, one of the most important QoS metrics for cloud schedulers is the execution cost. In this regard, several data-driven deep neural networks (DNNs) based schedulers have been proposed in recent years to all… ▽ More Task scheduling is a well-studied problem in the context of optimizing the Quality of Service (QoS) of cloud computing environments. In order to sustain the rapid growth of computational demands, one of the most important QoS metrics for cloud schedulers is the execution cost. In this regard, several data-driven deep neural networks (DNNs) based schedulers have been proposed in recent years to allow scalable and efficient resource management in dynamic workload settings. However, optimal scheduling frequently relies on sophisticated DNNs with high computational needs implying higher execution costs. Further, even in non-stationary environments, sophisticated schedulers might not always be required and we could briefly rely on low-cost schedulers in the interest of cost-efficiency. Therefore, this work aims to solve the non-trivial meta problem of online dynamic selection of a scheduling policy using a surrogate model called MetaNet. Unlike traditional solutions with a fixed scheduling policy, MetaNet on-the-fly chooses a scheduler from a large set of DNN based methods to optimize task scheduling and execution costs in tandem. Compared to state-of-the-art DNN schedulers, this allows for improvement in execution costs, energy consumption, response time and service level agreement violations by up to 11, 43, 8 and 13 percent, respectively. △ Less

Submitted 21 May, 2022; originally announced May 2022.

Comments: Accepted in IEEE CLOUD 2022

arXiv:2205.10640 [pdf, other]

Learning to Dynamically Select Cost Optimal Schedulers in Cloud Computing Environments

Authors: Shreshth Tuli, Giuliano Casale, Nicholas R. Jennings

Abstract: The operational cost of a cloud computing platform is one of the most significant Quality of Service (QoS) criteria for schedulers, crucial to keep up with the growing computational demands. Several data-driven deep neural network (DNN)-based schedulers have been proposed in recent years that outperform alternative approaches by providing scalable and effective resource management for dynamic work… ▽ More The operational cost of a cloud computing platform is one of the most significant Quality of Service (QoS) criteria for schedulers, crucial to keep up with the growing computational demands. Several data-driven deep neural network (DNN)-based schedulers have been proposed in recent years that outperform alternative approaches by providing scalable and effective resource management for dynamic workloads. However, state-of-the-art schedulers rely on advanced DNNs with high computational requirements, implying high scheduling costs. In non-stationary contexts, the most sophisticated schedulers may not always be required, and it may be sufficient to rely on low-cost schedulers to temporarily save operational costs. In this work, we propose MetaNet, a surrogate model that predicts the operational costs and scheduling overheads of a large number of DNN-based schedulers and chooses one on-the-fly to jointly optimize job scheduling and execution costs. This facilitates improvements in execution costs, energy usage and service level agreement violations of up to 11%, 43% and 13% compared to the state-of-the-art methods. △ Less

Submitted 21 May, 2022; originally announced May 2022.

Comments: Accepted as a poster in SIGMETRICS 2022

arXiv:2205.10635 [pdf, other]

SplitPlace: AI Augmented Splitting and Placement of Large-Scale Neural Networks in Mobile Edge Environments

Authors: Shreshth Tuli, Giuliano Casale, Nicholas R. Jennings

Abstract: In recent years, deep learning models have become ubiquitous in industry and academia alike. Deep neural networks can solve some of the most complex pattern-recognition problems today, but come with the price of massive compute and memory requirements. This makes the problem of deploying such large-scale neural networks challenging in resource-constrained mobile edge computing platforms, specifica… ▽ More In recent years, deep learning models have become ubiquitous in industry and academia alike. Deep neural networks can solve some of the most complex pattern-recognition problems today, but come with the price of massive compute and memory requirements. This makes the problem of deploying such large-scale neural networks challenging in resource-constrained mobile edge computing platforms, specifically in mission-critical domains like surveillance and healthcare. To solve this, a promising solution is to split resource-hungry neural networks into lightweight disjoint smaller components for pipelined distributed processing. At present, there are two main approaches to do this: semantic and layer-wise splitting. The former partitions a neural network into parallel disjoint models that produce a part of the result, whereas the latter partitions into sequential models that produce intermediate results. However, there is no intelligent algorithm that decides which splitting strategy to use and places such modular splits to edge nodes for optimal performance. To combat this, this work proposes a novel AI-driven online policy, SplitPlace, that uses Multi-Armed-Bandits to intelligently decide between layer and semantic splitting strategies based on the input task's service deadline demands. SplitPlace places such neural network split fragments on mobile edge devices using decision-aware reinforcement learning for efficient and scalable computing. Moreover, SplitPlace fine-tunes its placement engine to adapt to volatile environments. Our experiments on physical mobile-edge environments with real-world workloads show that SplitPlace can significantly improve the state-of-the-art in terms of average response time, deadline violation rate, inference accuracy, and total reward by up to 46, 69, 3 and 12 percent respectively. △ Less

Submitted 21 May, 2022; originally announced May 2022.

Comments: Accepted in IEEE Transactions on Mobile Computing

arXiv:2205.04575 [pdf, other]

JCSP: Joint Caching and Service Placement for Edge Computing Systems

Authors: Yicheng Gao, Giuliano Casale

Abstract: With constrained resources, what, where, and how to cache at the edge is one of the key challenges for edge computing systems. The cached items include not only the application data contents but also the local caching of edge services that handle incoming requests. However, current systems separate the contents and services without considering the latency interplay of caching and queueing. Therefo… ▽ More With constrained resources, what, where, and how to cache at the edge is one of the key challenges for edge computing systems. The cached items include not only the application data contents but also the local caching of edge services that handle incoming requests. However, current systems separate the contents and services without considering the latency interplay of caching and queueing. Therefore, in this paper, we propose a novel class of stochastic models that enable the optimization of content caching and service placement decisions jointly. We first explain how to apply layered queueing networks (LQNs) models for edge service placement and show that combining this with genetic algorithms provides higher accuracy in resource allocation than an established baseline. Next, we extend LQNs with caching components to establish a joint modeling method for content caching and service placement (JCSP) and present analytical methods to analyze the resulting model. Finally, we simulate real-world Azure traces to evaluate the JCSP method and find that JCSP achieves up to 35% improvement in response time and 500MB reduction in memory usage than baseline heuristics for edge caching resource allocation. △ Less

Submitted 9 May, 2022; originally announced May 2022.

arXiv:2203.07140 [pdf, other]

CAROL: Confidence-Aware Resilience Model for Edge Federations

Authors: Shreshth Tuli, Giuliano Casale, Nicholas R. Jennings

Abstract: In recent years, the deployment of large-scale Internet of Things (IoT) applications has given rise to edge federations that seamlessly interconnect and leverage resources from multiple edge service providers. The requirement of supporting both latency-sensitive and compute-intensive IoT tasks necessitates service resilience, especially for the broker nodes in typical broker-worker deployment desi… ▽ More In recent years, the deployment of large-scale Internet of Things (IoT) applications has given rise to edge federations that seamlessly interconnect and leverage resources from multiple edge service providers. The requirement of supporting both latency-sensitive and compute-intensive IoT tasks necessitates service resilience, especially for the broker nodes in typical broker-worker deployment designs. Existing fault-tolerance or resilience schemes often lack robustness and generalization capability in non-stationary workload settings. This is typically due to the expensive periodic fine-tuning of models required to adapt them in dynamic scenarios. To address this, we present a confidence aware resilience model, CAROL, that utilizes a memory-efficient generative neural network to predict the Quality of Service (QoS) for a future state and a confidence score for each prediction. Thus, whenever a broker fails, we quickly recover the system by executing a local-search over the broker-worker topology space and optimize future QoS. The confidence score enables us to keep track of the prediction performance and run parsimonious neural network fine-tuning to avoid excessive overheads, further improving the QoS of the system. Experiments on a Raspberry-Pi based edge testbed with IoT benchmark applications show that CAROL outperforms state-of-the-art resilience schemes by reducing the energy consumption, deadline violation rates and resilience overheads by up to 16, 17 and 36 percent, respectively. △ Less

Submitted 14 March, 2022; originally announced March 2022.

Comments: Accepted in DSN 2022

arXiv:2201.07284 [pdf, other]

TranAD: Deep Transformer Networks for Anomaly Detection in Multivariate Time Series Data

Authors: Shreshth Tuli, Giuliano Casale, Nicholas R. Jennings

Abstract: Efficient anomaly detection and diagnosis in multivariate time-series data is of great importance for modern industrial applications. However, building a system that is able to quickly and accurately pinpoint anomalous observations is a challenging problem. This is due to the lack of anomaly labels, high data volatility and the demands of ultra-low inference times in modern applications. Despite t… ▽ More Efficient anomaly detection and diagnosis in multivariate time-series data is of great importance for modern industrial applications. However, building a system that is able to quickly and accurately pinpoint anomalous observations is a challenging problem. This is due to the lack of anomaly labels, high data volatility and the demands of ultra-low inference times in modern applications. Despite the recent developments of deep learning approaches for anomaly detection, only a few of them can address all of these challenges. In this paper, we propose TranAD, a deep transformer network based anomaly detection and diagnosis model which uses attention-based sequence encoders to swiftly perform inference with the knowledge of the broader temporal trends in the data. TranAD uses focus score-based self-conditioning to enable robust multi-modal feature extraction and adversarial training to gain stability. Additionally, model-agnostic meta learning (MAML) allows us to train the model using limited data. Extensive empirical studies on six publicly available datasets demonstrate that TranAD can outperform state-of-the-art baseline methods in detection and diagnosis performance with data and time-efficient training. Specifically, TranAD increases F1 scores by up to 17%, reducing training times by up to 99% compared to the baselines. △ Less

Submitted 14 May, 2022; v1 submitted 18 January, 2022; originally announced January 2022.

Comments: Accepted in VLDB 2022

arXiv:2112.08916 [pdf, other]

GOSH: Task Scheduling Using Deep Surrogate Models in Fog Computing Environments

Authors: Shreshth Tuli, Giuliano Casale, Nicholas R. Jennings

Abstract: Recently, intelligent scheduling approaches using surrogate models have been proposed to efficiently allocate volatile tasks in heterogeneous fog environments. Advances like deterministic surrogate models, deep neural networks (DNN) and gradient-based optimization allow low energy consumption and response times to be reached. However, deterministic surrogate models, which estimate objective values… ▽ More Recently, intelligent scheduling approaches using surrogate models have been proposed to efficiently allocate volatile tasks in heterogeneous fog environments. Advances like deterministic surrogate models, deep neural networks (DNN) and gradient-based optimization allow low energy consumption and response times to be reached. However, deterministic surrogate models, which estimate objective values for optimization, do not consider the uncertainties in the distribution of the Quality of Service (QoS) objective function that can lead to high Service Level Agreement (SLA) violation rates. Moreover, the brittle nature of DNN training and prevent such models from reaching minimal energy or response times. To overcome these difficulties, we present a novel scheduler: GOSH i.e. Gradient Based Optimization using Second Order derivatives and Heteroscedastic Deep Surrogate Models. GOSH uses a second-order gradient based optimization approach to obtain better QoS and reduce the number of iterations to converge to a scheduling decision, subsequently lowering the scheduling time. Instead of a vanilla DNN, GOSH uses a Natural Parameter Network to approximate objective scores. Further, a Lower Confidence Bound optimization approach allows GOSH to find an optimal trade-off between greedy minimization of the mean latency and uncertainty reduction by employing error-based exploration. Thus, GOSH and its co-simulation based extension GOSH*, can adapt quickly and reach better objective scores than baseline methods. We show that GOSH* reaches better objective scores than GOSH, but it is suitable only for high resource availability settings, whereas GOSH is apt for limited resource settings. Real system experiments for both GOSH and GOSH* show significant improvements against the state-of-the-art in terms of energy consumption, response time and SLA violations by up to 18, 27 and 82 percent, respectively. △ Less

Submitted 16 December, 2021; originally announced December 2021.

Comments: Accepted in IEEE Transactions on Parallel and Distributed Systems (Special Issue on PDC for AI), 2022

arXiv:2112.07269 [pdf, other]

MCDS: AI Augmented Workflow Scheduling in Mobile Edge Cloud Computing Systems

Authors: Shreshth Tuli, Giuliano Casale, Nicholas R. Jennings

Abstract: Workflow scheduling is a long-studied problem in parallel and distributed computing (PDC), aiming to efficiently utilize compute resources to meet user's service requirements. Recently proposed scheduling methods leverage the low response times of edge computing platforms to optimize application Quality of Service (QoS). However, scheduling workflow applications in mobile edge-cloud systems is cha… ▽ More Workflow scheduling is a long-studied problem in parallel and distributed computing (PDC), aiming to efficiently utilize compute resources to meet user's service requirements. Recently proposed scheduling methods leverage the low response times of edge computing platforms to optimize application Quality of Service (QoS). However, scheduling workflow applications in mobile edge-cloud systems is challenging due to computational heterogeneity, changing latencies of mobile devices and the volatile nature of workload resource requirements. To overcome these difficulties, it is essential, but at the same time challenging, to develop a long-sighted optimization scheme that efficiently models the QoS objectives. In this work, we propose MCDS: Monte Carlo Learning using Deep Surrogate Models to efficiently schedule workflow applications in mobile edge-cloud computing systems. MCDS is an Artificial Intelligence (AI) based scheduling approach that uses a tree-based search strategy and a deep neural network-based surrogate model to estimate the long-term QoS impact of immediate actions for robust optimization of scheduling decisions. Experiments on physical and simulated edge-cloud testbeds show that MCDS can improve over the state-of-the-art methods in terms of energy consumption, response time, SLA violations and cost by at least 6.13, 4.56, 45.09 and 30.71 percent respectively. △ Less

Submitted 14 December, 2021; originally announced December 2021.

Comments: Accepted in IEEE Transactions on Parallel and Distributed Systems (Special Issue on PDC for AI), 2022

arXiv:2112.02292 [pdf, other]

PreGAN: Preemptive Migration Prediction Network for Proactive Fault-Tolerant Edge Computing

Authors: Shreshth Tuli, Giuliano Casale, Nicholas R. Jennings

Abstract: Building a fault-tolerant edge system that can quickly react to node overloads or failures is challenging due to the unreliability of edge devices and the strict service deadlines of modern applications. Moreover, unnecessary task migrations can stress the system network, giving rise to the need for a smart and parsimonious failure recovery scheme. Prior approaches often fail to adapt to highly vo… ▽ More Building a fault-tolerant edge system that can quickly react to node overloads or failures is challenging due to the unreliability of edge devices and the strict service deadlines of modern applications. Moreover, unnecessary task migrations can stress the system network, giving rise to the need for a smart and parsimonious failure recovery scheme. Prior approaches often fail to adapt to highly volatile workloads or accurately detect and diagnose faults for optimal remediation. There is thus a need for a robust and proactive fault-tolerance mechanism to meet service level objectives. In this work, we propose PreGAN, a composite AI model using a Generative Adversarial Network (GAN) to predict preemptive migration decisions for proactive fault-tolerance in containerized edge deployments. PreGAN uses co-simulations in tandem with a GAN to learn a few-shot anomaly classifier and proactively predict migration decisions for reliable computing. Extensive experiments on a Raspberry-Pi based edge environment show that PreGAN can outperform state-of-the-art baseline methods in fault-detection, diagnosis and classification, thus achieving high quality of service. PreGAN accomplishes this by 5.1% more accurate fault detection, higher diagnosis scores and 23.8% lower overheads compared to the best method among the considered baselines. △ Less

Submitted 4 December, 2021; originally announced December 2021.

Comments: Accepted in Infocom 2022

arXiv:2111.10241 [pdf, other]

START: Straggler Prediction and Mitigation for Cloud Computing Environments using Encoder LSTM Networks

Authors: Shreshth Tuli, Sukhpal Singh Gill, Peter Garraghan, Rajkumar Buyya, Giuliano Casale, Nicholas R. Jennings

Abstract: Modern large-scale computing systems distribute jobs into multiple smaller tasks which execute in parallel to accelerate job completion rates and reduce energy consumption. However, a common performance problem in such systems is dealing with straggler tasks that are slow running instances that increase the overall response time. Such tasks can significantly impact the system's Quality of Service… ▽ More Modern large-scale computing systems distribute jobs into multiple smaller tasks which execute in parallel to accelerate job completion rates and reduce energy consumption. However, a common performance problem in such systems is dealing with straggler tasks that are slow running instances that increase the overall response time. Such tasks can significantly impact the system's Quality of Service (QoS) and the Service Level Agreements (SLA). To combat this issue, there is a need for automatic straggler detection and mitigation mechanisms that execute jobs without violating the SLA. Prior work typically builds reactive models that focus first on detection and then mitigation of straggler tasks, which leads to delays. Other works use prediction based proactive mechanisms, but ignore heterogeneous host or volatile task characteristics. In this paper, we propose a Straggler Prediction and Mitigation Technique (START) that is able to predict which tasks might be stragglers and dynamically adapt scheduling to achieve lower response times. Our technique analyzes all tasks and hosts based on compute and network resource consumption using an Encoder Long-Short-Term-Memory (LSTM) network. The output of this network is then used to predict and mitigate expected straggler tasks. This reduces the SLA violation rate and execution time without compromising QoS. Specifically, we use the CloudSim toolkit to simulate START in a cloud environment and compare it with state-of-the-art techniques (IGRU-SD, SGC, Dolly, GRASS, NearestFit and Wrangler) in terms of QoS parameters such as energy consumption, execution time, resource contention, CPU utilization and SLA violation rate. Experiments show that START reduces execution time, resource contention, energy and SLA violations by 13%, 11%, 16% and 19%, respectively, compared to the state-of-the-art approaches. △ Less

Submitted 19 November, 2021; originally announced November 2021.

Comments: Accepted in IEEE Transactions on Services Computing, 2021

arXiv:2110.05529 [pdf, other]

doi 10.1016/j.jss.2021.111124

HUNTER: AI based Holistic Resource Management for Sustainable Cloud Computing

Authors: Shreshth Tuli, Sukhpal Singh Gill, Minxian Xu, Peter Garraghan, Rami Bahsoon, Schahram Dustdar, Rizos Sakellariou, Omer Rana, Rajkumar Buyya, Giuliano Casale, Nicholas R. Jennings

Abstract: The worldwide adoption of cloud data centers (CDCs) has given rise to the ubiquitous demand for hosting application services on the cloud. Further, contemporary data-intensive industries have seen a sharp upsurge in the resource requirements of modern applications. This has led to the provisioning of an increased number of cloud servers, giving rise to higher energy consumption and, consequently,… ▽ More The worldwide adoption of cloud data centers (CDCs) has given rise to the ubiquitous demand for hosting application services on the cloud. Further, contemporary data-intensive industries have seen a sharp upsurge in the resource requirements of modern applications. This has led to the provisioning of an increased number of cloud servers, giving rise to higher energy consumption and, consequently, sustainability concerns. Traditional heuristics and reinforcement learning based algorithms for energy-efficient cloud resource management address the scalability and adaptability related challenges to a limited extent. Existing work often fails to capture dependencies across thermal characteristics of hosts, resource consumption of tasks and the corresponding scheduling decisions. This leads to poor scalability and an increase in the compute resource requirements, particularly in environments with non-stationary resource demands. To address these limitations, we propose an artificial intelligence (AI) based holistic resource management technique for sustainable cloud computing called HUNTER. The proposed model formulates the goal of optimizing energy efficiency in data centers as a multi-objective scheduling problem, considering three important models: energy, thermal and cooling. HUNTER utilizes a Gated Graph Convolution Network as a surrogate model for approximating the Quality of Service (QoS) for a system state and generating optimal scheduling decisions. Experiments on simulated and physical cloud environments using the CloudSim toolkit and the COSCO framework show that HUNTER outperforms state-of-the-art baselines in terms of energy consumption, SLA violation, scheduling time, cost and temperature by up to 12, 35, 43, 54 and 3 percent respectively. △ Less

Submitted 28 October, 2021; v1 submitted 11 October, 2021; originally announced October 2021.

Comments: Accepted in Elsevier Journal of Systems and Software, 2021

arXiv:2110.02912 [pdf, other]

Generative Optimization Networks for Memory Efficient Data Generation

Authors: Shreshth Tuli, Shikhar Tuli, Giuliano Casale, Nicholas R. Jennings

Abstract: In standard generative deep learning models, such as autoencoders or GANs, the size of the parameter set is proportional to the complexity of the generated data distribution. A significant challenge is to deploy resource-hungry deep learning models in devices with limited memory to prevent system upgrade costs. To combat this, we propose a novel framework called generative optimization networks (G… ▽ More In standard generative deep learning models, such as autoencoders or GANs, the size of the parameter set is proportional to the complexity of the generated data distribution. A significant challenge is to deploy resource-hungry deep learning models in devices with limited memory to prevent system upgrade costs. To combat this, we propose a novel framework called generative optimization networks (GON) that is similar to GANs, but does not use a generator, significantly reducing its memory footprint. GONs use a single discriminator network and run optimization in the input space to generate new data samples, achieving an effective compromise between training time and memory consumption. GONs are most suited for data generation problems in limited memory settings. Here we illustrate their use for the problem of anomaly detection in memory-constrained edge devices arising from attacks or intrusion events. Specifically, we use a GON to calculate a reconstruction-based anomaly score for input time-series windows. Experiments on a Raspberry-Pi testbed with two existing and a new suite of datasets show that our framework gives up to 32% higher detection F1 scores and 58% lower memory consumption, with only 5% higher training overheads compared to the state-of-the-art. △ Less

Submitted 28 October, 2021; v1 submitted 6 October, 2021; originally announced October 2021.

Comments: Accepted in NeurIPS 2021 - Workshop on ML for Systems

arXiv:2106.01847 [pdf, other]

Towards Cost-Optimal Policies for DAGs to Utilize IaaS Clouds with Online Learning

Authors: Xiaohu Wu, Han Yu, Giuliano Casale, Guanyu Gao

Abstract: Premier cloud service providers (CSPs) offer two types of purchase options, namely on-demand and spot instances, with time-varying features in availability and price. Users like startups have to operate on a limited budget and similarly others hope to reduce their costs. While interacting with a CSP, central to their concerns is the process of cost-effectively utilizing different purchase options… ▽ More Premier cloud service providers (CSPs) offer two types of purchase options, namely on-demand and spot instances, with time-varying features in availability and price. Users like startups have to operate on a limited budget and similarly others hope to reduce their costs. While interacting with a CSP, central to their concerns is the process of cost-effectively utilizing different purchase options possibly in addition to self-owned instances. A job in data-intensive applications is typically represented by a directed acyclic graph which can further be transformed into a chain of tasks. The key to achieving cost efficiency is determining the allocation of a specific deadline to each task, as well as the allocation of different types of instances to the task. In this paper, we propose a framework that determines the optimal allocation of deadlines to tasks. The framework also features an optimal policy to determine the allocation of spot and on-demand instances in a predefined time window, and a near-optimal policy for allocating self-owned instances. The policies are designed to be parametric to support the usage of online learning to infer the optimal values against the dynamics of cloud markets. Finally, several intuitive heuristics are used as baselines to validate the cost improvement brought by the proposed solutions. We show that the cost improvement over the state-of-the-art is up to 24.87% when spot and on-demand instances are considered and up to 59.05% when self-owned instances are considered. △ Less

Submitted 3 June, 2021; originally announced June 2021.

arXiv:2104.14392 [pdf, other]

doi 10.1109/TPDS.2021.3087349

COSCO: Container Orchestration using Co-Simulation and Gradient Based Optimization for Fog Computing Environments

Authors: Shreshth Tuli, Shivananda Poojara, Satish N. Srirama, Giuliano Casale, Nicholas R. Jennings

Abstract: Intelligent task placement and management of tasks in large-scale fog platforms is challenging due to the highly volatile nature of modern workload applications and sensitive user requirements of low energy consumption and response time. Container orchestration platforms have emerged to alleviate this problem with prior art either using heuristics to quickly reach scheduling decisions or AI driven… ▽ More Intelligent task placement and management of tasks in large-scale fog platforms is challenging due to the highly volatile nature of modern workload applications and sensitive user requirements of low energy consumption and response time. Container orchestration platforms have emerged to alleviate this problem with prior art either using heuristics to quickly reach scheduling decisions or AI driven methods like reinforcement learning and evolutionary approaches to adapt to dynamic scenarios. The former often fail to quickly adapt in highly dynamic environments, whereas the latter have run-times that are slow enough to negatively impact response time. Therefore, there is a need for scheduling policies that are both reactive to work efficiently in volatile environments and have low scheduling overheads. To achieve this, we propose a Gradient Based Optimization Strategy using Back-propagation of gradients with respect to Input (GOBI). Further, we leverage the accuracy of predictive digital-twin models and simulation capabilities by developing a Coupled Simulation and Container Orchestration Framework (COSCO). Using this, we create a hybrid simulation driven decision approach, GOBI*, to optimize Quality of Service (QoS) parameters. Co-simulation and the back-propagation approaches allow these methods to adapt quickly in volatile environments. Experiments conducted using real-world data on fog applications using the GOBI and GOBI* methods, show a significant improvement in terms of energy consumption, response time, Service Level Objective and scheduling time by up to 15, 40, 4, and 82 percent respectively when compared to the state-of-the-art algorithms. △ Less

Submitted 9 July, 2021; v1 submitted 29 April, 2021; originally announced April 2021.

Comments: Accepted in IEEE Transactions on Parallel and Distributed Systems, 2021

arXiv:2007.15314 [pdf, other]

Delay and Price Differentiation in Cloud Computing: A Service Model, Supporting Architectures, and Performance

Authors: Xiaohu Wu, Francesco De Pellegrini, Giuliano Casale

Abstract: Many cloud service providers (CSPs) provide on-demand service at a price with a small delay. We propose a QoS-differentiated model where multiple SLAs deliver both on-demand service for latency-critical users and delayed services for delay-tolerant users at lower prices. Two architectures are considered to fulfill SLAs. The first is based on priority queues. The second simply separates servers int… ▽ More Many cloud service providers (CSPs) provide on-demand service at a price with a small delay. We propose a QoS-differentiated model where multiple SLAs deliver both on-demand service for latency-critical users and delayed services for delay-tolerant users at lower prices. Two architectures are considered to fulfill SLAs. The first is based on priority queues. The second simply separates servers into multiple modules, each for one SLA. As an ecosystem, we show that the proposed framework is dominant-strategy incentive compatible. Although the first architecture appears more prevalent in the literature, we prove the superiority of the second architecture, under which we further leverage queueing theory to determine the optimal SLA delays and prices. Finally, the viability of the proposed framework is validated through numerical comparison with the on-demand service and it exhibits a revenue improvement in excess of 200%. Our results can help CSPs design optimal delay-differentiated services and choose appropriate serving architectures. △ Less

Submitted 30 July, 2020; originally announced July 2020.

arXiv:2007.01222 [pdf, other]

COCOA: Cold Start Aware Capacity Planning for Function-as-a-Service Platforms

Authors: Alim Ul Gias, Giuliano Casale

Abstract: Function-as-a-Service (FaaS) is increasingly popular in the software industry due to the implied cost-savings in event-driven workloads and its synergy with DevOps. To size an on-premise FaaS platform, it is important to estimate the required CPU and memory capacity to serve the expected loads. Given the service-level agreements, it is however challenging to take the cold start issue into account… ▽ More Function-as-a-Service (FaaS) is increasingly popular in the software industry due to the implied cost-savings in event-driven workloads and its synergy with DevOps. To size an on-premise FaaS platform, it is important to estimate the required CPU and memory capacity to serve the expected loads. Given the service-level agreements, it is however challenging to take the cold start issue into account during the sizing process. We have investigated the similarity of this problem with the hit rate improvement problem in TTL caches and concluded that solutions for TTL cache, although potentially applicable, lead to over-provisioning in FaaS. Thus, we propose a novel approach, COCOA, to solve this issue. COCOA uses a queueing-based approach to assess the effect of cold starts on FaaS response times. It also considers different memory consumption values depending on whether the function is idle or in execution. Using an event-driven FaaS simulator, FaasSim, we have developed, we show that COCOA can reduce over-provisioning by over 70% in some workloads, while satisfying the service-level agreements. △ Less

Submitted 2 July, 2020; originally announced July 2020.

Comments: 8 pages, 9 figures

arXiv:1902.01321 [pdf, other]

A Framework for Allocating Server Time to Spot and On-demand Services in Cloud Computing

Authors: Xiaohu Wu, Francesco De Pellegrini, Guanyu Gao, Giuliano Casale

Abstract: Cloud computing delivers value to users by facilitating their access to computing capacity in periods when their need arises. An approach is to provide both on-demand and spot services on shared servers. The former allows users to access servers on demand at a fixed price and users occupy different periods of servers. The latter allows users to bid for the remaining unoccupied periods via dynamic… ▽ More Cloud computing delivers value to users by facilitating their access to computing capacity in periods when their need arises. An approach is to provide both on-demand and spot services on shared servers. The former allows users to access servers on demand at a fixed price and users occupy different periods of servers. The latter allows users to bid for the remaining unoccupied periods via dynamic pricing; however, without appropriate design, such periods may be arbitrarily small since on-demand users arrive randomly. This is also the current service model adopted by Amazon Elastic Cloud Compute. In this paper, we provide the first integral framework for sharing the time of servers between on-demand and spot services while optimally pricing spot instances. It guarantees that on-demand users can get served quickly while spot users can stably utilize servers for a properly long period once accepted, which is a key feature to make both on-demand and spot services accessible. Simulation results show that, by complementing the on-demand market with a spot market, a cloud provider can improve revenue by up to 464.7%. The framework is designed under assumptions which are met in real environments. It is a new tool that cloud operators can use to quantify the advantage of a hybrid spot and on-demand service, eventually making the case for operating such service model in their own infrastructures. △ Less

Submitted 1 September, 2019; v1 submitted 4 February, 2019; originally announced February 2019.

arXiv:1807.08673 [pdf, ps, other]

Variational inequalities and mean-field approximations for partially observed systems of queueing networks

Authors: Iker Perez, Giuliano Casale

Abstract: Queueing networks are systems of theoretical interest that find widespread use in the performance evaluation of interconnected resources. In comparison to counterpart models in genetics or mathematical biology, the stochastic (jump) processes induced by queueing networks have distinctive coupling and synchronization properties. This has prevented the derivation of variational approximations for co… ▽ More Queueing networks are systems of theoretical interest that find widespread use in the performance evaluation of interconnected resources. In comparison to counterpart models in genetics or mathematical biology, the stochastic (jump) processes induced by queueing networks have distinctive coupling and synchronization properties. This has prevented the derivation of variational approximations for conditional representations of transient dynamics, which rely on simplifying independence assumptions. Here, we present a model augmentation to a multivariate counting process for interactions across service stations, and we enable the variational evaluation of mean-field measures for partially-observed multi-class networks. We also show that our framework offers an efficient and improved alternative for inference tasks, where existing variational or numerically intensive solutions do not work. △ Less

Submitted 27 June, 2019; v1 submitted 23 July, 2018; originally announced July 2018.

arXiv:1711.09123 [pdf, other]

A Manifesto for Future Generation Cloud Computing: Research Directions for the Next Decade

Authors: Rajkumar Buyya, Satish Narayana Srirama, Giuliano Casale, Rodrigo Calheiros, Yogesh Simmhan, Blesson Varghese, Erol Gelenbe, Bahman Javadi, Luis Miguel Vaquero, Marco A. S. Netto, Adel Nadjaran Toosi, Maria Alejandra Rodriguez, Ignacio M. Llorente, Sabrina De Capitani di Vimercati, Pierangela Samarati, Dejan Milojicic, Carlos Varela, Rami Bahsoon, Marcos Dias de Assuncao, Omer Rana, Wanlei Zhou, Hai Jin, Wolfgang Gentzsch, Albert Y. Zomaya, Haiying Shen

Abstract: The Cloud computing paradigm has revolutionised the computer science horizon during the past decade and has enabled the emergence of computing as the fifth utility. It has captured significant attention of academia, industries, and government bodies. Now, it has emerged as the backbone of modern economy by offering subscription-based services anytime, anywhere following a pay-as-you-go model. This… ▽ More The Cloud computing paradigm has revolutionised the computer science horizon during the past decade and has enabled the emergence of computing as the fifth utility. It has captured significant attention of academia, industries, and government bodies. Now, it has emerged as the backbone of modern economy by offering subscription-based services anytime, anywhere following a pay-as-you-go model. This has instigated (1) shorter establishment times for start-ups, (2) creation of scalable global enterprise applications, (3) better cost-to-value associativity for scientific and high performance computing applications, and (4) different invocation/execution models for pervasive and ubiquitous applications. The recent technological developments and paradigms such as serverless computing, software-defined networking, Internet of Things, and processing at network edge are creating new opportunities for Cloud computing. However, they are also posing several new challenges and creating the need for new approaches and research strategies, as well as the re-evaluation of the models that were developed to address issues such as scalability, elasticity, reliability, security, sustainability, and application models. The proposed manifesto addresses them by identifying the major open challenges in Cloud computing, emerging trends, and impact areas. It then offers research directions for the next decade, thus helping in the realisation of Future Generation Cloud Computing. △ Less

Submitted 24 August, 2018; v1 submitted 24 November, 2017; originally announced November 2017.

Comments: 51 pages, 3 figures

arXiv:1704.05867 [pdf, ps, other]

A note on integrating products of linear forms over the unit simplex

Authors: Giuliano Casale

Abstract: Integrating a product of linear forms over the unit simplex can be done in polynomial time if the number of variables n is fixed (V. Baldoni et al., 2011). In this note, we highlight that this problem is equivalent to obtaining the normalizing constant of state probabilities for a popular class of Markov processes used in queueing network theory. In light of this equivalence, we survey existing co… ▽ More Integrating a product of linear forms over the unit simplex can be done in polynomial time if the number of variables n is fixed (V. Baldoni et al., 2011). In this note, we highlight that this problem is equivalent to obtaining the normalizing constant of state probabilities for a popular class of Markov processes used in queueing network theory. In light of this equivalence, we survey existing computational algorithms developed in queueing theory that can be used for exact integration. For example, under some regularity conditions, queueing theory algorithms can exactly integrate a product of linear forms of total degree N by solving N systems of linear equations. △ Less

Submitted 8 March, 2023; v1 submitted 19 April, 2017; originally announced April 2017.

ACM Class: C.4; G.2

arXiv:1606.06543 [pdf, other]

An Uncertainty-Aware Approach to Optimal Configuration of Stream Processing Systems

Authors: Pooyan Jamshidi, Giuliano Casale

Abstract: Finding optimal configurations for Stream Processing Systems (SPS) is a challenging problem due to the large number of parameters that can influence their performance and the lack of analytical models to anticipate the effect of a change. To tackle this issue, we consider tuning methods where an experimenter is given a limited budget of experiments and needs to carefully allocate this budget to fi… ▽ More Finding optimal configurations for Stream Processing Systems (SPS) is a challenging problem due to the large number of parameters that can influence their performance and the lack of analytical models to anticipate the effect of a change. To tackle this issue, we consider tuning methods where an experimenter is given a limited budget of experiments and needs to carefully allocate this budget to find optimal configurations. We propose in this setting Bayesian Optimization for Configuration Optimization (BO4CO), an auto-tuning algorithm that leverages Gaussian Processes (GPs) to iteratively capture posterior distributions of the configuration spaces and sequentially drive the experimentation. Validation based on Apache Storm demonstrates that our approach locates optimal configurations within a limited experimental budget, with an improvement of SPS performance typically of at least an order of magnitude compared to existing configuration algorithms. △ Less

Submitted 21 June, 2016; originally announced June 2016.

Comments: MASCOTS 2016, code is available at https://github.com/dice-project/DICE-Configuration-BO4CO

arXiv:0902.3065 [pdf, ps, other]

The Multi-Branched Method of Moments for Queueing Networks

Authors: Giuliano Casale

Abstract: We propose a new exact solution algorithm for closed multiclass product-form queueing networks that is several orders of magnitude faster and less memory consuming than established methods for multiclass models, such as the Mean Value Analysis (MVA) algorithm. The technique is an important generalization of the recently proposed Method of Moments (MoM) which, differently from MVA, recursively co… ▽ More We propose a new exact solution algorithm for closed multiclass product-form queueing networks that is several orders of magnitude faster and less memory consuming than established methods for multiclass models, such as the Mean Value Analysis (MVA) algorithm. The technique is an important generalization of the recently proposed Method of Moments (MoM) which, differently from MVA, recursively computes higher-order moments of queue-lengths instead of mean values. The main contribution of this paper is to prove that the information used in the MoM recursion can be increased by considering multiple recursive branches that evaluate models with different number of queues. This reformulation allows to formulate a simpler matrix difference equation which leads to large computational savings with respect to the original MoM recursion. Computational analysis shows several cases where the proposed algorithm is between 1,000 and 10,000 times faster and less memory consuming than the original MoM, thus extending the range of multiclass models where exact solutions are feasible. △ Less

Submitted 18 February, 2009; originally announced February 2009.

ACM Class: C.4

Showing 1–27 of 27 results for author: Casale, G