CONGO: Compressive Online Gradient Optimization with Application to Microservices Management

Authors: Jeremy Carleton, Prathik Vijaykumar, Divyanshu Saxena, Dheeraj Narasimha, Srinivas Shakkottai, Aditya Akella

Abstract: We address the challenge of online convex optimization where the objective function's gradient exhibits sparsity, indicating that only a small number of dimensions possess non-zero gradients. Our aim is to leverage this sparsity to obtain useful estimates of the objective function's gradient even when the only information available is a limited number of function samples. Our motivation stems from distributed queueing systems like microservices-based applications, characterized by request-response workloads. Here, each request type proceeds through a sequence of microservices to produce a response, and the resource allocation across the collection of microservices is controlled to balance end-to-end latency with resource costs. While the number of microservices is substantial, the latency function primarily reacts to resource changes in a few, rendering the gradient sparse. Our proposed method, CONGO (Compressive Online Gradient Optimization), combines simultaneous perturbation with compressive sensing to estimate gradients. We establish analytical bounds on the requisite number of compressive sensing samples per iteration to maintain bounded bias of gradient estimates, ensuring sub-linear regret. By exploiting sparsity, we reduce the samples required per iteration to match the gradient's sparsity, rather than the problem's original dimensionality. Numerical experiments and real-world microservices benchmarks demonstrate CONGO's superiority over multiple stochastic gradient descent approaches, as it quickly converges to performance comparable to policies pre-trained with workload awareness.

Submitted 8 July, 2024; originally announced July 2024.

Comments: 28 pages, 7 figures

arXiv:2405.19213

HawkVision: Low-Latency Modeless Edge AI Serving

Authors: ChonLam Lao, Jiaqi Gao, Ganesh Ananthanarayanan, Aditya Akella, Minlan Yu

Abstract: Modeless ML inference is growing in popularity as it hides the complexity of model inference from users and caters to diverse user and application accuracy requirements. Previous work mostly focuses on modeless inference in data centers. To provide low-latency inference, in this paper, we promote modeless inference at the edge. The edge environment introduces additional challenges related to low power consumption, limited device memory, and volatile network environments. To address these challenges, we propose HawkVision, which provides low-latency modeless serving of vision DNNs. HawkVision leverages a two-layer edge-DC architecture that employs confidence scaling to reduce the number of model options while meeting diverse accuracy requirements. It also supports lossy inference under volatile network environments. Our experimental results show that HawkVision outperforms current serving systems by up to 1.6X in P99 latency for providing modeless service. Our FPGA prototype demonstrates similar performance at certain accuracy levels with up to a 3.34X reduction in power consumption.

Submitted 29 May, 2024; originally announced May 2024.

arXiv:2405.05618

An Automatic Prompt Generation System for Tabular Data Tasks

Authors: Ashlesha Akella, Abhijit Manatkar, Brij Chavda, Hima Patel

Abstract: Efficient processing of tabular data is important in various industries, especially when working with datasets containing a large number of columns. Large language models (LLMs) have demonstrated their ability on several tasks through carefully crafted prompts. However, creating effective prompts for tabular datasets is challenging due to the structured nature of the data and the need to manage numerous columns. This paper presents an innovative auto-prompt generation system suitable for multiple LLMs, with minimal training. It proposes two novel methods: (1) a reinforcement learning-based algorithm for identifying and sequencing task-relevant columns, and (2) a cell-level similarity-based approach for enhancing few-shot example selection. Our approach has been extensively tested across 66 datasets, demonstrating improved performance in three downstream tasks (data imputation, error detection, and entity matching) using two distinct LLMs: Google flan-t5-xxl and Mixtral 8x7B.

Submitted 9 May, 2024; originally announced May 2024.

Comments: Accepted to NAACL 2024 Industry Track

arXiv:2404.19093

Large Language Models as Conversational Movie Recommenders: A User Study

Authors: Ruixuan Sun, Xinyi Li, Avinash Akella, Joseph A. Konstan

Abstract: This paper explores the effectiveness of using large language models (LLMs) for personalized movie recommendations from users' perspectives in an online field experiment. Our study involves a combination of between-subject prompt and historic consumption assessments, along with within-subject recommendation scenario evaluations. By examining conversation and survey response data from 160 active users, we find that LLMs offer strong recommendation explainability but lack overall personalization, diversity, and user trust. Our results also indicate that different personalized prompting techniques do not significantly affect user-perceived recommendation quality, but the number of movies a user has watched plays a more significant role. Furthermore, LLMs show a greater ability to recommend lesser-known or niche movies. Through qualitative analysis, we identify key conversational patterns linked to positive and negative user interaction experiences and conclude that providing personal context and examples is crucial for obtaining high-quality recommendations from LLMs.

Submitted 29 April, 2024; originally announced April 2024.

arXiv:2404.18322

BlockLLM: Multi-tenant Finer-grained Serving for Large Language Models

Authors: Jiamin Li, Le Xu, Hong Xu, Aditya Akella

Abstract: The growing demand for Large Language Models (LLMs) across diverse applications has prompted a paradigm shift in the design of deep learning serving systems. Deploying LLMs, especially in multi-tenant environments, presents considerable challenges due to their high computational and memory demands. We present BlockLLM, a serving system that exploits the potential of sharing components among fine-tuned LLM models to offer an efficient and flexible solution for LLM workloads. BlockLLM partitions the models into finer-grained blocks to enable the reuse of model components and independent provisioning to improve computation efficiency. BlockLLM consists of an offline block zoo, for storing the blocks, and an online system to serve the requests through chains of blocks. It offers multi-fold flexibility: (1) Adaptive assembly of block chains on-the-fly is achieved with the help of equivalence evaluation among blocks in the zoo. (2) We enable per-block batch size and configure best-effort KV cache coordination at the individual block level. (3) We adopt speculative execution and locality-aware block placement to mitigate the communication costs from dynamic block resource allocation. Our evaluation demonstrates that BlockLLM reduces memory and storage footprints and improves computation efficiency, outperforming existing serving approaches in 95th-percentile latency and GPU utilization by 33.5% and 20.1%, respectively.

Submitted 28 April, 2024; originally announced April 2024.

arXiv:2404.03865

FFN-SkipLLM: A Hidden Gem for Autoregressive Decoding with Adaptive Feed Forward Skipping

Authors: Ajay Jaiswal, Bodun Hu, Lu Yin, Yeonju Ro, Shiwei Liu, Tianlong Chen, Aditya Akella

Abstract: Autoregressive Large Language Models (e.g., LLaMa, GPTs) are omnipresent, achieving remarkable success in language understanding and generation. However, such impressive capability typically comes with a substantial model size, which presents significant challenges for autoregressive token-by-token generation. To mitigate computation overload incurred during generation, several early-exit and layer-dropping strategies have been proposed. Despite some promising success due to the redundancy across LLM layers on metrics like ROUGE-L/BLEU, our careful knowledge-intensive evaluation unveils issues such as generation collapse, hallucination of wrong facts, and noticeable performance drop even at the trivial exit ratio of 10-15% of layers. We attribute these errors primarily to ineffective handling of the KV cache through state copying during early-exit. In this work, we observed the saturation of computationally expensive feed-forward blocks of LLM layers and proposed FFN-SkipLLM, which is a novel fine-grained skip strategy of autoregressive LLMs. More specifically, FFN-SkipLLM is an input-adaptive feed-forward skipping strategy that can skip 25-30% of FFN blocks of LLMs with marginal change in performance on knowledge-intensive generation tasks without any requirement to handle KV cache. Our extensive experiments and ablation across benchmarks like MT-Bench, Factoid-QA, and variable-length text summarization illustrate how our simple and easy-to-use method can facilitate faster autoregressive decoding.

Submitted 4 April, 2024; originally announced April 2024.

Comments: arXiv admin note: text overlap with arXiv:2310.01382

arXiv:2402.07529

Accelerating Distributed Deep Learning using Lossless Homomorphic Compression

Authors: Haoyu Li, Yuchen Xu, Jiayi Chen, Rohit Dwivedula, Wenfei Wu, Keqiang He, Aditya Akella, Daehyeok Kim

Abstract: As deep neural networks (DNNs) grow in complexity and size, the resultant increase in communication overhead during distributed training has become a significant bottleneck, challenging the scalability of distributed training systems. Existing solutions, while aiming to mitigate this bottleneck through worker-level compression and in-network aggregation, fall short due to their inability to efficiently reconcile the trade-offs between compression effectiveness and computational overhead, hindering overall performance and scalability. In this paper, we introduce a novel compression algorithm that effectively merges worker-level compression with in-network aggregation. Our solution is both homomorphic, allowing for efficient in-network aggregation without CPU/GPU processing, and lossless, ensuring no compromise on training accuracy. Theoretically optimal in compression and computational efficiency, our approach is empirically validated across diverse DNN models such as NCF, LSTM, VGG19, and BERT-base, showing up to a 6.33$\times$ improvement in aggregation throughput and a 3.74$\times$ increase in per-iteration training speed.

Submitted 12 February, 2024; originally announced February 2024.

arXiv:2401.11632

What Are We Optimizing For? A Human-centric Evaluation of Deep Learning-based Movie Recommenders

Authors: Ruixuan Sun, Xinyi Wu, Avinash Akella, Ruoyan Kong, Bart Knijnenburg, Joseph A. Konstan

Abstract: In the past decade, deep learning (DL) models have gained prominence for their exceptional accuracy on benchmark datasets in recommender systems (RecSys). However, their evaluation has primarily relied on offline metrics, overlooking direct user perception and experience. To address this gap, we conduct a human-centric evaluation case study of four leading DL-RecSys models in the movie domain. We test how different DL-RecSys models perform in personalized recommendation generation by conducting a survey study with 445 real active users. We find some DL-RecSys models to be superior in recommending novel and unexpected items and weaker in diversity, trustworthiness, transparency, accuracy, and overall user satisfaction compared to classic collaborative filtering (CF) methods. To further explain the reasons behind the underperformance, we apply a comprehensive path analysis. We discover that the lack of diversity and too much serendipity from DL models can negatively impact the consequent perceived transparency and personalization of recommendations. Such a path ultimately leads to lower summative user satisfaction. Qualitatively, we confirm with real user quotes that accuracy plus at least one other attribute is necessary to ensure a good user experience, while their demands for transparency and trust cannot be neglected. Based on our findings, we discuss future human-centric DL-RecSys design and optimization strategies.

Submitted 1 May, 2024; v1 submitted 21 January, 2024; originally announced January 2024.

arXiv:2312.07813

On a Foundation Model for Operating Systems

Authors: Divyanshu Saxena, Nihal Sharma, Donghyun Kim, Rohit Dwivedula, Jiayi Chen, Chenxi Yang, Sriram Ravula, Zichao Hu, Aditya Akella, Sebastian Angel, Joydeep Biswas, Swarat Chaudhuri, Isil Dillig, Alex Dimakis, P. Brighten Godfrey, Daehyeok Kim, Chris Rossbach, Gang Wang

Abstract: This paper lays down the research agenda for a domain-specific foundation model for operating systems (OSes). Our case for a foundation model revolves around the observations that several OS components such as CPU, memory, and network subsystems are interrelated and that OS traces offer the ideal dataset for a foundation model to grasp the intricacies of diverse OS components and their behavior in varying environments and workloads. We discuss a wide range of possibilities that then arise, from employing foundation models as policy agents to utilizing them as generators and predictors to assist traditional OS control algorithms. Our hope is that this paper spurs further research into OS foundation models and into creating the next generation of operating systems for the evolving computing landscape.

Submitted 12 December, 2023; originally announced December 2023.

Comments: Machine Learning for Systems Workshop at 37th NeurIPS Conference, 2023, New Orleans, LA, USA

arXiv:2310.18481

MOSEL: Inference Serving Using Dynamic Modality Selection

Authors: Bodun Hu, Le Xu, Jeongyoon Moon, Neeraja J. Yadwadkar, Aditya Akella

Abstract: Rapid advancements over the years have helped machine learning models reach previously hard-to-achieve goals, sometimes even exceeding human capabilities. However, to attain the desired accuracy, the model sizes and in turn their computational requirements have increased drastically. Thus, serving predictions from these models to meet any target latency and cost requirements of applications remains a key challenge, despite recent work in building inference-serving systems as well as algorithmic approaches that dynamically adapt models based on inputs. In this paper, we introduce a form of dynamism, modality selection, where we adaptively choose modalities from inference inputs while maintaining the model quality. We introduce MOSEL, an automated inference serving system for multi-modal ML models that carefully picks input modalities per request based on user-defined performance and accuracy requirements. MOSEL exploits modality configurations extensively, improving system throughput by 3.6$\times$ with an accuracy guarantee and shortening job completion times by 11$\times$.

Submitted 27 October, 2023; originally announced October 2023.

arXiv:2309.13296

doi 10.1080/10447318.2023.2262796

Interactive Content Diversity and User Exploration in Online Movie Recommenders: A Field Experiment

Authors: Ruixuan Sun, Avinash Akella, Ruoyan Kong, Moyan Zhou, Joseph A. Konstan

Abstract: Recommender systems often struggle to strike a balance between matching users' tastes and providing unexpected recommendations. When recommendations are too narrow and fail to cover the full range of users' preferences, the system is perceived as useless. Conversely, when the system suggests too many items that users don't like, it is considered impersonal or ineffective. To better understand user sentiment about the breadth of recommendations given by a movie recommender, we conducted interviews and surveys and found that many users considered narrow recommendations to be useful, while a smaller number explicitly wanted greater breadth. Additionally, we designed and ran an online field experiment with a larger user group, evaluating two new interfaces designed to provide users with greater access to broader recommendations. We looked at user preferences and behavior for two groups of users: those with higher initial movie diversity and those with lower diversity. Among our findings, we discovered that different levels of exploration control and users' subjective preferences on interfaces are more predictive of their satisfaction with the recommender.

Submitted 23 September, 2023; originally announced September 2023.

Comments: International Journal of Human Computer Interaction

arXiv:2308.13632

ChainedFilter: Combining Membership Filters by Chain Rule

Authors: Haoyu Li, Liuhui Wang, Qizhi Chen, Jianan Ji, Yuhan Wu, Yikai Zhao, Tong Yang, Aditya Akella

Abstract: Membership (membership query / membership testing) is a fundamental problem across databases, networks and security. However, previous research has primarily focused on either approximate solutions, such as Bloom Filters, or exact methods, like perfect hashing and dictionaries, without attempting to develop an integral theory. In this paper, we propose a unified and complete theory, namely chain rule, for general membership problems, which encompasses both approximate and exact membership as extreme cases. Building upon the chain rule, we introduce a straightforward yet versatile algorithm framework, namely ChainedFilter, to combine different elementary filters without losing information. Our evaluation results demonstrate that ChainedFilter performs well in many applications: (1) it requires only 26% additional space over the theoretical lower bound for implicit static dictionary, (2) it requires only 0.22 additional bit per item over the theoretical lower bound for lossless data compression, (3) it reduces external memory access by up to 31% compared to raw Cuckoo Hashing, (4) it reduces P99 tail point query latency by up to 36% compared to Bloom Filter under the same space cost in the RocksDB database, and (5) it reduces filter space by up to 99.1% compared to the original Learned Bloom Filter.

Submitted 25 August, 2023; originally announced August 2023.

Comments: SIGMOD 2024

arXiv:2308.12580

Laying foundations to quantify the "Effort of Reproducibility"

Authors: Akhil Pandey Akella, David Koop, Hamed Alhoori

Abstract: Why are some research studies easy to reproduce while others are difficult? Casting doubt on the accuracy of scientific work is not fruitful, especially when an individual researcher cannot reproduce the claims made in the paper. There could be many subjective reasons behind the inability to reproduce a scientific paper. The field of Machine Learning (ML) faces a reproducibility crisis, and surveying a portion of published articles has resulted in a group realization that although sharing code repositories would be appreciable, code bases are not the be-all and end-all for determining the reproducibility of an article. Various parties involved in the publication process have come forward to address the reproducibility crisis, and solutions such as badging articles as reproducible, reproducibility checklists at conferences (NeurIPS, ICML, ICLR, etc.), and sharing artifacts on OpenReview come across as promising solutions to the core problem. The breadth of literature on reproducibility focuses on measures required to avoid irreproducibility, and there is not much research into the effort behind reproducing these articles. In this paper, we investigate the factors that contribute to the ease and difficulty of reproducing previously published studies and report on the foundational framework to quantify effort of reproducibility.

Submitted 24 August, 2023; originally announced August 2023.

Comments: Accepted at ACM/IEEE conference JCDL' 2023. Refer https://2023.jcdl.org/program/schedule-printable/ for confirmation

arXiv:2308.03615

Dirigo: Self-scaling Stateful Actors For Serverless Real-time Data Processing

Authors: Le Xu, Divyanshu Saxena, Neeraja J. Yadwadkar, Aditya Akella, Indranil Gupta

Abstract: We propose Dirigo, a distributed stream processing service built atop virtual actors. Dirigo achieves both a high level of resource efficiency and performance isolation driven by user intent (SLO). To improve resource efficiency, Dirigo adopts a serverless architecture that enables time-sharing of compute resources among streaming operators, both within and across applications. Meanwhile, Dirigo improves performance isolation by inheriting the property of function autoscaling from serverless architecture. Specifically, Dirigo proposes (i) a dual-mode actor, an actor abstraction that dynamically provides an orderliness guarantee for streaming operators during autoscaling, and (ii) a data plane scheduling mechanism, along with its API, that allows scheduling and scaling at message-level granularity.

Submitted 7 August, 2023; originally announced August 2023.

arXiv:2308.00852

CASSINI: Network-Aware Job Scheduling in Machine Learning Clusters

Authors: Sudarsanan Rajasekaran, Manya Ghobadi, Aditya Akella

Abstract: We present CASSINI, a network-aware job scheduler for machine learning (ML) clusters. CASSINI introduces a novel geometric abstraction to consider the communication pattern of different jobs while placing them on network links. To do so, CASSINI uses an affinity graph that finds a series of time-shift values to adjust the communication phases of a subset of jobs, such that the communication patterns of jobs sharing the same network link are interleaved with each other. Experiments with 13 common ML models on a 24-server testbed demonstrate that compared to the state-of-the-art ML schedulers, CASSINI improves the average and tail completion time of jobs by up to 1.6x and 2.5x, respectively. Moreover, we show that CASSINI reduces the number of ECN marked packets in the cluster by up to 33x.

Submitted 1 August, 2023; originally announced August 2023.

ACM Class: C.2.4

arXiv:2305.00538

SFC: Near-Source Congestion Signaling and Flow Control

Authors: Yanfang Le, Jeongkeun Lee, Jeremias Blendin, Jiayi Chen, Georgios Nikolaidis, Rong Pan, Robert Soule, Aditya Akella, Pedro Yebenes Segura, Arjun Singhvi, Yuliang Li, Qingkai Meng, Changhoon Kim, Serhat Arslan

Abstract: State-of-the-art congestion control algorithms for data centers alone do not cope well with transient congestion and high traffic bursts. To help with these, we revisit the concept of direct backward feedback from switches and propose Back-to-Sender (BTS) signaling to many concurrent incast senders. Combining it with our novel approach to in-network caching, we achieve near-source sub-RTT congestion signaling. Source Flow Control (SFC) combines these two simple signaling mechanisms to instantly pause traffic sources, hence avoiding the head-of-line blocking problem of conventional hop-by-hop flow control. Our prototype system and scale simulations demonstrate that near-source signaling can significantly reduce the message completion time of various workloads in the presence of incast, complementing existing congestion control algorithms. Our results show that SFC can reduce the $99^{th}$-percentile flow completion times by $1.2-6\times$ and the peak switch buffer usage by $2-3\times$ compared to the recent incast solutions.

Submitted 30 April, 2023; originally announced May 2023.

arXiv:2301.04369

Reproducibility Signals in Science: A preliminary analysis

Authors: Akhil Pandey Akella, Hamed Alhoori, David Koop

Abstract: Reproducibility is an important feature of science; experiments are retested, and analyses are repeated. Trust in the findings increases when consistent results are achieved. Despite the importance of reproducibility, significant work is often involved in these efforts, and some published findings may not be reproducible due to oversights or errors. In this paper, we examine a myriad of features in scholarly articles published in computer science conferences and journals and test how they correlate with reproducibility. We collected data from three different sources that labeled publications as either reproducible or irreproducible and employed statistical significance tests to identify features of those publications that hold clues about reproducibility. We found the readability of the scholarly article and accessibility of the software artifacts through hyperlinks to be strong signals noticeable amongst reproducible scholarly articles.

Submitted 11 January, 2023; originally announced January 2023.

Comments: Accepted as a Workshop paper for WIESP-22 (https://ui.adsabs.harvard.edu/WIESP/2022/Schedule)

arXiv:2301.04205 [pdf, other]

A Performance Verification Methodology for Resource Allocation Heuristics

Authors: Saksham Goel, Benjamin Mikek, Jehad Aly, Venkat Arun, Ahmed Saeed, Aditya Akella

Abstract: Performance verification is a nascent but promising tool for understanding the performance and limitations of heuristics under realistic assumptions. Bespoke performance verification tools have already demonstrated their value in settings like congestion control and packet scheduling. In this paper, we aim to emphasize the broad applicability and utility of performance verification. To that end, we highlight the design principles of performance verification. Then, we leverage that understanding to develop a set of easy-to-follow guidelines that are applicable to a wide range of resource allocation heuristics. In particular, we introduce Virelay, a framework that enables heuristic designers to express the behavior of their algorithms and their assumptions about the system in an environment that resembles a discrete-event simulator. We demonstrate the utility and ease-of-use of Virelay by applying it to six diverse case studies. We produce bounds on the performance of classical algorithms, work stealing and SRPT scheduling, under practical assumptions. We demonstrate Virelay's expressiveness by capturing existing models for congestion control and packet scheduling, and we verify the observation that TCP unfairness can cause some ML training workloads to spontaneously converge to a state of high network utilization. Finally, we use Virelay to identify two bugs in the Linux CFS load balancer.

Submitted 28 February, 2024; v1 submitted 10 January, 2023; originally announced January 2023.

Comments: 12 pages, 11 figures

arXiv:2211.05594 [pdf, other]

A Brief Survey on Representation Learning based Graph Dimensionality Reduction Techniques

Authors: Akhil Pandey Akella

Abstract: Dimensionality reduction techniques map data represented on higher dimensions onto lower dimensions with varying degrees of information loss. Graph dimensionality reduction techniques adopt the same principle of providing latent representations of the graph structure with minor adaptations to the output representations along with the input data. There exist several cutting edge techniques that are efficient at generating embeddings from graph data and projecting them onto low dimensional latent spaces. Due to variations in the operational philosophy, the benefits of a particular graph dimensionality reduction technique might not prove advantageous to every scenario or rather every dataset. As a result, some techniques are efficient at representing the relationship between nodes at lower dimensions, while others are good at encapsulating the entire graph structure on low dimensional space. We present this survey to outline the benefits as well as problems associated with the existing graph dimensionality reduction techniques. We also attempted to connect the dots regarding the potential improvements to some of the techniques. This survey could be helpful for upcoming researchers interested in exploring the usage of graph representation learning to effectively produce low-dimensional graph embeddings with varying degrees of granularity.

Submitted 13 October, 2022; originally announced November 2022.

arXiv:2210.16656 [pdf, other]

doi 10.1145/3620678.3624651

Auxo: Efficient Federated Learning via Scalable Client Clustering

Authors: Jiachen Liu, Fan Lai, Yinwei Dai, Aditya Akella, Harsha Madhyastha, Mosharaf Chowdhury

Abstract: Federated learning (FL) is an emerging machine learning (ML) paradigm that enables heterogeneous edge devices to collaboratively train ML models without revealing their raw data to a logically centralized server. However, beyond the heterogeneous device capacity, FL participants often exhibit differences in their data distributions, which are not independent and identically distributed (Non-IID). Many existing works present point solutions to address issues like slow convergence, low final accuracy, and bias in FL, all stemming from client heterogeneity. In this paper, we explore an additional layer of complexity to mitigate such heterogeneity by grouping clients with statistically similar data distributions (cohorts). We propose Auxo to gradually identify such cohorts in large-scale, low-availability, and resource-constrained FL populations. Auxo then adaptively determines how to train cohort-specific models in order to achieve better model performance and ensure resource efficiency. Our extensive evaluations show that, by identifying cohorts with smaller heterogeneity and performing efficient cohort-based training, Auxo boosts various existing FL solutions in terms of final accuracy (2.1% - 8.2%), convergence time (up to 2.2x), and model bias (4.8% - 53.8%).

Submitted 30 September, 2023; v1 submitted 29 October, 2022; originally announced October 2022.

Comments: 18 pages

arXiv:2210.00093 [pdf, other]

Shockwave: Fair and Efficient Cluster Scheduling for Dynamic Adaptation in Machine Learning

Authors: Pengfei Zheng, Rui Pan, Tarannum Khan, Shivaram Venkataraman, Aditya Akella

Abstract: Dynamic adaptation has become an essential technique in accelerating distributed machine learning (ML) training. Recent studies have shown that dynamically adjusting model structure (e.g., lottery ticket hypothesis) or hyperparameters (e.g., batch size) can significantly accelerate training without sacrificing accuracy. However, existing ML cluster schedulers are not designed to handle dynamic adaptation. We show that existing schemes fail to provide fairness and degrade system efficiency when the training throughput changes over time under dynamic adaptation. We design Shockwave, a scheduler with future planning that builds on two key ideas. First, Shockwave extends classic market theory from static settings to dynamic settings to co-optimize efficiency and fairness. Second, Shockwave utilizes stochastic dynamic programming to handle dynamic changes. We build a system for Shockwave and validate its performance with both trace-driven simulation and cluster experiments. Results show that for traces of ML jobs with dynamic adaptation, Shockwave improves makespan by 1.3X and fairness by 2X when compared with existing fair scheduling schemes.

Submitted 30 September, 2022; originally announced October 2022.

Comments: Accepted at the 20th USENIX Symposium on Networked Systems Design and Implementation (NSDI '23)

arXiv:2207.10898 [pdf, other]

Impact of RoCE Congestion Control Policies on Distributed Training of DNNs

Authors: Tarannum Khan, Saeed Rashidi, Srinivas Sridharan, Pallavi Shurpali, Aditya Akella, Tushar Krishna

Abstract: RDMA over Converged Ethernet (RoCE) has gained significant attraction for datacenter networks due to its compatibility with conventional Ethernet-based fabric. However, the RDMA protocol is efficient only on (nearly) lossless networks, emphasizing the vital role of congestion control on RoCE networks. Unfortunately, the native RoCE congestion control scheme, based on Priority Flow Control (PFC), suffers from many drawbacks such as unfairness, head-of-line-blocking, and deadlock. Therefore, in recent years many schemes have been proposed to provide additional congestion control for RoCE networks to minimize PFC drawbacks. However, these schemes are proposed for general datacenter environments. In contrast to the general datacenters that are built using commodity hardware and run general-purpose workloads, high-performance distributed training platforms deploy high-end accelerators and network components and exclusively run training workloads using collectives (All-Reduce, All-To-All) communication libraries for communication. Furthermore, these platforms usually have a private network, separating their communication traffic from the rest of the datacenter traffic. Scalable topology-aware collective algorithms are inherently designed to avoid incast patterns and balance traffic optimally. These distinct features necessitate revisiting previously proposed congestion control schemes for general-purpose datacenter environments. In this paper, we thoroughly analyze some of the SOTA RoCE congestion control schemes vs. PFC when running on distributed training platforms. Our results indicate that previously proposed RoCE congestion control schemes have little impact on the end-to-end performance of training workloads, motivating the necessity of designing an optimized, yet low-overhead, congestion control scheme based on the characteristics of distributed training platforms and workloads.

Submitted 22 July, 2022; originally announced July 2022.

arXiv:2205.14323 [pdf, other]

Multi-agent Databases via Independent Learning

Authors: Chi Zhang, Olga Papaemmanouil, Josiah P. Hanna, Aditya Akella

Abstract: Machine learning is rapidly being used in database research to improve the effectiveness of numerous tasks including but not limited to query optimization, workload scheduling, physical design, etc. Currently, the research focus has been on replacing a single database component responsible for one task by its learning-based counterpart. However, query performance is not simply determined by the performance of a single component, but by the cooperation of multiple ones. As such, learning based database components need to collaborate during both training and execution in order to develop policies that meet end performance goals. Thus, the paper attempts to address the question "Is it possible to design a database consisting of various learned components that cooperatively work to improve end-to-end query latency?". To answer this question, we introduce MADB (Multi-Agent DB), a proof-of-concept system that incorporates a learned query scheduler and a learned query optimizer. MADB leverages a cooperative multi-agent reinforcement learning approach that allows the two components to exchange the context of their decisions with each other and collaboratively work towards reducing the query latency. Preliminary results demonstrate that MADB can outperform the non-cooperative integration of learned components.

Submitted 5 August, 2022; v1 submitted 27 May, 2022; originally announced May 2022.

Journal ref: AIDB@VLDB 2022 Proceedings of 4th International Workshop on Applied AI for Database Systems and Applications

arXiv:2204.03211 [pdf, other]

Elastic Model Aggregation with Parameter Service

Authors: Juncheng Gu, Mosharaf Chowdhury, Kang G. Shin, Aditya Akella

Abstract: Model aggregation, the process that updates model parameters, is an important step for model convergence in distributed deep learning (DDL). However, the parameter server (PS), a popular paradigm of performing model aggregation, causes CPU underutilization in deep learning (DL) clusters, due to the bursty nature of aggregation and static resource allocation. To remedy this problem, we propose Parameter Service, an elastic model aggregation framework for DDL training, which decouples the function of model aggregation from individual training jobs and provides a shared model aggregation service to all jobs in the cluster. In Parameter Service, model aggregations are efficiently packed and dynamically migrated to fit into the available CPUs with negligible time overhead. Furthermore, Parameter Service can elastically manage its CPU resources based on its load to enhance resource efficiency. We have implemented Parameter Service in a prototype system called AutoPS and evaluated it via testbed experimentation and trace-driven simulations. AutoPS reduces up to 75% of CPU consumption with little or no performance impact on the training jobs. The design of Parameter Service is transparent to the users and can be incorporated in popular DL frameworks.

Submitted 7 April, 2022; originally announced April 2022.

arXiv:2111.10672 [pdf, other]

Doing More by Doing Less: How Structured Partial Backpropagation Improves Deep Learning Clusters

Authors: Adarsh Kumar, Kausik Subramanian, Shivaram Venkataraman, Aditya Akella

Abstract: Many organizations employ compute clusters equipped with accelerators such as GPUs and TPUs for training deep learning models in a distributed fashion. Training is resource-intensive, consuming significant compute, memory, and network resources. Many prior works explore how to reduce training resource footprint without impacting quality, but their focus on a subset of the bottlenecks (typically only the network) limits their ability to improve overall cluster utilization. In this work, we exploit the unique characteristics of deep learning workloads to propose Structured Partial Backpropagation (SPB), a technique that systematically controls the amount of backpropagation at individual workers in distributed training. This simultaneously reduces network bandwidth, compute utilization, and memory footprint while preserving model quality. To efficiently leverage the benefits of SPB at cluster level, we introduce JigSaw, an SPB-aware scheduler, which does scheduling at the iteration level for Deep Learning Training (DLT) jobs. We find that JigSaw can improve large scale cluster efficiency by as high as 28%.

Submitted 20 November, 2021; originally announced November 2021.

Comments: Accepted at DistributedML-2021

arXiv:2109.06449 [pdf, other]

Deep hierarchical reinforcement agents for automated penetration testing

Authors: Khuong Tran, Ashlesha Akella, Maxwell Standen, Junae Kim, David Bowman, Toby Richer, Chin-Teng Lin

Abstract: Penetration testing, the organised attack of a computer system in order to test existing defences, has been used extensively to evaluate network security. This is a time consuming process and requires in-depth knowledge for the establishment of a strategy that resembles a real cyber-attack. This paper presents a novel deep reinforcement learning architecture with hierarchically structured agents called HA-DRL, which employs an algebraic action decomposition strategy to address the large discrete action space of an autonomous penetration testing simulator where the number of actions is exponentially increased with the complexity of the designed cybersecurity network. The proposed architecture is shown to find the optimal attacking policy faster and more stably than a conventional deep Q-learning agent which is commonly used as a method to apply artificial intelligence in automatic penetration testing.

Submitted 14 September, 2021; originally announced September 2021.

Comments: Presented at 1st International Workshop on Adaptive Cyber Defense, 2021 (arXiv:2108.08476)

Report number: IJCAI-ACD/2021/114

arXiv:2101.07344 [pdf, other]

Accelerating Deep Learning Inference via Learned Caches

Authors: Arjun Balasubramanian, Adarsh Kumar, Yuhan Liu, Han Cao, Shivaram Venkataraman, Aditya Akella

Abstract: Deep Neural Networks (DNNs) are witnessing increased adoption in multiple domains owing to their high accuracy in solving real-world problems. However, this high accuracy has been achieved by building deeper networks, posing a fundamental challenge to the low latency inference desired by user-facing applications. Current low latency solutions trade-off on accuracy or fail to exploit the inherent temporal locality in prediction serving workloads. We observe that caching hidden layer outputs of the DNN can introduce a form of late-binding where inference requests only consume the amount of computation needed. This enables a mechanism for achieving low latencies, coupled with an ability to exploit temporal locality. However, traditional caching approaches incur high memory overheads and lookup latencies, leading us to design learned caches - caches that consist of simple ML models that are continuously updated. We present the design of GATI, an end-to-end prediction serving system that incorporates learned caches for low-latency DNN inference. Results show that GATI can reduce inference latency by up to 7.69X on realistic workloads.

Submitted 18 January, 2021; originally announced January 2021.

arXiv:2101.06537 [pdf, other]

PL2: Towards Predictable Low Latency in Rack-Scale Networks

Authors: Yanfang Le, Radhika Niranjan Mysore, Lalith Suresh, Gerd Zellweger, Sujata Banerjee, Aditya Akella, Michael Swift

Abstract: High performance rack-scale offerings package disaggregated pools of compute, memory and storage hardware in a single rack to run diverse workloads with varying requirements, including applications that need low and predictable latency. The intra-rack network is typically high speed Ethernet, which can suffer from congestion leading to packet drops and may not satisfy the stringent tail latency requirements for some workloads (including remote memory/storage accesses). In this paper, we design a Predictable Low Latency (PL2) network architecture for rack-scale systems with Ethernet as interconnecting fabric. PL2 leverages programmable Ethernet switches to carefully schedule packets such that they incur no loss with NIC and switch queues maintained at small, near-zero levels. In our 100 Gbps rack-prototype, PL2 keeps 99th-percentile memcached RPC latencies under 60us even when the RPCs compete with extreme offered-loads of 400%, without losing traffic. Network transfers for a machine learning training task complete 30% faster than a receiver-driven scheme implementation modeled after Homa (222ms vs 321ms 99%ile latency per iteration).

Submitted 22 January, 2021; v1 submitted 16 January, 2021; originally announced January 2021.

Comments: 13 pages, 23 pdf figures, use acmart.cls

arXiv:2012.13599 [pdf]

Early Indicators of Scientific Impact: Predicting Citations with Altmetrics

Authors: Akhil Pandey Akella, Hamed Alhoori, Pavan Ravikanth Kondamudi, Cole Freeman, Haiming Zhou

Abstract: Identifying important scholarly literature at an early stage is vital to the academic research community and other stakeholders such as technology companies and government bodies. Due to the sheer amount of research published and the growth of ever-changing interdisciplinary areas, researchers need an efficient way to identify important scholarly work. The number of citations a given research publication has accrued has been used for this purpose, but these take time to occur and longer to accumulate. In this article, we use altmetrics to predict the short-term and long-term citations that a scholarly publication could receive. We build various classification and regression models and evaluate their performance, finding neural networks and ensemble models to perform best for these tasks. We also find that Mendeley readership is the most important factor in predicting the early citations, followed by other factors such as the academic status of the readers (e.g., student, postdoc, professor), followers on Twitter, online post length, author count, and the number of mentions on Twitter, Wikipedia, and across different countries.

Submitted 25 December, 2020; originally announced December 2020.

arXiv:2002.02645 [pdf, other]

Accelerating Deep Learning Inference via Freezing

Authors: Adarsh Kumar, Arjun Balasubramanian, Shivaram Venkataraman, Aditya Akella

Abstract: Over the last few years, Deep Neural Networks (DNNs) have become ubiquitous owing to their high accuracy on real-world tasks. However, this increase in accuracy comes at the cost of computationally expensive models leading to higher prediction latencies. Prior efforts to reduce this latency such as quantization, model distillation, and any-time prediction models typically trade-off accuracy for performance. In this work, we observe that caching intermediate layer outputs can help us avoid running all the layers of a DNN for a sizeable fraction of inference requests. We find that this can potentially reduce the number of effective layers by half for 91.58% of CIFAR-10 requests run on ResNet-18. We present Freeze Inference, a system that introduces approximate caching at each intermediate layer and we discuss techniques to reduce the cache size and improve the cache hit rate. Finally, we discuss some of the open research challenges in realizing such a design.

Submitted 7 February, 2020; originally announced February 2020.

Comments: 11th USENIX Workshop on Hot Topics in Cloud Computing, HotCloud 2019

arXiv:1912.02402 [pdf, other]

D2R: Dataplane-Only Policy-Compliant Routing Under Failures

Authors: Kausik Subramanian, Anubhavnidhi Abhashkumar, Loris D'Antoni, Aditya Akella

Abstract: In networks today, the data plane handles forwarding (sending a packet to the next device in the path) and the control plane handles routing (deciding the path of the packet in the network). This architecture has limitations. First, when link failures occur, the data plane has to wait for the control plane to install new routes, and packet losses can occur due to delayed routing convergence or central controller latencies. Second, policy-compliance is not guaranteed without sophisticated configuration synthesis or controller intervention. In this paper, we take advantage of the recent advances in fast programmable switches to perform policy-compliant route computations entirely in the data plane, thus providing fast reactions to failures. D2R, our new network architecture, can provide the illusion of a network fabric that is always available and policy-compliant, even under failures. We implement our data plane in P4 and demonstrate its viability in real world topologies.

Submitted 5 December, 2019; originally announced December 2019.

arXiv:1911.09849 [pdf, other]

Archipelago: A Scalable Low-Latency Serverless Platform

Authors: Arjun Singhvi, Kevin Houck, Arjun Balasubramanian, Mohammed Danish Shaikh, Shivaram Venkataraman, Aditya Akella

Abstract: The increased use of micro-services to build web applications has spurred the rapid growth of Function-as-a-Service (FaaS) or serverless computing platforms. While FaaS simplifies provisioning and scaling for application developers, it introduces new challenges in resource management that need to be handled by the cloud provider. Our analysis of popular serverless workloads indicates that schedulers need to handle functions that are very short-lived, have unpredictable arrival patterns, and require expensive setup of sandboxes. The challenge of running a large number of such functions in a multi-tenant cluster makes existing scheduling frameworks unsuitable. We present Archipelago, a platform that enables low latency request execution in a multi-tenant serverless setting. Archipelago views each application as a DAG of functions, and every DAG is associated with a latency deadline. Archipelago achieves its per-DAG request latency goals by: (1) partitioning a given cluster into a number of smaller worker pools, and associating each pool with a semi-global scheduler (SGS), (2) using a latency-aware scheduler within each SGS along with proactive sandbox allocation to reduce overheads, and (3) using a load balancing layer to route requests for different DAGs to the appropriate SGS, and automatically scale the number of SGSs per DAG. Our testbed results show that Archipelago meets the latency deadline for more than 99% of realistic application request workloads, and reduces tail latencies by up to 36X compared to state-of-the-art serverless platforms.

Submitted 21 November, 2019; originally announced November 2019.

Comments: 14 pages

arXiv:1910.07700 [pdf, other]

SNF: Serverless Network Functions

Authors: Arjun Singhvi, Junaid Khalid, Aditya Akella, Sujata Banerjee

Abstract: It is increasingly common to outsource network functions (NFs) to the cloud. However, no cloud providers offer NFs-as-a-Service (NFaaS) that allows users to run custom NFs. Our work addresses how a cloud provider can offer NFaaS. We use the emerging serverless computing paradigm as it has the right building blocks - usage-based billing, convenient event-driven programming model and automatic compute elasticity. Towards this end, we identify two core limitations of existing serverless platforms to support demanding stateful NFs - coupling of the billing and work assignment granularities, and state sharing via an external store. We develop a novel NFaaS framework, SNF, that overcomes these issues using two ideas. SNF allocates work at the granularity of flowlets observed in network traffic, whereas billing and programming occur on the basis of packets. SNF embellishes serverless platforms with ephemeral state that lasts for the duration of the flowlet and supports high performance state operations between compute units in a peer-to-peer manner. We present algorithms for work allocation and state maintenance, and demonstrate that our SNF prototype dynamically adapts compute resources for various stateful NFs based on traffic demand at very fine time scales, with minimal overheads.

Submitted 16 October, 2019; originally announced October 2019.

Comments: 14 pages

arXiv:1907.01484 [pdf, other]

Themis: Fair and Efficient GPU Cluster Scheduling

Authors: Kshiteej Mahajan, Arjun Balasubramanian, Arjun Singhvi, Shivaram Venkataraman, Aditya Akella, Amar Phanishayee, Shuchi Chawla

Abstract: Modern distributed machine learning (ML) training workloads benefit significantly from leveraging GPUs. However, significant contention ensues when multiple such workloads are run atop a shared cluster of GPUs. A key question is how to fairly apportion GPUs across workloads. We find that established cluster scheduling disciplines are a poor fit because of ML workloads' unique attributes: ML jobs have long-running tasks that need to be gang-scheduled, and their performance is sensitive to tasks' relative placement. We propose Themis, a new scheduling framework for ML training workloads. Its GPU allocation policy enforces that ML workloads complete in a finish-time fair manner, a new notion we introduce. To capture placement sensitivity and ensure efficiency, Themis uses a two-level scheduling architecture where ML workloads bid on available resources that are offered in an auction run by a central arbiter. Our auction design allocates GPUs to winning bids by trading off efficiency for fairness in the short term but ensuring finish-time fairness in the long term. Our evaluation on a production trace shows that Themis can improve fairness by more than 2.25X and is ~5% to 250% more cluster efficient in comparison to state-of-the-art schedulers.

Submitted 29 October, 2019; v1 submitted 2 July, 2019; originally announced July 2019.

arXiv:1907.00434 [pdf, other]

Network-accelerated Distributed Machine Learning Using MLFabric

Authors: Raajay Viswanathan, Aditya Akella

Abstract: Existing distributed machine learning (DML) systems focus on improving the computational efficiency of distributed learning, whereas communication aspects have received less attention. Many DML systems treat the network as a blackbox. Thus, DML algorithms' performance is impeded by network bottlenecks, and DML systems end up sacrificing important algorithmic and system-level benefits. We present MLfabric, a communication library that manages all network transfers in a DML system, and holistically determines the communication pattern of a DML algorithm at any point in time. This allows MLfabric to carefully order transfers (i.e., gradient updates) to improve convergence, opportunistically aggregate updates in-network to improve efficiency, and proactively replicate some of them to support new notions of fault tolerance. We empirically find that MLfabric achieves up to 3X speed-up in training large deep learning models in realistic dynamic cluster settings.

Submitted 30 June, 2019; originally announced July 2019.

arXiv:1906.02043 [pdf, other]

Tiramisu: Fast and General Network Verification

Authors: Anubhavnidhi Abhashkumar, Aaron Gember-Jacobson, Aditya Akella

Abstract: Today's distributed network control planes support multiple routing protocols, filtering mechanisms, and route selection policies. These protocols operate at different layers, e.g. BGP operates at the EGP layer, OSPF at the IGP layer, and VLANs at layer 2. The behavior of a network's control plane depends on how these protocols interact with each other. This makes network configurations highly complex and error-prone. State-of-the-art control plane verifiers are either too slow, or do not model certain features of the network. In this paper, we propose a new multilayer hedge graph abstraction, Tiramisu, that supports fast verification of the control plane. Tiramisu uses a combination of graph traversal algorithms and ILPs (Integer Linear Programs) to check different network policies. We use Tiramisu to verify policies of various real-world and synthetic configurations. Our experiments show that Tiramisu can verify any policy in < 0.08 s in small networks (~35 devices) and < 0.12 s in large networks (~160 devices), and it is 10-600X faster than state-of-the-art without losing generality.

Submitted 5 June, 2019; originally announced June 2019.

Comments: 14 pages + Appendices

arXiv:1808.08309 [pdf, other]

Trajectory Tracking Control of a Flexible Spine Robot, With and Without a Reference Input

Authors: Andrew P. Sabelhaus, Shirley Huajing Zhao, Mallory C. Daly, Ellande Tang, Edward Zhu, Abishek K. Akella, Zeerek A. Ahmad, Vytas SunSpiral, Alice M. Agogino

Abstract: The Underactuated Lightweight Tensegrity Robotic Assistive Spine (ULTRA Spine) project is an ongoing effort to develop a flexible, actuated backbone for quadruped robots. In this work, model-predictive control is used to track a trajectory in the robot's state space, in simulation. The state trajectory used here corresponds to a bending motion of the spine, with translations and rotations of the moving vertebrae. Two different controllers are presented in this work: one that does not use a reference input but includes smoothing constraints, and a second one that uses a reference input without smoothing. For the smoothing controller, without reference inputs, the error converges to zero, while the simpler-to-tune controller with an input reference shows small errors but not complete convergence. It is expected that this controller will converge as it is improved further.

Submitted 24 August, 2018; originally announced August 2018.

Journal ref: 2017 NASA/ESA Conference on Adaptive Hardware and Systems - Workshop on Structurally Adaptive Tensegrity Robots

arXiv:1703.10272 [pdf, other]

Whiz: A Fast and Flexible Data Analytics System

Authors: Robert Grandl, Arjun Singhvi, Raajay Viswanathan, Aditya Akella

Abstract: Today's data analytics frameworks are compute-centric, with analytics execution almost entirely dependent on the pre-determined physical structure of the high-level computation. Relegating intermediate data to a second class entity in this manner hurts flexibility, performance, and efficiency. We present Whiz, a new analytics framework that cleanly separates computation from intermediate data. It enables runtime visibility into data via programmable monitoring, and data-driven computation (where intermediate data values drive when/what computation runs) via an event abstraction. Experiments with a Whiz prototype on a large cluster using batch, streaming, and graph analytics workloads show that its performance is 1.3-2x better than state-of-the-art.

Submitted 21 June, 2019; v1 submitted 29 March, 2017; originally announced March 2017.

Comments: 16 pages

arXiv:1612.01497 [pdf, other]

Correctness and Performance for Stateful Chained Network Functions

Authors: Junaid Khalid, Aditya Akella

Abstract: Network functions virtualization (NFV) allows operators to employ NF chains to realize custom policies, and dynamically add instances to meet demand or for failover. NFs maintain detailed per- and cross-flow state which needs careful management, especially during dynamic actions. Crucially, state management must: (1) ensure NF chain-wide correctness and (2) have good performance. To this end, we built CHC, an NFV framework that leverages an external state store coupled with state management algorithms and metadata maintenance for correct operation even under a range of failures. Our evaluation shows that CHC can support ~10Gbps per-NF throughput and <0.6μs increase in median per-NF packet processing latency, and chain-wide correctness at little additional cost.

Submitted 16 October, 2018; v1 submitted 5 December, 2016; originally announced December 2016.

arXiv:1604.07371 [pdf, other]

Do the Hard Stuff First: Scheduling Dependent Computations in Data-Analytics Clusters

Authors: Robert Grandl, Srikanth Kandula, Sriram Rao, Aditya Akella, Janardhan Kulkarni

Abstract: We present a scheduler that improves cluster utilization and job completion times by packing tasks having multi-resource requirements and inter-dependencies. While the problem is algorithmically very hard, we achieve near-optimality on the job DAGs that appear in production clusters at a large enterprise and in benchmarks such as TPC-DS. A key insight is that carefully handling the long-running tasks and those with tough-to-pack resource needs will produce good-enough schedules. However, which subset of tasks to treat carefully is not clear (and intractable to discover). Hence, we offer a search procedure that evaluates various possibilities and outputs a preferred schedule order over tasks. An online component enforces the schedule orders desired by the various jobs running on the cluster. In addition, it packs tasks, overbooks the fungible resources and guarantees bounded unfairness for a variety of desirable fairness schemes. Relative to the state-of-the-art schedulers, we speed up 50% of the jobs by over 30% each.

Submitted 25 April, 2016; originally announced April 2016.

Report number: MSR-TR-2016-19

arXiv:1403.7115 [pdf, ps, other]

Active Switching: Packet Steering Flow Annotations

Authors: Saul St. John, Aditya Akella

Abstract: Our previous experience building systems for middlebox chain composition and scaling in software-defined networks has revealed that existing mechanisms of flow annotation commonly do not survive middlebox-traversals, or suffer from extreme identifier domain limitations resulting in excessive flow table size. In this paper, we analyze the structural artifacts resulting in these challenges, and offer a framework for describing the behavior of middleboxes based on actions taken on traversing packets. We then present a novel mechanism for flow annotation that features an identifier domain significantly larger than existing techniques, that is transparent to hosts traversed, and that conserves flow-table resources by requiring only a small number of match rules and actions in most switches. We evaluate said technique, showing that it requires less per-switch state than conventional techniques. We then describe extensions allowing implementation of this architecture within a broader class of systems. Finally, we close with architectural suggestions for enabling straightforward integration of middleboxes within software-defined networks.

Submitted 27 March, 2014; originally announced March 2014.

MSC Class: 68M10; ACM Class: C.2.1

arXiv:1305.0209 [pdf, ps, other]

Stratos: A Network-Aware Orchestration Layer for Virtual Middleboxes in Clouds

Authors: Aaron Gember, Anand Krishnamurthy, Saul St. John, Robert Grandl, Xiaoyang Gao, Ashok Anand, Theophilus Benson, Vyas Sekar, Aditya Akella

Abstract: Enterprises want their in-cloud services to leverage the performance and security benefits that middleboxes offer in traditional deployments. Such virtualized deployments create new opportunities (e.g., flexible scaling) as well as new challenges (e.g., dynamics, multiplexing) for middlebox management tasks such as service composition and provisioning. Unfortunately, enterprises lack systematic tools to efficiently compose and provision in-the-cloud middleboxes and thus fall short of achieving the benefits that cloud-based deployments can offer. To this end, we present the design and implementation of Stratos, an orchestration layer for virtual middleboxes. Stratos provides efficient and correct composition in the presence of dynamic scaling via software-defined networking mechanisms. It ensures efficient and scalable provisioning by combining middlebox-specific traffic engineering, placement, and horizontal scaling strategies. We demonstrate the effectiveness of Stratos using an experimental prototype testbed and large-scale simulations.

Submitted 11 March, 2014; v1 submitted 1 May, 2013; originally announced May 2013.

Showing 1–42 of 42 results for author: Akella, A