-
D3: An Adaptive Reconfigurable Datacenter Network
Authors:
Johannes Zerwas,
Chen Griner,
Stefan Schmid,
Chen Avin
Abstract:
The explosively growing communication traffic in datacenters imposes increasingly stringent performance requirements on the underlying networks. Over the last years, researchers have developed innovative optical switching technologies that enable reconfigurable datacenter networks (RCDNs) which support very fast topology reconfigurations. This paper presents D3, a novel and feasible RDCN architect…
▽ More
The explosively growing communication traffic in datacenters imposes increasingly stringent performance requirements on the underlying networks. Over the last years, researchers have developed innovative optical switching technologies that enable reconfigurable datacenter networks (RCDNs) which support very fast topology reconfigurations. This paper presents D3, a novel and feasible RDCN architecture that improves throughput and flow completion time. D3 quickly and jointly adapts its links and packet scheduling toward the evolving demand, combining both demand-oblivious and demand-aware behaviors when needed. D3 relies on a decentralized network control plane supporting greedy, integrated-multihop, IP-based routing, allowing to react, quickly and locally, to topological changes without overheads. A rack-local synchronization and transport layer further support fast network adjustments. Moreover, we argue that D3 can be implemented using the recently proposed Sirius architecture (SIGCOMM 2020). We report on an extensive empirical evaluation using packet-level simulations. We find that D3 improves throughput by up to 15% and preserves competitive flow completion times compared to the state of the art. We further provide an analytical explanation of the superiority of D3, introducing an extension of the well-known Birkhoff-von Neumann decomposition, which may be of independent interest.
△ Less
Submitted 19 June, 2024;
originally announced June 2024.
-
Understanding the Throughput Bounds of Reconfigurable Datacenter Networks
Authors:
Vamsi Addanki,
Chen Avin,
Stefan Schmid
Abstract:
The increasing gap between the growth of datacenter traffic volume and the capacity of electrical switches led to the emergence of reconfigurable datacenter network designs based on optical circuit switching. A multitude of research works, ranging from demand-oblivious (e.g., RotorNet, Sirius) to demand-aware (e.g., Helios, ProjecToR) reconfigurable networks, demonstrate significant performance be…
▽ More
The increasing gap between the growth of datacenter traffic volume and the capacity of electrical switches led to the emergence of reconfigurable datacenter network designs based on optical circuit switching. A multitude of research works, ranging from demand-oblivious (e.g., RotorNet, Sirius) to demand-aware (e.g., Helios, ProjecToR) reconfigurable networks, demonstrate significant performance benefits. Unfortunately, little is formally known about the achievable throughput of such networks. Only recently have the throughput bounds of demand-oblivious networks been studied. In this paper, we tackle a fundamental question: Whether and to what extent can demand-aware reconfigurable networks improve the throughput of datacenters?
This paper attempts to understand the landscape of the throughput bounds of reconfigurable datacenter networks. Given the rise of machine learning workloads and collective communication in modern datacenters, we specifically focus on their typical communication patterns, namely uniform-residual demand matrices. We formally establish a separation bound of demand-aware networks over demand-oblivious networks, proving analytically that the former can provide at least $16\%$ higher throughput. Our analysis further uncovers new design opportunities based on periodic, fixed-duration reconfigurations that can harness the throughput benefits of demand-aware networks while inheriting the simplicity and low reconfiguration overheads of demand-oblivious networks. Finally, our evaluations corroborate the theoretical results of this paper, demonstrating that demand-aware networks significantly outperform oblivious networks in terms of throughput. This work barely scratches the surface and unveils several intriguing open questions, which we discuss at the end of this paper.
△ Less
Submitted 31 May, 2024;
originally announced May 2024.
-
Learning Minimum Linear Arrangement of Cliques and Lines
Authors:
Julien Dallot,
Maciej Pacut,
Marcin Bienkowski,
Darya Melnyk,
Stefan Schmid
Abstract:
In the well-known Minimum Linear Arrangement problem (MinLA), the goal is to arrange the nodes of an undirected graph into a permutation so that the total stretch of the edges is minimized. This paper studies an online (learning) variant of MinLA where the graph is not given at the beginning, but rather revealed piece-by-piece. The algorithm starts in a fixed initial permutation, and after a piece…
▽ More
In the well-known Minimum Linear Arrangement problem (MinLA), the goal is to arrange the nodes of an undirected graph into a permutation so that the total stretch of the edges is minimized. This paper studies an online (learning) variant of MinLA where the graph is not given at the beginning, but rather revealed piece-by-piece. The algorithm starts in a fixed initial permutation, and after a piece of the graph is revealed, the algorithm must update its current permutation to be a MinLA of the subgraph revealed so far. The objective is to minimize the total number of swaps of adjacent nodes as the algorithm updates the permutation.
The main result of this paper is an online randomized algorithm that solves this online variant for the restricted cases where the revealed graph is either a collection of cliques or a collection of lines. We show that the algorithm is $8 \ln n$-competitive, where $n$ is the number of nodes of the graph. We complement this result by constructing an asymptotically matching lower bound of $Ω(\ln n)$.
△ Less
Submitted 24 May, 2024;
originally announced May 2024.
-
Energy-based Hopfield Boosting for Out-of-Distribution Detection
Authors:
Claus Hofmann,
Simon Schmid,
Bernhard Lehner,
Daniel Klotz,
Sepp Hochreiter
Abstract:
Out-of-distribution (OOD) detection is critical when deploying machine learning models in the real world. Outlier exposure methods, which incorporate auxiliary outlier data in the training process, can drastically improve OOD detection performance compared to approaches without advanced training strategies. We introduce Hopfield Boosting, a boosting approach, which leverages modern Hopfield energy…
▽ More
Out-of-distribution (OOD) detection is critical when deploying machine learning models in the real world. Outlier exposure methods, which incorporate auxiliary outlier data in the training process, can drastically improve OOD detection performance compared to approaches without advanced training strategies. We introduce Hopfield Boosting, a boosting approach, which leverages modern Hopfield energy (MHE) to sharpen the decision boundary between the in-distribution and OOD data. Hopfield Boosting encourages the model to concentrate on hard-to-distinguish auxiliary outlier examples that lie close to the decision boundary between in-distribution and auxiliary outlier data. Our method achieves a new state-of-the-art in OOD detection with outlier exposure, improving the FPR95 metric from 2.28 to 0.92 on CIFAR-10 and from 11.76 to 7.94 on CIFAR-100.
△ Less
Submitted 14 May, 2024;
originally announced May 2024.
-
In the Search of Optimal Tree Networks: Hardness and Heuristics
Authors:
Maxim Buzdalov,
Pavel Martynov,
Sergey Pankratov,
Vitaly Aksenov,
Stefan Schmid
Abstract:
Demand-aware communication networks are networks whose topology is optimized toward the traffic they need to serve. These networks have recently been enabled by novel optical communication technologies and are investigated intensively in the context of datacenters. In this work, we consider networks with one of the most common topologies~ -- a binary tree.
We show that finding an optimal demand-…
▽ More
Demand-aware communication networks are networks whose topology is optimized toward the traffic they need to serve. These networks have recently been enabled by novel optical communication technologies and are investigated intensively in the context of datacenters. In this work, we consider networks with one of the most common topologies~ -- a binary tree.
We show that finding an optimal demand-aware binary tree network is NP-hard. Then, we propose optimization algorithms that generate efficient binary tree networks on real-life and synthetic workloads.
△ Less
Submitted 6 March, 2024;
originally announced March 2024.
-
I DPID It My Way! A Covert Timing Channel in Software-Defined Networks
Authors:
Robert Krösche,
Kashyap Thimmaraju,
Liron Schiff,
Stefan Schmid
Abstract:
Software-defined networking is considered a promising new paradigm, enabling more reliable and formally verifiable communication networks. However, this paper shows that the separation of the control plane from the data plane, which lies at the heart of Software-Defined Networks (SDNs), can be exploited for covert channels based on SDN Teleportation, even when the data planes are physically discon…
▽ More
Software-defined networking is considered a promising new paradigm, enabling more reliable and formally verifiable communication networks. However, this paper shows that the separation of the control plane from the data plane, which lies at the heart of Software-Defined Networks (SDNs), can be exploited for covert channels based on SDN Teleportation, even when the data planes are physically disconnected.
This paper describes the theoretical model and design of our covert timing channel based on SDN Teleportation. We implement our covert channel using a popular SDN switch, Open vSwitch, and a popular SDN controller, ONOS. Our evaluation of the prototype shows that even under load at the controller, throughput rates of 20 bits per second are possible, with a communication accuracy of approximately 90\%. We also discuss techniques to increase the throughput further.
△ Less
Submitted 4 March, 2024;
originally announced March 2024.
-
MTS: Bringing Multi-Tenancy to Virtual Networking
Authors:
Kashyap Thimmaraju,
Saad Hermak,
Gábor Rétvári,
Stefan Schmid
Abstract:
Multi-tenant cloud computing provides great benefits in terms of resource sharing, elastic pricing, and scalability, however, it also changes the security landscape and introduces the need for strong isolation between the tenants, also inside the network. This paper is motivated by the observation that while multi-tenancy is widely used in cloud computing, the virtual switch designs currently used…
▽ More
Multi-tenant cloud computing provides great benefits in terms of resource sharing, elastic pricing, and scalability, however, it also changes the security landscape and introduces the need for strong isolation between the tenants, also inside the network. This paper is motivated by the observation that while multi-tenancy is widely used in cloud computing, the virtual switch designs currently used for network virtualization lack sufficient support for tenant isolation. Hence, we present, implement, and evaluate a virtual switch architecture, MTS, which brings secure design best-practice to the context of multi-tenant virtual networking: compartmentalization of virtual switches, least-privilege execution, complete mediation of all network communication, and reducing the trusted computing base shared between tenants. We build MTS from commodity components, providing an incrementally deployable and inexpensive upgrade path to cloud operators. Our extensive experiments, extending to both micro-benchmarks and cloud applications, show that, depending on the way it is deployed, MTS may produce 1.5-2x the throughput compared to state-of-the-art, with similar or better latency and modest resource overhead (1 extra CPU). MTS is available as open source software.
△ Less
Submitted 4 March, 2024;
originally announced March 2024.
-
Universal Physics Transformers: A Framework For Efficiently Scaling Neural Operators
Authors:
Benedikt Alkin,
Andreas Fürst,
Simon Schmid,
Lukas Gruber,
Markus Holzleitner,
Johannes Brandstetter
Abstract:
Neural operators, serving as physics surrogate models, have recently gained increased interest. With ever increasing problem complexity, the natural question arises: what is an efficient way to scale neural operators to larger and more complex simulations - most importantly by taking into account different types of simulation datasets. This is of special interest since, akin to their numerical cou…
▽ More
Neural operators, serving as physics surrogate models, have recently gained increased interest. With ever increasing problem complexity, the natural question arises: what is an efficient way to scale neural operators to larger and more complex simulations - most importantly by taking into account different types of simulation datasets. This is of special interest since, akin to their numerical counterparts, different techniques are used across applications, even if the underlying dynamics of the systems are similar. Whereas the flexibility of transformers has enabled unified architectures across domains, neural operators mostly follow a problem specific design, where GNNs are commonly used for Lagrangian simulations and grid-based models predominate Eulerian simulations. We introduce Universal Physics Transformers (UPTs), an efficient and unified learning paradigm for a wide range of spatio-temporal problems. UPTs operate without grid- or particle-based latent structures, enabling flexibility and scalability across meshes and particles. UPTs efficiently propagate dynamics in the latent space, emphasized by inverse encoding and decoding techniques. Finally, UPTs allow for queries of the latent space representation at any point in space-time. We demonstrate diverse applicability and efficacy of UPTs in mesh-based fluid simulations, and steady-state Reynolds averaged Navier-Stokes simulations, and Lagrangian-based dynamics.
△ Less
Submitted 30 April, 2024; v1 submitted 19 February, 2024;
originally announced February 2024.
-
Can machine learning predict citizen-reported angler behavior?
Authors:
Julia S. Schmid,
Sean Simmons,
Mark A. Lewis,
Mark S. Poesch,
Pouria Ramazi
Abstract:
Prediction of angler behaviors, such as catch rates and angler pressure, is essential to maintaining fish populations and ensuring angler satisfaction. Angler behavior can partly be tracked by online platforms and mobile phone applications that provide fishing activities reported by recreational anglers. Moreover, angler behavior is known to be driven by local site attributes. Here, the prediction…
▽ More
Prediction of angler behaviors, such as catch rates and angler pressure, is essential to maintaining fish populations and ensuring angler satisfaction. Angler behavior can partly be tracked by online platforms and mobile phone applications that provide fishing activities reported by recreational anglers. Moreover, angler behavior is known to be driven by local site attributes. Here, the prediction of citizen-reported angler behavior was investigated by machine-learning methods using auxiliary data on the environment, socioeconomics, fisheries management objectives, and events at a freshwater body. The goal was to determine whether auxiliary data alone could predict the reported behavior. Different spatial and temporal extents and temporal resolutions were considered. Accuracy scores averaged 88% for monthly predictions at single water bodies and 86% for spatial predictions on a day in a specific region across Canada. At other resolutions and scales, the models only achieved low prediction accuracy of around 60%. The study represents a first attempt at predicting angler behavior in time and space at a large scale and establishes a foundation for potential future expansions in various directions.
△ Less
Submitted 7 February, 2024;
originally announced February 2024.
-
Dependency-Aware Online Caching
Authors:
Julien Dallot,
Amirmehdi Jafari Fesharaki,
Maciej Pacut,
Stefan Schmid
Abstract:
We consider a variant of the online caching problem where the items exhibit dependencies among each other: an item can reside in the cache only if all its dependent items are also in the cache. The dependency relations can form any directed acyclic graph. These requirements arise e.g., in systems such as CacheFlow (SOSR 2016) that cache forwarding rules for packet classification in IP-based commun…
▽ More
We consider a variant of the online caching problem where the items exhibit dependencies among each other: an item can reside in the cache only if all its dependent items are also in the cache. The dependency relations can form any directed acyclic graph. These requirements arise e.g., in systems such as CacheFlow (SOSR 2016) that cache forwarding rules for packet classification in IP-based communication networks.
First, we present an optimal randomized online caching algorithm which accounts for dependencies among the items. Our randomized algorithm is $O( \log k)$-competitive, where $k$ is the size of the cache, meaning that our algorithm never incurs the cost of $O(\log k)$ times higher than even an optimal algorithm that knows the future input sequence.
Second, we consider the bypassing model, where requests can be served at a fixed price without fetching the item and its dependencies into the cache -- a variant of caching with dependencies introduced by Bienkowski et al. at SPAA 2017. For this setting, we give an $O( \sqrt{k \cdot \log k})$-competitive algorithm, which significantly improves the best known competitiveness. We conduct a small case study, to find out that our algorithm incurs on average 2x lower cost.
△ Less
Submitted 30 January, 2024;
originally announced January 2024.
-
Approximation Algorithms for Minimizing Congestion in Demand-Aware Networks
Authors:
Wenkai Dai,
Michael Dinitz,
Klaus-Tycho Foerster,
Long Luo,
Stefan Schmid
Abstract:
Emerging reconfigurable optical communication technologies allow to enhance datacenter topologies with demand-aware links optimized towards traffic patterns. This paper studies the algorithmic problem of jointly optimizing topology and routing in such demand-aware networks to minimize congestion, along two dimensions: (1) splittable or unsplittable flows, and (2) whether routing is segregated, i.e…
▽ More
Emerging reconfigurable optical communication technologies allow to enhance datacenter topologies with demand-aware links optimized towards traffic patterns. This paper studies the algorithmic problem of jointly optimizing topology and routing in such demand-aware networks to minimize congestion, along two dimensions: (1) splittable or unsplittable flows, and (2) whether routing is segregated, i.e., whether routes can or cannot combine both demand-aware and demand-oblivious (static) links.
For splittable and segregated routing, we show that the problem is generally $2$-approximable, but APX-hard even for uniform demands induced by a bipartite demand graph. For unsplittable and segregated routing, we establish upper and lower bounds of $O\left(\log m/ \log\log m \right)$ and $Ω\left(\log m/ \log\log m \right)$, respectively, for polynomial-time approximation algorithms, where $m$ is the number of static links. We further reveal that under un-/splittable and non-segregated routing, even for demands of a single source (resp., destination), the problem cannot be approximated better than $Ω\left(\frac{c_{\max}}{c_{\min}} \right)$ unless P=NP, where $c_{\max}$ (resp., $c_{\min}$) denotes the maximum (resp., minimum) capacity. It remains NP-hard for uniform capacities, but is tractable for a single commodity and uniform capacities.
Our trace-driven simulations show a significant reduction in network congestion compared to existing solutions.
△ Less
Submitted 9 January, 2024;
originally announced January 2024.
-
Credence: Augmenting Datacenter Switch Buffer Sharing with ML Predictions
Authors:
Vamsi Addanki,
Maciej Pacut,
Stefan Schmid
Abstract:
Packet buffers in datacenter switches are shared across all the switch ports in order to improve the overall throughput. The trend of shrinking buffer sizes in datacenter switches makes buffer sharing extremely challenging and a critical performance issue. Literature suggests that push-out buffer sharing algorithms have significantly better performance guarantees compared to drop-tail algorithms.…
▽ More
Packet buffers in datacenter switches are shared across all the switch ports in order to improve the overall throughput. The trend of shrinking buffer sizes in datacenter switches makes buffer sharing extremely challenging and a critical performance issue. Literature suggests that push-out buffer sharing algorithms have significantly better performance guarantees compared to drop-tail algorithms. Unfortunately, switches are unable to benefit from these algorithms due to lack of support for push-out operations in hardware. Our key observation is that drop-tail buffers can emulate push-out buffers if the future packet arrivals are known ahead of time. This suggests that augmenting drop-tail algorithms with predictions about the future arrivals has the potential to significantly improve performance.
This paper is the first research attempt in this direction. We propose Credence, a drop-tail buffer sharing algorithm augmented with machine-learned predictions. Credence can unlock the performance only attainable by push-out algorithms so far. Its performance hinges on the accuracy of predictions. Specifically, Credence achieves near-optimal performance of the best known push-out algorithm LQD (Longest Queue Drop) with perfect predictions, but gracefully degrades to the performance of the simplest drop-tail algorithm Complete Sharing when the prediction error gets arbitrarily worse. Our evaluations show that Credence improves throughput by $1.5$x compared to traditional approaches. In terms of flow completion times, we show that Credence improves upon the state-of-the-art approaches by up to $95\%$ using off-the-shelf machine learning techniques that are also practical in today's hardware. We believe this work opens several interesting future work opportunities both in systems and theory that we discuss at the end of this paper.
△ Less
Submitted 5 January, 2024;
originally announced January 2024.
-
Heterogeneous Graph-based Trajectory Prediction using Local Map Context and Social Interactions
Authors:
Daniel Grimm,
Maximilian Zipfl,
Felix Hertlein,
Alexander Naumann,
Jürgen Lüttin,
Steffen Thoma,
Stefan Schmid,
Lavdim Halilaj,
Achim Rettinger,
J. Marius Zöllner
Abstract:
Precisely predicting the future trajectories of surrounding traffic participants is a crucial but challenging problem in autonomous driving, due to complex interactions between traffic agents, map context and traffic rules. Vector-based approaches have recently shown to achieve among the best performances on trajectory prediction benchmarks. These methods model simple interactions between traffic…
▽ More
Precisely predicting the future trajectories of surrounding traffic participants is a crucial but challenging problem in autonomous driving, due to complex interactions between traffic agents, map context and traffic rules. Vector-based approaches have recently shown to achieve among the best performances on trajectory prediction benchmarks. These methods model simple interactions between traffic agents but don't distinguish between relation-type and attributes like their distance along the road. Furthermore, they represent lanes only by sequences of vectors representing center lines and ignore context information like lane dividers and other road elements. We present a novel approach for vector-based trajectory prediction that addresses these shortcomings by leveraging three crucial sources of information: First, we model interactions between traffic agents by a semantic scene graph, that accounts for the nature and important features of their relation. Second, we extract agent-centric image-based map features to model the local map context. Finally, we generate anchor paths to enforce the policy in multi-modal prediction to permitted trajectories only. Each of these three enhancements shows advantages over the baseline model HoliGraph.
△ Less
Submitted 30 November, 2023;
originally announced November 2023.
-
On the Complexity of the Virtual Network Embedding in Specific Tree Topologies
Authors:
Sergey Pankratov,
Vitaly Aksenov,
Stefan Schmid
Abstract:
Virtual networks are an innovative abstraction that extends cloud computing concepts to the network: by supporting bandwidth reservations between compute nodes (e.g., virtual machines), virtual networks can provide a predictable performance to distributed and communication-intensive cloud applications. However, in order to make the most efficient use of the shared resources, the Virtual Network Em…
▽ More
Virtual networks are an innovative abstraction that extends cloud computing concepts to the network: by supporting bandwidth reservations between compute nodes (e.g., virtual machines), virtual networks can provide a predictable performance to distributed and communication-intensive cloud applications. However, in order to make the most efficient use of the shared resources, the Virtual Network Embedding (VNE) problem has to be solved: a virtual network should be mapped onto the given physical network so that resource reservations are minimized. The problem has been studied intensively already and is known to be NP-hard in general. In this paper, we revisit this problem and consider it on specific topologies, as they often arise in practice. To be more precise, we study the weighted version of the VNE problem: we consider a virtual weighted network of a specific topology which we want to embed onto a weighted network with capacities and specific topology. As for topologies, we consider most fundamental and commonly used ones: line, star, $2$-tiered star, oversubscribed $2$-tiered star, and tree, in addition to also considering arbitrary topologies. We show that typically the VNE problem is NP-hard even in more specialized cases, however, sometimes there exists a polynomial algorithm: for example, an embedding of the oversubscribed $2$-tiered star onto the tree is polynomial while an embedding of an arbitrary $2$-tiered star is not.
△ Less
Submitted 9 November, 2023;
originally announced November 2023.
-
Demand-Aware Network Design with Steiner Nodes and a Connection to Virtual Network Embedding
Authors:
Aleksander Figiel,
Janne H. Korhonen,
Neil Olver,
Stefan Schmid
Abstract:
Emerging optical and virtualization technologies enable the design of more flexible and demand-aware networked systems, in which resources can be optimized toward the actual workload they serve. For example, in a demand-aware datacenter network, frequently communicating nodes (e.g., two virtual machines or a pair of racks in a datacenter) can be placed topologically closer, reducing communication…
▽ More
Emerging optical and virtualization technologies enable the design of more flexible and demand-aware networked systems, in which resources can be optimized toward the actual workload they serve. For example, in a demand-aware datacenter network, frequently communicating nodes (e.g., two virtual machines or a pair of racks in a datacenter) can be placed topologically closer, reducing communication costs and hence improving the overall network performance.
This paper revisits the bounded-degree network design problem underlying such demand-aware networks. Namely, given a distribution over communicating server pairs, we want to design a network with bounded maximum degree that minimizes expected communication distance. In addition to this known problem, we introduce and study a variant where we allow Steiner nodes (i.e., additional routers) to be added to augment the network.
We improve the understanding of this problem domain in several ways. First, we shed light on the complexity and hardness of the aforementioned problems, and study a connection between them and the virtual networking embedding problem. We then provide a constant-factor approximation algorithm for the Steiner node version of the problem, and use it to improve over prior state-of-the-art algorithms for the original version of the problem with sparse communication distributions. Finally, we investigate various heuristic approaches to bounded-degree network design problem, in particular providing a reliable heuristic algorithm with good experimental performance.
We report on an extensive empirical evaluation, using several real-world traffic traces from datacenters, and find that our approach results in improved demand-aware network designs.
△ Less
Submitted 21 August, 2023;
originally announced August 2023.
-
RIFO: Pushing the Efficiency of Programmable Packet Schedulers
Authors:
Habib Mostafaei,
Maciej Pacut,
Stefan Schmid
Abstract:
Packet scheduling is a fundamental networking task that recently received renewed attention in the context of programmable data planes. Programmable packet scheduling systems such as those based on Push-In First-Out (PIFO) abstraction enabled flexible scheduling policies, but are too resource-expensive for large-scale line rate operation. This prompted research into practical programmable schedule…
▽ More
Packet scheduling is a fundamental networking task that recently received renewed attention in the context of programmable data planes. Programmable packet scheduling systems such as those based on Push-In First-Out (PIFO) abstraction enabled flexible scheduling policies, but are too resource-expensive for large-scale line rate operation. This prompted research into practical programmable schedulers (e.g., SP-PIFO, AIFO) approximating PIFO behavior on regular hardware. Yet, their scalability remains limited due to extensive number of memory operations. To address this, we design an effective yet resource-efficient packet scheduler, Range-In First-Out (RIFO), which uses only three mutable memory cells and one FIFO queue per PIFO queue. RIFO is based on multi-criteria decision-making principles and uses small guaranteed admission buffers. Our large-scale simulations in Netbench demonstrate that despite using fewer resources, RIFO generally achieves competitive flow completion times across all studied workloads, and is especially effective in workloads with a significant share of large flows, reducing flow completion time up to 2.9x in Datamining workloads compared to state-of-the-art solutions. Our prototype implementation using P4 on Tofino switches requires only 650 lines of code, is scalable, and runs at line rate.
△ Less
Submitted 14 August, 2023;
originally announced August 2023.
-
Robust Routing Made Easy: Reinforcing Networks Against Non-Benign Faults
Authors:
Christoph Lenzen,
Moti Medina,
Mehrdad Saberi,
Stefan Schmid
Abstract:
With the increasing scale of communication networks, the likelihood of failures grows as well. Since these networks form a critical backbone of our digital society, it is important that they rely on robust routing algorithms which ensure connectivity despite such failures. While most modern communication networks feature robust routing mechanisms, these mechanisms are often fairly complex to desig…
▽ More
With the increasing scale of communication networks, the likelihood of failures grows as well. Since these networks form a critical backbone of our digital society, it is important that they rely on robust routing algorithms which ensure connectivity despite such failures. While most modern communication networks feature robust routing mechanisms, these mechanisms are often fairly complex to design and verify, as they need to account for the effects of failures and rerouting on communication.
This paper conceptualizes the design of robust routing mechanisms, with the aim to avoid such complexity. In particular, we showcase \emph{simple} and generic blackbox transformations that increase resilience of routing against independently distributed failures, which allows to simulate the routing scheme on the original network, even in the presence of non-benign node failures (henceforth called faults). This is attractive as the system specification and routing policy can simply be preserved.
We present a scheme for constructing such a reinforced network, given an existing (synchronous) network and a routing scheme. We prove that this algorithm comes with small constant overheads, and only requires a minimal amount of additional node and edge resources; in fact, if the failure probability is smaller than $1/n$, the algorithm can come without any overhead at all. At the same time, it allows to tolerate a large number of independent random (node) faults, asymptotically almost surely. We complement our analytical results with simulations on different real-world topologies.
△ Less
Submitted 9 July, 2023;
originally announced July 2023.
-
MCQUIC -- A Multicast Extension for QUIC
Authors:
Max Franke,
Jake Holland,
Stefan Schmid
Abstract:
Mass live content, such as world cups, the Superbowl or the Olympics, attract audiences of hundreds of millions of viewers. While such events were predominantly consumed on TV, more and more viewers follow big events on the Internet, which poses a scalability challenge: current unicast delivery over the web comes with large overheads and is inefficient. An attractive alternative are multicast-base…
▽ More
Mass live content, such as world cups, the Superbowl or the Olympics, attract audiences of hundreds of millions of viewers. While such events were predominantly consumed on TV, more and more viewers follow big events on the Internet, which poses a scalability challenge: current unicast delivery over the web comes with large overheads and is inefficient. An attractive alternative are multicast-based transmissions, however, current solutions have several drawbacks, mostly related to security and privacy, which prevent them from being implemented in browsers.
In this paper we introduce a multicast extension to QUIC, a widely popular transport protocol standardized by the IETF, that solves several of these problems. It enables multicast delivery by offering encryption as well as integrity verification of packets distributed over multicast and automatic unicast fallback, which solves one of multicasts major obstacles to large scale deployment. It is transparent to applications and can be easily utilized by simply enabling an option in QUIC. This extension is soley focused on the transport layer and uses already existing multicast mechanisms on the network layer.
△ Less
Submitted 30 June, 2023;
originally announced June 2023.
-
Dynamic Probabilistic Reliable Broadcast
Authors:
Veronika Anikina,
João Paulo Bezerra,
Petr Kuznetsov,
Liron Schiff,
Stefan Schmid
Abstract:
Byzantine reliable broadcast is a primitive that allows a set of processes to agree on a message broadcast by a dedicated source process, even when some of them are malicious (Byzantine). It guarantees that no two correct processes deliver different messages, and if a message is delivered by a correct process, every correct process eventually delivers one. The primitive is known not to scale, as i…
▽ More
Byzantine reliable broadcast is a primitive that allows a set of processes to agree on a message broadcast by a dedicated source process, even when some of them are malicious (Byzantine). It guarantees that no two correct processes deliver different messages, and if a message is delivered by a correct process, every correct process eventually delivers one. The primitive is known not to scale, as it requires $Ω(n^2)$ message exchanges, where $n$ is the number of system members. The quadratic cost can be explained by the inherent need for every process to relay a message to every other process.
In this paper, we explore ways to overcome this limitation, by casting the problem to the probabilistic setting. We propose a solution in which every broadcast message is validated by a small set of witnesses, which allows us to maintain low latency and small communication complexity. In order to tolerate a slow adaptive adversary, we dynamically select witnesses through a novel use of locality-preserving hash functions. Our simulations demonstrate significant scalability gains of our solution with respect to existing protocols.
△ Less
Submitted 7 June, 2023;
originally announced June 2023.
-
A Subquadratic Bound for Online Bisection
Authors:
Marcin Bienkowski,
Stefan Schmid
Abstract:
The online bisection problem is a natural dynamic variant of the classic optimization problem, where one has to dynamically maintain a partition of $n$ elements into two clusters of cardinality $n/2$. During runtime, an online algorithm is given a sequence of requests, each being a pair of elements: an inter-cluster request costs one unit while an intra-cluster one is free. The algorithm may chang…
▽ More
The online bisection problem is a natural dynamic variant of the classic optimization problem, where one has to dynamically maintain a partition of $n$ elements into two clusters of cardinality $n/2$. During runtime, an online algorithm is given a sequence of requests, each being a pair of elements: an inter-cluster request costs one unit while an intra-cluster one is free. The algorithm may change the partition, paying a unit cost for each element that changes its cluster.
This natural problem admits a simple deterministic $O(n^2)$-competitive algorithm [Avin et al., DISC 2016]. While several significant improvements over this result have been obtained since the original work, all of them either limit the generality of the input or assume some form of resource augmentation (e.g., larger clusters). Moreover, the algorithm of Avin et al. achieves the best known competitive ratio even if randomization is allowed.
In this paper, we present the first randomized online algorithm that breaks this natural quadratic barrier and achieves a competitive ratio of $\tilde{O}(n^{23/12})$ without resource augmentation and for an arbitrary sequence of requests.
△ Less
Submitted 15 March, 2024; v1 submitted 2 May, 2023;
originally announced May 2023.
-
Polylog-Competitive Algorithms for Dynamic Balanced Graph Partitioning for Ring Demands
Authors:
Harald Räcke,
Stefan Schmid,
Ruslan Zabrodin
Abstract:
The performance of many large-scale and data-intensive distributed systems critically depends on the capacity of the interconnecting network. This paper is motivated by the vision of self-adjusting infrastructures whose resources can be adjusted according to the workload they currently serve, in a demand-aware manner. Such dynamic adjustments can be exploited to improve network utilization and hen…
▽ More
The performance of many large-scale and data-intensive distributed systems critically depends on the capacity of the interconnecting network. This paper is motivated by the vision of self-adjusting infrastructures whose resources can be adjusted according to the workload they currently serve, in a demand-aware manner. Such dynamic adjustments can be exploited to improve network utilization and hence performance, by dynamically moving frequently interacting communication partners closer, e.g., collocating them in the same server or datacenter rack.
In particular, we revisit the online balanced graph partitioning problem which captures the fundamental tradeoff between the benefits and costs of dynamically collocating communication partners. The demand is modelled as a sequence $σ$ (revealed in an online manner) of communication requests between $n$ processes, each of which is running on one of the $\ell$ servers. Each server has capacity $k=n/\ell$, hence, the processes have to be scheduled in a balanced manner across the servers. A request incurs cost $1$, if the requested processes are located on different servers, otherwise the cost is 0. A process can be migrated to a different server at cost $1$.
This paper presents the first online algorithm for online balanced graph partitioning achieving a polylogarithmic competitive ratio for the fundamental case of ring communication patterns. Specifically, our main contribution is a $O(\log^3 n)$-competitive randomized online algorithm for this problem. We further present a randomized online algorithm which is $O(\log^2 n)$-competitive when compared to a static optimal solution. Our two results rely on different algorithms and techniques and hence are of independent interest.
△ Less
Submitted 20 April, 2023;
originally announced April 2023.
-
Topological Characterization of Consensus Solvability in Directed Dynamic Networks
Authors:
Hugo Rincon Galeana,
Ulrich Schmid,
Kyrill Winkler,
Ami Paz,
Stefan Schmid
Abstract:
Consensus is one of the most fundamental problems in distributed computing. This paper studies the consensus problem in a synchronous dynamic directed network, in which communication is controlled by an oblivious message adversary. The question when consensus is possible in this model has already been studied thoroughly in the literature from a combinatorial perspective, and is known to be challen…
▽ More
Consensus is one of the most fundamental problems in distributed computing. This paper studies the consensus problem in a synchronous dynamic directed network, in which communication is controlled by an oblivious message adversary. The question when consensus is possible in this model has already been studied thoroughly in the literature from a combinatorial perspective, and is known to be challenging. This paper presents a topological perspective on consensus solvability under oblivious message adversaries, which provides interesting new insights. Our main contribution is a topological characterization of consensus solvability, which also leads to explicit decision procedures. Our approach is based on the novel notion of a communication pseudosphere, which can be seen as the message-passing analog of the well-known standard chromatic subdivision for wait-free shared memory systems. We further push the elegance and expressiveness of the "geometric" reasoning enabled by the topological approach by dealing with uninterpreted complexes, which considerably reduce the size of the protocol complex, and by labeling facets with information flow arrows, which give an intuitive meaning to the implicit epistemic status of the faces in a protocol complex.
△ Less
Submitted 5 April, 2023;
originally announced April 2023.
-
Toward Self-Adjusting k-ary Search Tree Networks
Authors:
Evgenii Feder,
Anton Paramonov,
Pavel Mavrin,
Iosif Salem,
Stefan Schmid,
Vitaly Aksenov
Abstract:
Datacenter networks are becoming increasingly flexible with the incorporation of new networking technologies, such as optical circuit switches. These technologies allow for programmable network topologies that can be reconfigured to better serve network traffic, thus enabling a trade-off between the benefits (i.e., shorter routes) and costs of reconfigurations (i.e., overhead). Self-Adjusting Netw…
▽ More
Datacenter networks are becoming increasingly flexible with the incorporation of new networking technologies, such as optical circuit switches. These technologies allow for programmable network topologies that can be reconfigured to better serve network traffic, thus enabling a trade-off between the benefits (i.e., shorter routes) and costs of reconfigurations (i.e., overhead). Self-Adjusting Networks (SANs) aim at addressing this trade-off by exploiting patterns in network traffic, both when it is revealed piecewise (online dynamic topologies) or known in advance (offline static topologies). In this paper, we take the first steps toward Self-Adjusting k-ary tree networks. These are more powerful generalizations of existing binary search tree networks (like SplayNets), which have been at the core of SAN designs. k-ary search tree networks are a natural generalization offering nodes of higher degrees, reduced route lengths for a fixed number of nodes, and local routing in spite of reconfigurations. We first compute an offline (optimal) static network for arbitrary traffic patterns in $O(n^3 \cdot k)$ time via dynamic programming, and also improve the bound to $O(n^2 \cdot k)$ for the special case of uniformly distributed traffic. Then, we present a centroid-based topology of the network that can be used both in the offline static and the online setting. In the offline uniform-workload case, we construct this quasi-optimal network in linear time $O(n)$ and, finally, we present online self-adjusting k-ary search tree versions of SplayNet. We evaluate experimentally our new structure for $k=2$ (allowing for a comparison with existing SplayNets) on real and synthetic network traces. Our results show that this approach works better than SplayNet in most of the real network traces and in average to low locality synthetic traces, and is only little inferior to SplayNet in all remaining traces.
△ Less
Submitted 26 June, 2024; v1 submitted 25 February, 2023;
originally announced February 2023.
-
Time Complexity of Broadcast and Consensus for Randomized Oblivious Message Adversaries
Authors:
Antoine El-Hayek,
Monika Henzinger,
Stefan Schmid
Abstract:
Broadcast and consensus are most fundamental tasks in distributed computing. These tasks are particularly challenging in dynamic networks where communication across the network links may be unreliable, e.g., due to mobility or failures. Indeed, over the last years, researchers have derived several impossibility results and high time complexity lower bounds (i.e., linear in the number of nodes $n$)…
▽ More
Broadcast and consensus are most fundamental tasks in distributed computing. These tasks are particularly challenging in dynamic networks where communication across the network links may be unreliable, e.g., due to mobility or failures. Indeed, over the last years, researchers have derived several impossibility results and high time complexity lower bounds (i.e., linear in the number of nodes $n$) for these tasks, even for oblivious message adversaries where communication networks are rooted trees. However, such deterministic adversarial models may be overly conservative, as many processes in real-world settings are stochastic in nature rather than worst case.
This paper initiates the study of broadcast and consensus on stochastic dynamic networks, introducing a randomized oblivious message adversary. Our model is reminiscent of the SI model in epidemics, however, revolving around trees (which renders the analysis harder due to the apparent lack of independence). In particular, we show that if information dissemination occurs along random rooted trees, broadcast and consensus complete fast with high probability, namely in logarithmic time. Our analysis proves the independence of a key variable, which enables a formal understanding of the dissemination process.
More formally, for a network with $n$ nodes, we first consider the completely random case where in each round the communication network is chosen uniformly at random among rooted trees. We then introduce the notion of randomized oblivious message adversary, where in each round, an adversary can choose $k$ edges to appear in the communication network, and then a rooted tree is chosen uniformly at random among the set of all rooted trees that include these edges. We show that broadcast completes in $O(k+\log n)$ rounds, and that this it is also the case for consensus as long as $k \le 0.1n$.
△ Less
Submitted 23 February, 2023;
originally announced February 2023.
-
Online Algorithms with Randomly Infused Advice
Authors:
Yuval Emek,
Yuval Gil,
Maciej Pacut,
Stefan Schmid
Abstract:
We introduce a novel method for the rigorous quantitative evaluation of online algorithms that relaxes the "radical worst-case" perspective of classic competitive analysis. In contrast to prior work, our method, referred to as randomly infused advice (RIA), does not make any probabilistic assumptions about the input sequence and does not rely on the development of designated online algorithms. Rat…
▽ More
We introduce a novel method for the rigorous quantitative evaluation of online algorithms that relaxes the "radical worst-case" perspective of classic competitive analysis. In contrast to prior work, our method, referred to as randomly infused advice (RIA), does not make any probabilistic assumptions about the input sequence and does not rely on the development of designated online algorithms. Rather, it can be applied to existing online randomized algorithms, introducing a means to evaluate their performance in scenarios that lie outside the radical worst-case regime. More concretely, an online algorithm ALG with RIA benefits from pieces of advice generated by an omniscient but not entirely reliable oracle. The crux of the new method is that the advice is provided to ALG by writing it into the buffer B from which ALG normally reads its random bits, hence allowing us to augment it through a very simple and non-intrusive interface. The (un)reliability of the oracle is captured via a parameter 0 {\le} α {\le} 1 that determines the probability (per round) that the advice is successfully infused by the oracle; if the advice is not infused, which occurs with probability 1 - α, then the buffer B contains fresh random bits (as in the classic online setting).
The applicability of the new RIA method is demonstrated by applying it to three extensively studied online problems: paging, uniform metrical task systems, and online set cover. For these problems, we establish new upper bounds on the competitive ratio of classic online algorithms that improve as the infusion parameter α increases. These are complemented with (often tight) lower bounds on the competitive ratio of online algorithms with RIA for the three problems.
△ Less
Submitted 5 August, 2023; v1 submitted 10 February, 2023;
originally announced February 2023.
-
Performance Analysis of Machine Learning Centered Workload Prediction Models for Cloud
Authors:
Deepika Saxena,
Jitendra Kumar,
Ashutosh Kumar Singh,
Stefan Schmid
Abstract:
The precise estimation of resource usage is a complex and challenging issue due to the high variability and dimensionality of heterogeneous service types and dynamic workloads. Over the last few years, the prediction of resource usage and traffic has received ample attention from the research community. Many machine learning-based workload forecasting models have been developed by exploiting their…
▽ More
The precise estimation of resource usage is a complex and challenging issue due to the high variability and dimensionality of heterogeneous service types and dynamic workloads. Over the last few years, the prediction of resource usage and traffic has received ample attention from the research community. Many machine learning-based workload forecasting models have been developed by exploiting their computational power and learning capabilities. This paper presents the first systematic survey cum performance analysis-based comparative study of diversified machine learning-driven cloud workload prediction models. The discussion initiates with the significance of predictive resource management followed by a schematic description, operational design, motivation, and challenges concerning these workload prediction models. Classification and taxonomy of different prediction approaches into five distinct categories are presented focusing on the theoretical concepts and mathematical functioning of the existing state-of-the-art workload prediction methods. The most prominent prediction approaches belonging to a distinct class of machine learning models are thoroughly surveyed and compared. All five classified machine learning-based workload prediction models are implemented on a common platform for systematic investigation and comparison using three distinct benchmark cloud workload traces via experimental analysis. The essential key performance indicators of state-of-the-art approaches are evaluated for comparison and the paper is concluded by discussing the trade-offs and notable remarks.
△ Less
Submitted 5 February, 2023;
originally announced February 2023.
-
Dynamic Demand-Aware Link Scheduling for Reconfigurable Datacenters
Authors:
Kathrin Hanauer,
Monika Henzinger,
Lara Ost,
Stefan Schmid
Abstract:
Emerging reconfigurable datacenters allow to dynamically adjust the network topology in a demand-aware manner. These datacenters rely on optical switches which can be reconfigured to provide direct connectivity between racks, in the form of edge-disjoint matchings. While state-of-the-art optical switches in principle support microsecond reconfigurations, the demand-aware topology optimization cons…
▽ More
Emerging reconfigurable datacenters allow to dynamically adjust the network topology in a demand-aware manner. These datacenters rely on optical switches which can be reconfigured to provide direct connectivity between racks, in the form of edge-disjoint matchings. While state-of-the-art optical switches in principle support microsecond reconfigurations, the demand-aware topology optimization constitutes a bottleneck.
This paper proposes a dynamic algorithms approach to improve the performance of reconfigurable datacenter networks, by supporting faster reactions to changes in the traffic demand. This approach leverages the temporal locality of traffic patterns in order to update the interconnecting matchings incrementally, rather than recomputing them from scratch. In particular, we present six (batch-)dynamic algorithms and compare them to static ones. We conduct an extensive empirical evaluation on 176 synthetic and 39 real-world traces, and find that dynamic algorithms can both significantly improve the running time and reduce the number of changes to the configuration, especially in networks with high temporal locality, while retaining matching weight.
△ Less
Submitted 13 January, 2023;
originally announced January 2023.
-
SeedTree: A Dynamically Optimal and Local Self-Adjusting Tree
Authors:
Arash Pourdamghani,
Chen Avin,
Robert Sama,
Stefan Schmid
Abstract:
We consider the fundamental problem of designing a self-adjusting tree, which efficiently and locally adapts itself towards the demand it serves (namely accesses to the items stored by the tree nodes), striking a balance between the benefits of such adjustments (enabling faster access) and their costs (reconfigurations). This problem finds applications, among others, in the context of emerging dem…
▽ More
We consider the fundamental problem of designing a self-adjusting tree, which efficiently and locally adapts itself towards the demand it serves (namely accesses to the items stored by the tree nodes), striking a balance between the benefits of such adjustments (enabling faster access) and their costs (reconfigurations). This problem finds applications, among others, in the context of emerging demand-aware and reconfigurable datacenter networks and features connections to self-adjusting data structures. Our main contribution is SeedTree, a dynamically optimal self-adjusting tree which supports local (i.e., greedy) routing, which is particularly attractive under highly dynamic demands. SeedTree relies on an innovative approach which defines a set of unique paths based on randomized item addresses, and uses a small constant number of items per node. We complement our analytical results by showing the benefits of SeedTree empirically, evaluating it on various synthetic and real-world communication traces.
△ Less
Submitted 8 January, 2023;
originally announced January 2023.
-
Dynamic Maintenance of Monotone Dynamic Programs and Applications
Authors:
Monika Henzinger,
Stefan Neumann,
Harald Räcke,
Stefan Schmid
Abstract:
Dynamic programming (DP) is one of the fundamental paradigms in algorithm design. However, many DP algorithms have to fill in large DP tables, represented by two-dimensional arrays, which causes at least quadratic running times and space usages. This has led to the development of improved algorithms for special cases when the DPs satisfy additional properties like, e.g., the Monge property or tota…
▽ More
Dynamic programming (DP) is one of the fundamental paradigms in algorithm design. However, many DP algorithms have to fill in large DP tables, represented by two-dimensional arrays, which causes at least quadratic running times and space usages. This has led to the development of improved algorithms for special cases when the DPs satisfy additional properties like, e.g., the Monge property or total monotonicity.
In this paper, we consider a new condition which assumes (among some other technical assumptions) that the rows of the DP table are monotone. Under this assumption, we introduce a novel data structure for computing $(1+\varepsilon)$-approximate DP solutions in near-linear time and space in the static setting, and with polylogarithmic update times when the DP entries change dynamically. To the best of our knowledge, our new condition is incomparable to previous conditions and is the first which allows to derive dynamic algorithms based on existing DPs. Instead of using two-dimensional arrays to store the DP tables, we store the rows of the DP tables using monotone piecewise constant functions. This allows us to store length-$n$ DP table rows with entries in $[0,W]$ using only polylog$(n,W)$ bits, and to perform operations, such as $(\min,+)$-convolution or rounding, on these functions in polylogarithmic time.
We further present several applications of our data structure. For bicriteria versions of $k$-balanced graph partitioning and simultaneous source location, we obtain the first dynamic algorithms with subpolynomial update times, as well as the first static algorithms using only near-linear time and space. Additionally, we obtain the currently fastest algorithm for fully dynamic knapsack.
△ Less
Submitted 4 January, 2023;
originally announced January 2023.
-
Relation-based Motion Prediction using Traffic Scene Graphs
Authors:
Maximilian Zipfl,
Felix Hertlein,
Achim Rettinger,
Steffen Thoma,
Lavdim Halilaj,
Juergen Luettin,
Stefan Schmid,
Cory Henson
Abstract:
Representing relevant information of a traffic scene and understanding its environment is crucial for the success of autonomous driving. Modeling the surrounding of an autonomous car using semantic relations, i.e., how different traffic participants relate in the context of traffic rule based behaviors, is hardly been considered in previous work. This stems from the fact that these relations are h…
▽ More
Representing relevant information of a traffic scene and understanding its environment is crucial for the success of autonomous driving. Modeling the surrounding of an autonomous car using semantic relations, i.e., how different traffic participants relate in the context of traffic rule based behaviors, is hardly been considered in previous work. This stems from the fact that these relations are hard to extract from real-world traffic scenes. In this work, we model traffic scenes in a form of spatial semantic scene graphs for various different predictions about the traffic participants, e.g., acceleration and deceleration. Our learning and inference approach uses Graph Neural Networks (GNNs) and shows that incorporating explicit information about the spatial semantic relations between traffic participants improves the predicdtion results. Specifically, the acceleration prediction of traffic participants is improved by up to 12% compared to the baselines, which do not exploit this explicit information. Furthermore, by including additional information about previous scenes, we achieve 73% improvements.
△ Less
Submitted 24 November, 2022;
originally announced December 2022.
-
Brief Announcement: Broadcasting Time in Dynamic Rooted Trees is Linear
Authors:
Antoine El-Hayek,
Monika Henzinger,
Stefan Schmid
Abstract:
We study the broadcast problem on dynamic networks with $n$ processes. The processes communicate in synchronous rounds along an arbitrary rooted tree. The sequence of trees is given by an adversary whose goal is to maximize the number of rounds until at least one process reaches all other processes. Previous research has shown a $\lceil{\frac{3n-1}{2}}\rceil-2$ lower bound and an $O(n\log\log n)$…
▽ More
We study the broadcast problem on dynamic networks with $n$ processes. The processes communicate in synchronous rounds along an arbitrary rooted tree. The sequence of trees is given by an adversary whose goal is to maximize the number of rounds until at least one process reaches all other processes. Previous research has shown a $\lceil{\frac{3n-1}{2}}\rceil-2$ lower bound and an $O(n\log\log n)$ upper bound. We show the first linear upper bound for this problem, namely $\lceil{(1 + \sqrt 2) n-1}\rceil \approx 2.4n$. Our result follows from a detailed analysis of the evolution of the adjacency matrix of the network over time.
△ Less
Submitted 30 January, 2023; v1 submitted 21 November, 2022;
originally announced November 2022.
-
Asymptotically Tight Bounds on the Time Complexity of Broadcast and its Variants in Dynamic Networks
Authors:
Antoine El-Hayek,
Monika Henzinger,
Stefan Schmid
Abstract:
Data dissemination is a fundamental task in distributed computing. This paper studies broadcast problems in various innovative models where the communication network connecting $n$ processes is dynamic (e.g., due to mobility or failures) and controlled by an adversary.
In the first model, the processes transitively communicate their ids in synchronous rounds along a rooted tree given in each rou…
▽ More
Data dissemination is a fundamental task in distributed computing. This paper studies broadcast problems in various innovative models where the communication network connecting $n$ processes is dynamic (e.g., due to mobility or failures) and controlled by an adversary.
In the first model, the processes transitively communicate their ids in synchronous rounds along a rooted tree given in each round by the adversary whose goal is to maximize the number of rounds until at least one id is known by all processes.
Previous research has shown a $\lceil{\frac{3n-1}{2}}\rceil-2$ lower bound and an $O(n\log\log n)$ upper bound. We show the first linear upper bound for this problem, namely $\lceil{(1 + \sqrt 2) n-1}\rceil \approx 2.4n$.
We extend these results to the setting where the adversary gives in each round $k$-disjoint forests and their goal is to maximize the number of rounds until there is a set of $k$ ids such that each process knows of at least one of them. We give a $\left\lceil{\frac{3(n-k)}{2}}\right\rceil-1$ lower bound and a $\frac{π^2+6}{6}n+1 \approx 2.6n$ upper bound for this problem.
Finally, we study the setting where the adversary gives in each round a directed graph with $k$ roots and their goal is to maximize the number of rounds until there exist $k$ ids that are known by all processes. We give a $\left\lceil{\frac{3(n-3k)}{2}}\right\rceil+2$ lower bound and a $\lceil { (1+\sqrt{2})n}\rceil+k-1 \approx 2.4n+k$ upper bound for this problem.
For the two latter problems no upper or lower bounds were previously known.
△ Less
Submitted 27 January, 2023; v1 submitted 18 November, 2022;
originally announced November 2022.
-
Chopin: Combining Distributed and Centralized Schedulers for Self-Adjusting Datacenter Networks
Authors:
Neta Rozen Schiff,
Klaus-Tycho Foerster,
Stefan Schmid,
David Hay
Abstract:
The performance of distributed and data-centric applications often critically depends on the interconnecting network. Emerging reconfigurable datacenter networks (RDCNs) are a particularly innovative approach to improve datacenter throughput. Relying on a dynamic optical topology which can be adjusted towards the workload in a demand-aware manner, RDCNs allow to exploit temporal and spatial locali…
▽ More
The performance of distributed and data-centric applications often critically depends on the interconnecting network. Emerging reconfigurable datacenter networks (RDCNs) are a particularly innovative approach to improve datacenter throughput. Relying on a dynamic optical topology which can be adjusted towards the workload in a demand-aware manner, RDCNs allow to exploit temporal and spatial locality in the communication pattern, and to provide topological shortcuts for frequently communicating racks. The key challenge, however, concerns how to realize demand-awareness in RDCNs in a scalable fashion.
This paper presents and evaluates Chopin, a hybrid scheduler for self-adjusting networks that provides demand-awareness at low overhead, by combining centralized and distributed approaches. Chopin allocates optical circuits to elephant flows, through its slower centralized scheduler, utilizing global information. Chopin's distributed scheduler is orders of magnitude faster and can swiftly react to changes in the traffic and adjust the optical circuits accordingly, by using only local information and running at each rack separately.
△ Less
Submitted 11 November, 2022;
originally announced November 2022.
-
The Augmentation-Speed Tradeoff for Consistent Network Updates
Authors:
Monika Henzinger,
Ami Paz,
Arash Pourdamghani,
Stefan Schmid
Abstract:
Emerging software-defined networking technologies enable more adaptive communication infrastructures, allowing for quick reactions to changes in networking requirements by exploiting the workload's temporal structure. However, operating networks adaptively is algorithmically challenging, as meeting networks' stringent dependability requirements relies on maintaining basic consistency and performan…
▽ More
Emerging software-defined networking technologies enable more adaptive communication infrastructures, allowing for quick reactions to changes in networking requirements by exploiting the workload's temporal structure. However, operating networks adaptively is algorithmically challenging, as meeting networks' stringent dependability requirements relies on maintaining basic consistency and performance properties, such as loop freedom and congestion minimization, even during the update process. This paper leverages an augmentation-speed tradeoff to significantly speed up consistent network updates. We show that allowing for a small and short (hence practically tolerable, e.g., using buffering) oversubscription of links allows us to solve many network update instances much faster, as well as to reduce computational complexities (i.e., the running times of the algorithms). We first explore this tradeoff formally, revealing the computational complexity of scheduling updates. We then present and analyze algorithms that maintain logical and performance properties during the update. Using an extensive simulation study, we find that the tradeoff is even more favorable in practice than our analytical bounds suggest. In particular, we find that by allowing just 10% augmentation, update times reduce by more than 32% on average, across a spectrum of real-world networks.
△ Less
Submitted 7 November, 2022;
originally announced November 2022.
-
Online Admission Control and Rebalancing in Payment Channel Networks
Authors:
Mahsa Bastankhah,
Krishnendu Chatterjee,
Mohammad Ali Maddah-Ali,
Stefan Schmid,
Jakub Svoboda,
Michelle Yeo
Abstract:
Payment channel networks (PCNs) are a promising technology to improve the scalability of cryptocurrencies. PCNs, however, face the challenge that the frequent usage of certain routes may deplete channels in one direction, and hence prevent further transactions. In order to reap the full potential of PCNs, recharging and rebalancing mechanisms are required to provision channels, as well as an admis…
▽ More
Payment channel networks (PCNs) are a promising technology to improve the scalability of cryptocurrencies. PCNs, however, face the challenge that the frequent usage of certain routes may deplete channels in one direction, and hence prevent further transactions. In order to reap the full potential of PCNs, recharging and rebalancing mechanisms are required to provision channels, as well as an admission control logic to decide which transactions to reject in case capacity is insufficient. This paper presents a formal model of this optimisation problem. In particular, we consider an online algorithms perspective, where transactions arrive over time in an unpredictable manner. Our main contributions are competitive online algorithms which come with provable guarantees over time. We empirically evaluate our algorithms on randomly generated transactions to compare the average performance of our algorithms to our theoretical bounds. We also show how this model and approach differs from related problems in classic communication networks.
△ Less
Submitted 24 September, 2022;
originally announced September 2022.
-
Optimizing Reconfigurable Optical Datacenters: The Power of Randomization
Authors:
Marcin Bienkowski,
David Fuchssteiner,
Stefan Schmid
Abstract:
Reconfigurable optical topologies are a promising new technology to improve datacenter network performance and cope with the explosive growth of traffic. In particular, these networks allow to directly and adaptively connect racks between which there is currently much traffic, hence making an optimal use of the bandwidth capacity by avoiding multi-hop forwarding.
This paper studies the dynamic o…
▽ More
Reconfigurable optical topologies are a promising new technology to improve datacenter network performance and cope with the explosive growth of traffic. In particular, these networks allow to directly and adaptively connect racks between which there is currently much traffic, hence making an optimal use of the bandwidth capacity by avoiding multi-hop forwarding.
This paper studies the dynamic optimization of such reconfigurable topologies, by adapting the network to the traffic in an online manner. The underlying algorithmic problem can be described as an online maximum weight $b$-matching problem, a~generalization of maximum weight matching where each node has at most $b \geq 1$ incident matching edges.
We make the case for a randomized approach for matching optimization. Our main contribution is a $O(\log b)$-competitive algorithm and we show that it is asymptotically optimal. This algorithm is hence exponentially better than the best possible deterministic online algorithm.
We complement our theoretical results with extensive trace-driven simulations, based on real-world datacenter workloads.
△ Less
Submitted 30 August, 2023; v1 submitted 5 September, 2022;
originally announced September 2022.
-
Model-Based Insights on the Performance, Fairness, and Stability of BBR
Authors:
Simon Scherrer,
Markus Legner,
Adrian Perrig,
Stefan Schmid
Abstract:
Google's BBR is the most prominent result of the recently revived quest for efficient, fair, and flexible congestion-control algorithms (CCAs). While the performance of BBR has been investigated by numerous studies, previous work still leaves gaps in the understanding of BBR performance: Experiment-based studies generally only consider network settings that researchers can set up with manageable e…
▽ More
Google's BBR is the most prominent result of the recently revived quest for efficient, fair, and flexible congestion-control algorithms (CCAs). While the performance of BBR has been investigated by numerous studies, previous work still leaves gaps in the understanding of BBR performance: Experiment-based studies generally only consider network settings that researchers can set up with manageable effort, and model-based studies neglect important issues like convergence.
To complement previous BBR analyses, this paper presents a fluid model of BBRv1 and BBRv2, allowing both efficient simulation under a wide variety of network settings and analytical treatment such as stability analysis. By experimental validation, we show that our fluid model provides highly accurate predictions of BBR behavior. Through extensive simulations and theoretical analysis, we arrive at several insights into both BBR versions, including a previously unknown bufferbloat issue in BBRv2.
△ Less
Submitted 23 August, 2022; v1 submitted 22 August, 2022;
originally announced August 2022.
-
Self-Adjusting Linear Networks with Ladder Demand Graph
Authors:
Anton Paramonov,
Iosif Salem,
Stefan Schmid,
Vitaly Aksenov
Abstract:
Self-adjusting networks (SANs) have the ability to adapt to communication demand by dynamically adjusting the workload (or demand) embedding, i.e., the mapping of communication requests into the network topology. SANs can thus reduce routing costs for frequently communicating node pairs by paying a cost for adjusting the embedding. This is particularly beneficial when the demand has structure, whi…
▽ More
Self-adjusting networks (SANs) have the ability to adapt to communication demand by dynamically adjusting the workload (or demand) embedding, i.e., the mapping of communication requests into the network topology. SANs can thus reduce routing costs for frequently communicating node pairs by paying a cost for adjusting the embedding. This is particularly beneficial when the demand has structure, which the network can adapt to. Demand can be represented in the form of a demand graph, which is defined by the set of network nodes (vertices) and the set of pairwise communication requests (edges). Thus, adapting to the demand can be interpreted by embedding the demand graph to the network topology. This can be challenging both when the demand graph is known in advance (offline) and when it revealed edge-by-edge (online). The difficulty also depends on whether we aim at constructing a static topology or a dynamic (self-adjusting) one that improves the embedding as more parts of the demand graph are revealed. Yet very little is known about these self-adjusting embeddings.
In this paper, the network topology is restricted to a line and the demand graph to a ladder graph, i.e., a $2^n$ grid, including all possible subgraphs of the ladder. We present an online self-adjusting network that matches the known lower bound asymptotically and is $12$-competitive in terms of request cost. As a warm up result, we present an asymptotically optimal algorithm for the cycle demand graph. We also present an oracle-based algorithm for an arbitrary demand graph that has a constant overhead.
△ Less
Submitted 23 February, 2023; v1 submitted 8 July, 2022;
originally announced July 2022.
-
Wiser: Increasing Throughput in Payment Channel Networks with Transaction Aggregation
Authors:
Samarth Tiwari,
Michelle Yeo,
Zeta Avarikioti,
Iosif Salem,
Krzysztof Pietrzak,
Stefan Schmid
Abstract:
Payment channel networks (PCNs) are one of the most prominent solutions to the limited transaction throughput of blockchains. Nevertheless, PCNs suffer themselves from a throughput limitation due to the capital constraints of their channels. A similar dependence on high capital is also found in inter-bank payment settlements, where the so-called netting technique is used to mitigate liquidity dema…
▽ More
Payment channel networks (PCNs) are one of the most prominent solutions to the limited transaction throughput of blockchains. Nevertheless, PCNs suffer themselves from a throughput limitation due to the capital constraints of their channels. A similar dependence on high capital is also found in inter-bank payment settlements, where the so-called netting technique is used to mitigate liquidity demands.
In this work, we alleviate this limitation by introducing the notion of transaction aggregation: instead of executing transactions sequentially through a PCN, we enable senders to aggregate multiple transactions and execute them simultaneously to benefit from several amounts that may "cancel out". Two direct advantages of our proposal is the decrease in intermediary fees paid by senders as well as the obfuscation of the transaction data from the intermediaries.
We formulate the transaction aggregation as a computational problem, a generalization of the Bank Clearing Problem. We present a generic framework for the transaction aggregation execution, and thereafter we propose Wiser as an implementation of this framework in a specific hub-based setting. To overcome the NP-hardness of the transaction aggregation problem, in Wiser we propose a fixed-parameter linear algorithm for a special case of transaction aggregation as well as the Bank Clearing Problem. Wiser can also be seen as a modern variant of the Hawala money transfer system, as well as a decentralized implementation of the overseas remittance service of Wise.
△ Less
Submitted 23 May, 2022;
originally announced May 2022.
-
Weighted Packet Selection for Rechargeable Links: Complexity and Approximation
Authors:
Stefan Schmid,
Jakub Svoboda,
Michelle Yeo
Abstract:
We consider a natural problem dealing with weighted packet selection across a rechargeable link, which e.g., finds applications in cryptocurrency networks. The capacity of a link $(u,v)$ is determined by how much players $u$ and $v$ allocate for this link. Specifically, the input is a finite ordered sequence of packets that arrive in both directions along a link. Given $(u, v)$ and a packet of wei…
▽ More
We consider a natural problem dealing with weighted packet selection across a rechargeable link, which e.g., finds applications in cryptocurrency networks. The capacity of a link $(u,v)$ is determined by how much players $u$ and $v$ allocate for this link. Specifically, the input is a finite ordered sequence of packets that arrive in both directions along a link. Given $(u, v)$ and a packet of weight $x$ going from $u$ to $v$, player $u$ can either accept or reject the packet. If player $u$ accepts the packet, their capacity on link $(u,v)$ decreases by $x$. Correspondingly, player $v$ capacity on $(u,v)$ increases by $x$. If a player rejects the packet, this will entail a cost linear in the weight of the packet. A link is "rechargeable" in the sense that the total capacity of the link has to remain constant, but the allocation of capacity at the ends of the link can depend arbitrarily on players' decisions. The goal is to minimise the sum of the capacity injected into the link and the cost of rejecting packets. We show the problem is NP-hard, but can be approximated efficiently with a ratio of $(1+ \varepsilon)\cdot (1+\sqrt{3})$ for some arbitrary $\varepsilon >0$.
△ Less
Submitted 9 May, 2022; v1 submitted 28 April, 2022;
originally announced April 2022.
-
Deterministic Self-Adjusting Tree Networks Using Rotor Walks
Authors:
Chen Avin,
Marcin Bienkowski,
Iosif Salem,
Robert Sama,
Stefan Schmid,
Paweł Schmidt
Abstract:
We revisit the design of self-adjusting single-source tree networks. The problem can be seen as a generalization of the classic list update problem to trees, and finds applications in reconfigurable datacenter networks. We are given a fixed balanced binary tree T connecting n nodes V = {v_1, ... , v_n}. A source node v_0, attached to the root of the tree, issues communication requests to nodes in…
▽ More
We revisit the design of self-adjusting single-source tree networks. The problem can be seen as a generalization of the classic list update problem to trees, and finds applications in reconfigurable datacenter networks. We are given a fixed balanced binary tree T connecting n nodes V = {v_1, ... , v_n}. A source node v_0, attached to the root of the tree, issues communication requests to nodes in V, in an online and adversarial manner; the access cost of a request to a node v, is given by the current depth of v in T. The online algorithm can try to reduce the access cost by performing swap operations, with which the position of a node is exchanged with the position of its parent in the tree; a swap operation costs one unit. The objective is to design an online algorithm which minimizes the total access cost plus adjustment cost (swapping). Avin et al. recently presented Random-Push, a constant competitive online algorithm for this problem, based on random walks, together with an analysis exploiting the most recently used (MRU) property of random walks.
We study analytically and empirically, online algorithms for this problem. In particular, we explore how to derandomize Random-Push. We consider a simple derandomized algorithm which we call Rotor-Push, as its behavior is reminiscent of rotor walks. We first prove that Rotor-Push is constant competitive: its competitive ratio is 12 and hence by a factor of five lower than the best existing competitive ratio. In contrast to Random-Push, the algorithm does not feature the MRU property, which requires a new analysis. We present a significantly improved and simpler analysis for the randomized algorithm, showing that it is 16-competitive. We compare empirically all self-adjusting single-source tree networks, using synthetic and real data with varying locality and observe that Rotor-Push and Random-Push have almost identical performance.
△ Less
Submitted 22 April, 2022;
originally announced April 2022.
-
On the Price of Locality in Static Fast Rerouting
Authors:
Klaus-Tycho Foerster,
Juho Hirvonen,
Yvonne-Anne Pignolet,
Stefan Schmid,
Gilles Tredan
Abstract:
Modern communication networks feature fully decentralized flow rerouting mechanisms which allow them to quickly react to link failures. This paper revisits the fundamental algorithmic problem underlying such local fast rerouting mechanisms. Is it possible to achieve perfect resilience, i.e., to define local routing tables which preserve connectivity as long as the underlying network is still conne…
▽ More
Modern communication networks feature fully decentralized flow rerouting mechanisms which allow them to quickly react to link failures. This paper revisits the fundamental algorithmic problem underlying such local fast rerouting mechanisms. Is it possible to achieve perfect resilience, i.e., to define local routing tables which preserve connectivity as long as the underlying network is still connected? Feigenbaum et al. (ACM PODC'12) and Foerster et al. (SIAM APOCS'21) showed that, unfortunately, it is impossible in general.
This paper charts a more complete landscape of the feasibility of perfect resilience. We first show a perhaps surprisingly large price of locality in static fast rerouting mechanisms: even when source and destination remain connected by a linear number of link-disjoint paths after link failures, local rerouting algorithms cannot find any of them which leads to a disconnection on the routing level. This motivates us to study resilience in graphs which exclude certain dense minors, such as cliques or a complete bipartite graphs, and in particular, provide characterizations of the possibility of perfect resilience in different routing models. We provide further insights into the price of locality by showing impossibility results for few failures and investigate perfect resilience on Topology Zoo networks.
△ Less
Submitted 7 April, 2022;
originally announced April 2022.
-
P4RROT: Generating P4 Code for the Application Layer
Authors:
Csaba Györgyi,
Sándor Laki,
Stefan Schmid
Abstract:
Throughput and latency critical applications could often benefit of performing computations close to the client. To enable this, distributed computing paradigms such as edge computing have recently emerged. However, with the advent of programmable data planes, computations cannot only be performed by servers but they can be offloaded to network switches. Languages like P4 enable to flexibly reprog…
▽ More
Throughput and latency critical applications could often benefit of performing computations close to the client. To enable this, distributed computing paradigms such as edge computing have recently emerged. However, with the advent of programmable data planes, computations cannot only be performed by servers but they can be offloaded to network switches. Languages like P4 enable to flexibly reprogram the entire packet processing pipeline. Though these devices promise high throughput and ultra-low response times, implementing application-layer tasks in the data plane programming language P4 is still challenging for an application developer who is not familiar with networking domain. In this paper, we first identify and examine obstacles and pain points one can experience when offloading server-based computations to the network. Then we present P4RROT, a code generator (in form of a library) which allows to overcome these limitations by providing a user-friendly API to describe computations to be offloaded. After discussing the design choices behind P4RROT, we introduce our proof-of-concept implementation for two P4 targets: Netronome SmartNIC and BMv2.
△ Less
Submitted 6 April, 2022;
originally announced April 2022.
-
Mars: Near-Optimal Throughput with Shallow Buffers in Reconfigurable Datacenter Networks
Authors:
Vamsi Addanki,
Chen Avin,
Stefan Schmid
Abstract:
The performance of large-scale computing systems often critically depends on high-performance communication networks. Dynamically reconfigurable topologies, e.g., based on optical circuit switches, are emerging as an innovative new technology to deal with the explosive growth of datacenter traffic. Specifically, \emph{periodic} reconfigurable datacenter networks (RDCNs) such as RotorNet (SIGCOMM 2…
▽ More
The performance of large-scale computing systems often critically depends on high-performance communication networks. Dynamically reconfigurable topologies, e.g., based on optical circuit switches, are emerging as an innovative new technology to deal with the explosive growth of datacenter traffic. Specifically, \emph{periodic} reconfigurable datacenter networks (RDCNs) such as RotorNet (SIGCOMM 2017), Opera (NSDI 2020) and Sirius (SIGCOMM 2020) have been shown to provide high throughput, by emulating a \emph{complete graph} through fast periodic circuit switch scheduling.
However, to achieve such a high throughput, existing reconfigurable network designs pay a high price: in terms of potentially high delays, but also, as we show as a first contribution in this paper, in terms of the high buffer requirements. In particular, we show that under buffer constraints, emulating the high-throughput complete graph is infeasible at scale, and we uncover a spectrum of unvisited and attractive alternative RDCNs, which emulate regular graphs, but with lower node degree than the complete graph.
We present Mars, a periodic reconfigurable topology which emulates a $d$-regular graph with near-optimal throughput. In particular, we systematically analyze how the degree~$d$ can be optimized for throughput given the available buffer and delay tolerance of the datacenter. We further show empirically that Mars achieves higher throughput compared to existing systems when buffer sizes are bounded.
△ Less
Submitted 28 December, 2022; v1 submitted 5 April, 2022;
originally announced April 2022.
-
Time Complexity of Consensus in Dynamic Networks Under Oblivious Message Adversaries
Authors:
Ami Paz,
Hugo Rincon Galeana,
Stefan Schmid,
Ulrich Schmid,
Kyrill Winkler
Abstract:
Consensus is a most fundamental task in distributed computing. This paper studies the consensus problem for a set of processes connected by a dynamic directed network, in which computation and communication is lock-step synchronous but controlled by an oblivious message adversary. In this basic model, determining consensus solvability and designing consensus algorithms in the case where it is poss…
▽ More
Consensus is a most fundamental task in distributed computing. This paper studies the consensus problem for a set of processes connected by a dynamic directed network, in which computation and communication is lock-step synchronous but controlled by an oblivious message adversary. In this basic model, determining consensus solvability and designing consensus algorithms in the case where it is possible, has been shown to be surprisingly difficult. We present an explicit decision procedure to determine if consensus is possible under a given adversary. This in turn enables us, for the first time, to study the time complexity of consensus in this model. In particular, we derive time complexity upper bounds for consensus solvability both for a centralized decision procedure as well as for solving distributed consensus. We complement these results with time complexity lower bounds. Intriguingly, we find that reaching consensus under an oblivious message adversary can take exponentially longer than broadcasting the input value of some process to all other processes.
△ Less
Submitted 24 February, 2022;
originally announced February 2022.
-
Kevin: de Bruijn-based topology with demand-aware links and greedy routing
Authors:
Johannes Zerwas,
Csaba Györgyi,
Andreas Blenk,
Stefan Schmid,
Chen Avin
Abstract:
We propose Kevin, a novel demand-aware reconfigurable rack-to-rack datacenter network realized with a simple and efficient control plane. In particular, Kevin makes effective use of the network capacity by supporting integrated and multi-hop routing as well as work-conserving scheduling. To this end, Kevin relies on local greedy routing with small forwarding tables which require local updates only…
▽ More
We propose Kevin, a novel demand-aware reconfigurable rack-to-rack datacenter network realized with a simple and efficient control plane. In particular, Kevin makes effective use of the network capacity by supporting integrated and multi-hop routing as well as work-conserving scheduling. To this end, Kevin relies on local greedy routing with small forwarding tables which require local updates only during topological reconfigurations, making this approach ideal for dynamic networks. Specifically, Kevin is based on a de Bruijn topology (using a small number of optical circuit switches) in which static links are enhanced with opportunistic links.
△ Less
Submitted 23 February, 2022; v1 submitted 11 February, 2022;
originally announced February 2022.
-
A Centrality Analysis of the Lightning Network
Authors:
Philipp Zabka,
Klaus-Tycho Foerster,
Stefan Schmid,
Christian Decker
Abstract:
Payment channel networks (PCNs) such as the Lightning Network offer an appealing solution to the scalability problem faced by many cryptocurrencies operating on a blockchain such as Bitcoin. However, PCNs also inherit the stringent dependability requirements of blockchain. In particular, in order to mitigate liquidity bottlenecks as well as on-path attacks, it is important that payment channel net…
▽ More
Payment channel networks (PCNs) such as the Lightning Network offer an appealing solution to the scalability problem faced by many cryptocurrencies operating on a blockchain such as Bitcoin. However, PCNs also inherit the stringent dependability requirements of blockchain. In particular, in order to mitigate liquidity bottlenecks as well as on-path attacks, it is important that payment channel networks maintain a high degree of decentralization. Motivated by this requirement, we conduct an empirical centrality analysis of the popular Lightning Network, and in particular, the betweenness centrality distribution of the routing system. Based on our extensive data set (using several millions of channel update messages), we implemented a TimeMachine tool which enables us to study the network evolution over time. We find that although the network is generally fairly decentralized, a small number of nodes can attract a significant fraction of the transactions, introducing skew. Furthermore, our analysis suggests that over the last two years, the centrality has increased significantly, e.g., the inequality (measured by the Gini index) has increased by more than 10%.
△ Less
Submitted 19 January, 2022;
originally announced January 2022.
-
Fast and Heavy Disjoint Weighted Matchings for Demand-Aware Datacenter Topologies
Authors:
Kathrin Hanauer,
Monika Henzinger,
Stefan Schmid,
Jonathan Trummer
Abstract:
Reconfigurable optical topologies promise to improve the performance in datacenters by dynamically optimizing the physical network in a demand-aware manner. State-of-the-art optical technologies allow to establish and update direct connectivity (in the form of edge-disjoint matchings) between top-of-rack switches within microseconds or less. However, to fully exploit temporal structure in the dema…
▽ More
Reconfigurable optical topologies promise to improve the performance in datacenters by dynamically optimizing the physical network in a demand-aware manner. State-of-the-art optical technologies allow to establish and update direct connectivity (in the form of edge-disjoint matchings) between top-of-rack switches within microseconds or less. However, to fully exploit temporal structure in the demand, such fine-grained reconfigurations also require fast algorithms for optimizing the interconnecting matchings.
Motivated by the desire to offload a maximum amount of demand to the reconfigurable network, this paper initiates the study of fast algorithms to find k disjoint heavy matchings in graphs. We present and analyze six algorithms, based on iterative matchings, b-matching, edge coloring, and node-rankings. We show that the problem is generally NP-hard and study the achievable approximation ratios.
An extensive empirical evaluation of our algorithms on both real-world and synthetic traces (88 in total), including traces collected in Facebook datacenters and in HPC clusters reveals that all our algorithms provide high-quality matchings, and also very fast ones come within 95% or more of the best solution. However, the running times differ significantly and what is the best algorithm depends on k and the acceptable runtime-quality tradeoff.
△ Less
Submitted 17 January, 2022;
originally announced January 2022.
-
PowerTCP: Pushing the Performance Limits of Datacenter Networks
Authors:
Vamsi Addanki,
Oliver Michel,
Stefan Schmid
Abstract:
Increasingly stringent throughput and latency requirements in datacenter networks demand fast and accurate congestion control. We observe that the reaction time and accuracy of existing datacenter congestion control schemes are inherently limited. They either rely only on explicit feedback about the network state (e.g., queue lengths in DCTCP) or only on variations of state (e.g., RTT gradient in…
▽ More
Increasingly stringent throughput and latency requirements in datacenter networks demand fast and accurate congestion control. We observe that the reaction time and accuracy of existing datacenter congestion control schemes are inherently limited. They either rely only on explicit feedback about the network state (e.g., queue lengths in DCTCP) or only on variations of state (e.g., RTT gradient in TIMELY). To overcome these limitations, we propose a novel congestion control algorithm, PowerTCP, which achieves much more fine-grained congestion control by adapting to the bandwidth-window product (henceforth called power). PowerTCP leverages in-band network telemetry to react to changes in the network instantaneously without loss of throughput and while keeping queues short. Due to its fast reaction time, our algorithm is particularly well-suited for dynamic network environments and bursty traffic patterns. We show analytically and empirically that PowerTCP can significantly outperform the state-of-the-art in both traditional datacenter topologies and emerging reconfigurable datacenters where frequent bandwidth changes make congestion control challenging. In traditional datacenter networks, PowerTCP reduces tail flow completion times of short flows by 80% compared to DCQCN and TIMELY, and by 33% compared to HPCC even at 60% network load. In reconfigurable datacenters, PowerTCP achieves 85% circuit utilization without incurring additional latency and cuts tail latency by at least 2x compared to existing approaches.
△ Less
Submitted 28 December, 2021;
originally announced December 2021.
-
Automap: Towards Ergonomic Automated Parallelism for ML Models
Authors:
Michael Schaarschmidt,
Dominik Grewe,
Dimitrios Vytiniotis,
Adam Paszke,
Georg Stefan Schmid,
Tamara Norman,
James Molloy,
Jonathan Godwin,
Norman Alexander Rink,
Vinod Nair,
Dan Belov
Abstract:
The rapid rise in demand for training large neural network architectures has brought into focus the need for partitioning strategies, for example by using data, model, or pipeline parallelism. Implementing these methods is increasingly supported through program primitives, but identifying efficient partitioning strategies requires expensive experimentation and expertise. We present the prototype o…
▽ More
The rapid rise in demand for training large neural network architectures has brought into focus the need for partitioning strategies, for example by using data, model, or pipeline parallelism. Implementing these methods is increasingly supported through program primitives, but identifying efficient partitioning strategies requires expensive experimentation and expertise. We present the prototype of an automated partitioner that seamlessly integrates into existing compilers and existing user workflows. Our partitioner enables SPMD-style parallelism that encompasses data parallelism and parameter/activation sharding. Through a combination of inductive tactics and search in a platform-independent partitioning IR, automap can recover expert partitioning strategies such as Megatron sharding for transformer layers.
△ Less
Submitted 6 December, 2021;
originally announced December 2021.