-
Federated Learning and AI Regulation in the European Union: Who is Responsible? -- An Interdisciplinary Analysis
Authors:
Herbert Woisetschläger,
Simon Mertel,
Christoph Krönke,
Ruben Mayer,
Hans-Arno Jacobsen
Abstract:
The European Union Artificial Intelligence Act mandates clear stakeholder responsibilities in developing and deploying machine learning applications to avoid substantial fines, prioritizing private and secure data processing with data remaining at its origin. Federated Learning (FL) enables the training of generative AI Models across data siloes, sharing only model parameters while improving data…
▽ More
The European Union Artificial Intelligence Act mandates clear stakeholder responsibilities in developing and deploying machine learning applications to avoid substantial fines, prioritizing private and secure data processing with data remaining at its origin. Federated Learning (FL) enables the training of generative AI Models across data siloes, sharing only model parameters while improving data security. Since FL is a cooperative learning paradigm, clients and servers naturally share legal responsibility in the FL pipeline. Our work contributes to clarifying the roles of both parties, explains strategies for shifting responsibilities to the server operator, and points out open technical challenges that we must solve to improve FL's practical applicability under the EU AI Act.
△ Less
Submitted 12 July, 2024; v1 submitted 10 July, 2024;
originally announced July 2024.
-
Multimodal Physiological Signals Representation Learning via Multiscale Contrasting for Depression Recognition
Authors:
Kai Shao,
Rui Wang,
Yixue Hao,
Long Hu,
Min Chen,
Hans Arno Jacobsen
Abstract:
Depression recognition based on physiological signals such as functional near-infrared spectroscopy (fNIRS) and electroencephalogram (EEG) has made considerable progress. However, most existing studies ignore the complementarity and semantic consistency of multimodal physiological signals under the same stimulation task in complex spatio-temporal patterns. In this paper, we introduce a multimodal…
▽ More
Depression recognition based on physiological signals such as functional near-infrared spectroscopy (fNIRS) and electroencephalogram (EEG) has made considerable progress. However, most existing studies ignore the complementarity and semantic consistency of multimodal physiological signals under the same stimulation task in complex spatio-temporal patterns. In this paper, we introduce a multimodal physiological signals representation learning framework using Siamese architecture via multiscale contrasting for depression recognition (MRLMC). First, fNIRS and EEG are transformed into different but correlated data based on a time-domain data augmentation strategy. Then, we design a spatio-temporal contrasting module to learn the representation of fNIRS and EEG through weight-sharing multiscale spatio-temporal convolution. Furthermore, to enhance the learning of semantic representation associated with stimulation tasks, a semantic consistency contrast module is proposed, aiming to maximize the semantic similarity of fNIRS and EEG. Extensive experiments on publicly available and self-collected multimodal physiological signals datasets indicate that MRLMC outperforms the state-of-the-art models. Moreover, our proposed framework is capable of transferring to multimodal time series downstream tasks.
△ Less
Submitted 25 June, 2024; v1 submitted 22 June, 2024;
originally announced June 2024.
-
Should my Blockchain Learn to Drive? A Study of Hyperledger Fabric
Authors:
Jeeta Ann Chacko,
Ruben Mayer,
Hans-Arno Jacobsen
Abstract:
Similar to other transaction processing frameworks, blockchain systems need to be dynamically reconfigured to adapt to varying workloads and changes in network conditions. However, achieving optimal reconfiguration is particularly challenging due to the complexity of the blockchain stack, which has diverse configurable parameters. This paper explores the concept of self-driving blockchains, which…
▽ More
Similar to other transaction processing frameworks, blockchain systems need to be dynamically reconfigured to adapt to varying workloads and changes in network conditions. However, achieving optimal reconfiguration is particularly challenging due to the complexity of the blockchain stack, which has diverse configurable parameters. This paper explores the concept of self-driving blockchains, which have the potential to predict workload changes and reconfigure themselves for optimal performance without human intervention. We compare and contrast our discussions with existing research on databases and highlight aspects unique to blockchains. We identify specific parameters and components in Hyperledger Fabric, a popular permissioned blockchain system, that are suitable for autonomous adaptation and offer potential solutions for the challenges involved. Further, we implement three demonstrative locally autonomous systems, each targeting a different layer of the blockchain stack, and conduct experiments to understand the feasibility of our findings. Our experiments indicate up to 11% improvement in success throughput and a 30% decrease in latency, making this a significant step towards implementing a fully autonomous blockchain system in the future.
△ Less
Submitted 10 June, 2024;
originally announced June 2024.
-
Federated Computing -- Survey on Building Blocks, Extensions and Systems
Authors:
René Schwermer,
Ruben Mayer,
Hans-Arno Jacobsen
Abstract:
In response to the increasing volume and sensitivity of data, traditional centralized computing models face challenges, such as data security breaches and regulatory hurdles. Federated Computing (FC) addresses these concerns by enabling collaborative processing without compromising individual data privacy. This is achieved through a decentralized network of devices, each retaining control over its…
▽ More
In response to the increasing volume and sensitivity of data, traditional centralized computing models face challenges, such as data security breaches and regulatory hurdles. Federated Computing (FC) addresses these concerns by enabling collaborative processing without compromising individual data privacy. This is achieved through a decentralized network of devices, each retaining control over its data, while participating in collective computations. The motivation behind FC extends beyond technical considerations to encompass societal implications. As the need for responsible AI and ethical data practices intensifies, FC aligns with the principles of user empowerment and data sovereignty. FC comprises of Federated Learning (FL) and Federated Analytics (FA). FC systems became more complex over time and they currently lack a clear definition and taxonomy describing its moving pieces. Current surveys capture domain-specific FL use cases, describe individual components in an FC pipeline individually or decoupled from each other, or provide a quantitative overview of the number of published papers. This work surveys more than 150 papers to distill the underlying structure of FC systems with their basic building blocks, extensions, architecture, environment, and motivation. We capture FL and FA systems individually and point out unique difference between those two.
△ Less
Submitted 3 April, 2024;
originally announced April 2024.
-
Federated Learning Priorities Under the European Union Artificial Intelligence Act
Authors:
Herbert Woisetschläger,
Alexander Erben,
Bill Marino,
Shiqiang Wang,
Nicholas D. Lane,
Ruben Mayer,
Hans-Arno Jacobsen
Abstract:
The age of AI regulation is upon us, with the European Union Artificial Intelligence Act (AI Act) leading the way. Our key inquiry is how this will affect Federated Learning (FL), whose starting point of prioritizing data privacy while performing ML fundamentally differs from that of centralized learning. We believe the AI Act and future regulations could be the missing catalyst that pushes FL tow…
▽ More
The age of AI regulation is upon us, with the European Union Artificial Intelligence Act (AI Act) leading the way. Our key inquiry is how this will affect Federated Learning (FL), whose starting point of prioritizing data privacy while performing ML fundamentally differs from that of centralized learning. We believe the AI Act and future regulations could be the missing catalyst that pushes FL toward mainstream adoption. However, this can only occur if the FL community reprioritizes its research focus. In our position paper, we perform a first-of-its-kind interdisciplinary analysis (legal and ML) of the impact the AI Act may have on FL and make a series of observations supporting our primary position through quantitative and qualitative analysis. We explore data governance issues and the concern for privacy. We establish new challenges regarding performance and energy efficiency within lifecycle monitoring. Taken together, our analysis suggests there is a sizable opportunity for FL to become a crucial component of AI Act-compliant ML systems and for the new regulation to drive the adoption of FL techniques in general. Most noteworthy are the opportunities to defend against data bias and enhance private and secure computation
△ Less
Submitted 5 February, 2024;
originally announced February 2024.
-
Choosing a Classical Planner with Graph Neural Networks
Authors:
Jana Vatter,
Ruben Mayer,
Hans-Arno Jacobsen,
Horst Samulowitz,
Michael Katz
Abstract:
Online planner selection is the task of choosing a solver out of a predefined set for a given planning problem. As planning is computationally hard, the performance of solvers varies greatly on planning problems. Thus, the ability to predict their performance on a given problem is of great importance. While a variety of learning methods have been employed, for classical cost-optimal planning the p…
▽ More
Online planner selection is the task of choosing a solver out of a predefined set for a given planning problem. As planning is computationally hard, the performance of solvers varies greatly on planning problems. Thus, the ability to predict their performance on a given problem is of great importance. While a variety of learning methods have been employed, for classical cost-optimal planning the prevailing approach uses Graph Neural Networks (GNNs). In this work, we continue the line of work on using GNNs for online planner selection. We perform a thorough investigation of the impact of the chosen GNN model, graph representation and node features, as well as prediction task. Going further, we propose using the graph representation obtained by a GNN as an input to the Extreme Gradient Boosting (XGBoost) model, resulting in a more resource-efficient yet accurate approach. We show the effectiveness of a variety of GNN-based online planner selection methods, opening up new exciting avenues for research on online planner selection.
△ Less
Submitted 25 January, 2024;
originally announced February 2024.
-
A Survey on Efficient Federated Learning Methods for Foundation Model Training
Authors:
Herbert Woisetschläger,
Alexander Isenko,
Shiqiang Wang,
Ruben Mayer,
Hans-Arno Jacobsen
Abstract:
Federated Learning (FL) has become an established technique to facilitate privacy-preserving collaborative training across a multitude of clients. However, new approaches to FL often discuss their contributions involving small deep-learning models only and focus on training full models on clients. In the wake of Foundation Models (FM), the reality is different for many deep learning applications.…
▽ More
Federated Learning (FL) has become an established technique to facilitate privacy-preserving collaborative training across a multitude of clients. However, new approaches to FL often discuss their contributions involving small deep-learning models only and focus on training full models on clients. In the wake of Foundation Models (FM), the reality is different for many deep learning applications. Typically, FMs have already been pre-trained across a wide variety of tasks and can be fine-tuned to specific downstream tasks over significantly smaller datasets than required for full model training. However, access to such datasets is often challenging. By its design, FL can help to open data silos. With this survey, we introduce a novel taxonomy focused on computational and communication efficiency, the vital elements to make use of FMs in FL systems. We discuss the benefits and drawbacks of parameter-efficient fine-tuning (PEFT) for FL applications, elaborate on the readiness of FL frameworks to work with FMs and provide future research opportunities on how to evaluate generative models in FL as well as the interplay of privacy and PEFT.
△ Less
Submitted 7 February, 2024; v1 submitted 9 January, 2024;
originally announced January 2024.
-
How Does Stake Distribution Influence Consensus? Analyzing Blockchain Decentralization
Authors:
Shashank Motepalli,
Hans-Arno Jacobsen
Abstract:
In the PoS blockchain landscape, the challenge of achieving full decentralization is often hindered by a disproportionate concentration of staked tokens among a few validators. This study analyses this challenge by first formalizing decentralization metrics for weighted consensus mechanisms. An empirical analysis across ten permissionless blockchains uncovers significant weight concentration among…
▽ More
In the PoS blockchain landscape, the challenge of achieving full decentralization is often hindered by a disproportionate concentration of staked tokens among a few validators. This study analyses this challenge by first formalizing decentralization metrics for weighted consensus mechanisms. An empirical analysis across ten permissionless blockchains uncovers significant weight concentration among validators, underscoring the need for an equitable approach. To counter this, we introduce the Square Root Stake Weight (SRSW) model, which effectively recalibrates staking weight distribution. Our examination of the SRSW model demonstrates notable improvements in the decentralization metrics: the Gini index improves by 37.16% on average, while Nakamoto coefficients for liveness and safety see mean enhancements of 101.04% and 80.09%, respectively. This research is a pivotal step toward a more fair and equitable distribution of staking weight, advancing the decentralization in blockchain consensus mechanisms.
△ Less
Submitted 20 May, 2024; v1 submitted 21 December, 2023;
originally announced December 2023.
-
An End-to-End Performance Comparison of Seven Permissioned Blockchain Systems
Authors:
Frank Christian Geyer,
Hans-Arno Jacobsen,
Ruben Mayer,
Peter Mandl
Abstract:
The emergence of more and more blockchain solutions with innovative approaches to optimising performance, scalability, privacy and governance complicates performance analysis. Reasons for the difficulty of benchmarking blockchains include, for example, the high number of system parameters to configure and the effort to deploy a blockchain network. In addition, performance data, which mostly comes…
▽ More
The emergence of more and more blockchain solutions with innovative approaches to optimising performance, scalability, privacy and governance complicates performance analysis. Reasons for the difficulty of benchmarking blockchains include, for example, the high number of system parameters to configure and the effort to deploy a blockchain network. In addition, performance data, which mostly comes from system vendors, is often intransparent. We investigate and evaluate the performance of seven permissioned blockchain systems using different parameter settings in a reproducible manner. We employ an end-to-end approach, where the clients sending the transactions are fully involved in the data collection approach. Our results highlight the peculiarities and limitations of the systems under investigation. Due to the insights given, our work forms the basis for continued research to optimise the performance of blockchain systems.
△ Less
Submitted 26 November, 2023;
originally announced November 2023.
-
FabricCRDT: A Conflict-Free Replicated Datatypes Approach to Permissioned Blockchains
Authors:
Pezhman Nasirifard,
Ruben Mayer,
Hans-Arno Jacobsen
Abstract:
With the increased adaption of blockchain technologies, permissioned blockchains such as Hyperledger Fabric provide a robust ecosystem for developing production-grade decentralized applications. However, the additional latency between executing and committing transactions, due to Fabric's three-phase transaction lifecycle of Execute-Order-Validate (EOV), is a potential scalability bottleneck. The…
▽ More
With the increased adaption of blockchain technologies, permissioned blockchains such as Hyperledger Fabric provide a robust ecosystem for developing production-grade decentralized applications. However, the additional latency between executing and committing transactions, due to Fabric's three-phase transaction lifecycle of Execute-Order-Validate (EOV), is a potential scalability bottleneck. The added latency increases the probability of concurrent updates on the same keys by different transactions, leading to transaction failures caused by Fabric's concurrency control mechanism. The transaction failures increase the application development complexity and decrease Fabric's throughput. Conflict-free Replicated Datatypes (CRDTs) provide a solution for merging and resolving conflicts in the presence of concurrent updates. In this work, we introduce FabricCRDT, an approach for integrating CRDTs to Fabric. Our evaluations show that in general, FabricCRDT offers higher throughput of successful transactions than Fabric, while successfully committing and merging all conflicting transactions without any failures.
△ Less
Submitted 24 October, 2023;
originally announced October 2023.
-
Federated Fine-Tuning of LLMs on the Very Edge: The Good, the Bad, the Ugly
Authors:
Herbert Woisetschläger,
Alexander Isenko,
Shiqiang Wang,
Ruben Mayer,
Hans-Arno Jacobsen
Abstract:
Large Language Models (LLM) and foundation models are popular as they offer new opportunities for individuals and businesses to improve natural language processing, interact with data, and retrieve information faster. However, training or fine-tuning LLMs requires a vast amount of data, which can be challenging to access due to legal or technical restrictions and may require private computing reso…
▽ More
Large Language Models (LLM) and foundation models are popular as they offer new opportunities for individuals and businesses to improve natural language processing, interact with data, and retrieve information faster. However, training or fine-tuning LLMs requires a vast amount of data, which can be challenging to access due to legal or technical restrictions and may require private computing resources. Federated Learning (FL) is a solution designed to overcome these challenges and expand data access for deep learning applications.
This paper takes a hardware-centric approach to explore how LLMs can be brought to modern edge computing systems. Our study fine-tunes the FLAN-T5 model family, ranging from 80M to 3B parameters, using FL for a text summarization task. We provide a micro-level hardware benchmark, compare the model FLOP utilization to a state-of-the-art data center GPU, and study the network utilization in realistic conditions. Our contribution is twofold: First, we evaluate the current capabilities of edge computing systems and their potential for LLM FL workloads. Second, by comparing these systems with a data-center GPU, we demonstrate the potential for improvement and the next steps toward achieving greater computational efficiency at the edge.
△ Less
Submitted 2 May, 2024; v1 submitted 4 October, 2023;
originally announced October 2023.
-
Lifting the Fog of Uncertainties: Dynamic Resource Orchestration for the Containerized Cloud
Authors:
Yuqiu Zhang,
Tongkun Zhang,
Gengrui Zhang,
Hans-Arno Jacobsen
Abstract:
The advances in virtualization technologies have sparked a growing transition from virtual machine (VM)-based to container-based infrastructure for cloud computing. From the resource orchestration perspective, containers' lightweight and highly configurable nature not only enables opportunities for more optimized strategies, but also poses greater challenges due to additional uncertainties and a l…
▽ More
The advances in virtualization technologies have sparked a growing transition from virtual machine (VM)-based to container-based infrastructure for cloud computing. From the resource orchestration perspective, containers' lightweight and highly configurable nature not only enables opportunities for more optimized strategies, but also poses greater challenges due to additional uncertainties and a larger configuration parameter search space. Towards this end, we propose Drone, a resource orchestration framework that adaptively configures resource parameters to improve application performance and reduce operational cost in the presence of cloud uncertainties. Built on Contextual Bandit techniques, Drone is able to achieve a balance between performance and resource cost on public clouds, and optimize performance on private clouds where a hard resource constraint is present. We show that our algorithms can achieve sub-linear growth in cumulative regret, a theoretically sound convergence guarantee, and our extensive experiments show that Drone achieves an up to 45% performance improvement and a 20% resource footprint reduction across batch processing jobs and microservice workloads.
△ Less
Submitted 29 September, 2023;
originally announced September 2023.
-
An Experimental Comparison of Partitioning Strategies for Distributed Graph Neural Network Training
Authors:
Nikolai Merkel,
Daniel Stoll,
Ruben Mayer,
Hans-Arno Jacobsen
Abstract:
Recently, graph neural networks (GNNs) have gained much attention as a growing area of deep learning capable of learning on graph-structured data. However, the computational and memory requirements for training GNNs on large-scale graphs can exceed the capabilities of single machines or GPUs, making distributed GNN training a promising direction for large-scale GNN training. A prerequisite for dis…
▽ More
Recently, graph neural networks (GNNs) have gained much attention as a growing area of deep learning capable of learning on graph-structured data. However, the computational and memory requirements for training GNNs on large-scale graphs can exceed the capabilities of single machines or GPUs, making distributed GNN training a promising direction for large-scale GNN training. A prerequisite for distributed GNN training is to partition the input graph into smaller parts that are distributed among multiple machines of a compute cluster. Although graph partitioning has been extensively studied with regard to graph analytics and graph databases, its effect on GNN training performance is largely unexplored.
In this paper, we study the effectiveness of graph partitioning for distributed GNN training. Our study aims to understand how different factors such as GNN parameters, mini-batch size, graph type, features size, and scale-out factor influence the effectiveness of graph partitioning. We conduct experiments with two different GNN systems using vertex and edge partitioning. We found that graph partitioning is a crucial pre-processing step that can heavily reduce the training time and memory footprint. Furthermore, our results show that invested partitioning time can be amortized by reduced GNN training, making it a relevant optimization.
△ Less
Submitted 29 August, 2023;
originally announced August 2023.
-
PrestigeBFT: Revolutionizing View Changes in BFT Consensus Algorithms with Reputation Mechanisms
Authors:
Gengrui Zhang,
Fei Pan,
Sofia Tijanic,
Hans-Arno Jacobsen
Abstract:
This paper proposes PrestigeBFT, a novel leader-based BFT consensus algorithm that addresses the weaknesses of passive view-change protocols. Passive protocols blindly rotate leadership among servers on a predefined schedule, potentially selecting unavailable or slow servers as leaders. PrestigeBFT proposes an active view-change protocol using reputation mechanisms that calculate a server's potent…
▽ More
This paper proposes PrestigeBFT, a novel leader-based BFT consensus algorithm that addresses the weaknesses of passive view-change protocols. Passive protocols blindly rotate leadership among servers on a predefined schedule, potentially selecting unavailable or slow servers as leaders. PrestigeBFT proposes an active view-change protocol using reputation mechanisms that calculate a server's potential correctness based on historic behavior. The active protocol enables servers to campaign for leadership by performing reputation-associated work. As such, up-to-date and correct servers with good reputations are more likely to be elected as leaders as they perform less work, whereas faulty servers with bad reputations are suppressed from becoming leaders by being required to perform more work. Under normal operation, PrestigeBFT achieves 5X higher throughput than the baseline that uses passive view-change protocols. In addition, PrestigeBFT remains unaffected under benign faults and experiences only a 24% drop in throughput under a variety of Byzantine faults, while the baseline throughput drops by 62% and 69%, respectively.
△ Less
Submitted 16 July, 2023;
originally announced July 2023.
-
FLEdge: Benchmarking Federated Machine Learning Applications in Edge Computing Systems
Authors:
Herbert Woisetschläger,
Alexander Isenko,
Ruben Mayer,
Hans-Arno Jacobsen
Abstract:
Federated Machine Learning (FL) has received considerable attention in recent years. FL benchmarks are predominantly explored in either simulated systems or data center environments, neglecting the setups of real-world systems, which are often closely linked to edge computing. We close this research gap by introducing FLEdge, a benchmark targeting FL workloads in edge computing systems. We systema…
▽ More
Federated Machine Learning (FL) has received considerable attention in recent years. FL benchmarks are predominantly explored in either simulated systems or data center environments, neglecting the setups of real-world systems, which are often closely linked to edge computing. We close this research gap by introducing FLEdge, a benchmark targeting FL workloads in edge computing systems. We systematically study hardware heterogeneity, energy efficiency during training, and the effect of various differential privacy levels on training in FL systems. To make this benchmark applicable to real-world scenarios, we evaluate the impact of client dropouts on state-of-the-art FL strategies with failure rates as high as 50%. FLEdge provides new insights, such as that training state-of-the-art FL workloads on older GPU-accelerated embedded devices is up to 3x more energy efficient than on modern server-grade GPUs.
△ Less
Submitted 13 June, 2023; v1 submitted 8 June, 2023;
originally announced June 2023.
-
How Can We Train Deep Learning Models Across Clouds and Continents? An Experimental Study
Authors:
Alexander Erben,
Ruben Mayer,
Hans-Arno Jacobsen
Abstract:
This paper aims to answer the question: Can deep learning models be cost-efficiently trained on a global market of spot VMs spanning different data centers and cloud providers? To provide guidance, we extensively evaluate the cost and throughput implications of training in different zones, continents, and clouds for representative CV, NLP, and ASR models. To expand the current training options fur…
▽ More
This paper aims to answer the question: Can deep learning models be cost-efficiently trained on a global market of spot VMs spanning different data centers and cloud providers? To provide guidance, we extensively evaluate the cost and throughput implications of training in different zones, continents, and clouds for representative CV, NLP, and ASR models. To expand the current training options further, we compare the scalability potential for hybrid-cloud scenarios by adding cloud resources to on-premise hardware to improve training throughput. Finally, we show how leveraging spot instance pricing enables a new cost-efficient way to train models with multiple cheap VMs, trumping both more centralized and powerful hardware and even on-demand cloud offerings at competitive prices.
△ Less
Submitted 2 June, 2024; v1 submitted 5 June, 2023;
originally announced June 2023.
-
Analyzing Geospatial Distribution in Blockchains
Authors:
Shashank Motepalli,
Hans-Arno Jacobsen
Abstract:
Blockchains are decentralized; are they genuinely? We analyze blockchain decentralization's often-overlooked but quantifiable dimension: geospatial distribution of transaction processing. Blockchains bring with them the potential for geospatially distributed transaction processing. They enable validators from geospatially distant locations to partake in consensus protocols; we refer to them as min…
▽ More
Blockchains are decentralized; are they genuinely? We analyze blockchain decentralization's often-overlooked but quantifiable dimension: geospatial distribution of transaction processing. Blockchains bring with them the potential for geospatially distributed transaction processing. They enable validators from geospatially distant locations to partake in consensus protocols; we refer to them as minority validators. Based on our observations, in practice, most validators are often geographically concentrated in close proximity. Furthermore, we observed that minority validators tend not to meet the performance requirements, often misidentified as crash failures. Consequently, they are subject to punishment by jailing (removal from the validator set) and/or slashing (penalty in native tokens). Our emulations, under controlled conditions, demonstrate the same results, raising serious concerns about the potential for the geospatial centralization of validators. To address this, we developed a solution that easily integrates with consensus protocols, and we demonstrated its effectiveness.
△ Less
Submitted 28 May, 2023;
originally announced May 2023.
-
The Evolution of Distributed Systems for Graph Neural Networks and their Origin in Graph Processing and Deep Learning: A Survey
Authors:
Jana Vatter,
Ruben Mayer,
Hans-Arno Jacobsen
Abstract:
Graph Neural Networks (GNNs) are an emerging research field. This specialized Deep Neural Network (DNN) architecture is capable of processing graph structured data and bridges the gap between graph processing and Deep Learning (DL). As graphs are everywhere, GNNs can be applied to various domains including recommendation systems, computer vision, natural language processing, biology and chemistry.…
▽ More
Graph Neural Networks (GNNs) are an emerging research field. This specialized Deep Neural Network (DNN) architecture is capable of processing graph structured data and bridges the gap between graph processing and Deep Learning (DL). As graphs are everywhere, GNNs can be applied to various domains including recommendation systems, computer vision, natural language processing, biology and chemistry. With the rapid growing size of real world graphs, the need for efficient and scalable GNN training solutions has come. Consequently, many works proposing GNN systems have emerged throughout the past few years. However, there is an acute lack of overview, categorization and comparison of such systems. We aim to fill this gap by summarizing and categorizing important methods and techniques for large-scale GNN solutions. In addition, we establish connections between GNN systems, graph processing systems and DL systems.
△ Less
Submitted 23 May, 2023;
originally announced May 2023.
-
Partitioner Selection with EASE to Optimize Distributed Graph Processing
Authors:
Nikolai Merkel,
Ruben Mayer,
Tawkir Ahmed Fakir,
Hans-Arno Jacobsen
Abstract:
For distributed graph processing on massive graphs, a graph is partitioned into multiple equally-sized parts which are distributed among machines in a compute cluster. In the last decade, many partitioning algorithms have been developed which differ from each other with respect to the partitioning quality, the run-time of the partitioning and the type of graph for which they work best. The plethor…
▽ More
For distributed graph processing on massive graphs, a graph is partitioned into multiple equally-sized parts which are distributed among machines in a compute cluster. In the last decade, many partitioning algorithms have been developed which differ from each other with respect to the partitioning quality, the run-time of the partitioning and the type of graph for which they work best. The plethora of graph partitioning algorithms makes it a challenging task to select a partitioner for a given scenario. Different studies exist that provide qualitative insights into the characteristics of graph partitioning algorithms that support a selection. However, in order to enable automatic selection, a quantitative prediction of the partitioning quality, the partitioning run-time and the run-time of subsequent graph processing jobs is needed. In this paper, we propose a machine learning-based approach to provide such a quantitative prediction for different types of edge partitioning algorithms and graph processing workloads. We show that training based on generated graphs achieves high accuracy, which can be further improved when using real-world data. Based on the predictions, the automatic selection reduces the end-to-end run-time on average by 11.1% compared to a random selection, by 17.4% compared to selecting the partitioner that yields the lowest cut size, and by 29.1% compared to the worst strategy, respectively. Furthermore, in 35.7% of the cases, the best strategy was selected.
△ Less
Submitted 11 April, 2023;
originally announced April 2023.
-
Diba: A Re-configurable Stream Processor
Authors:
Mohammadreza Najafi,
Thamir M. Qadah,
Mohammad Sadoghi,
Hans-Arno Jacobsen
Abstract:
Stream processing acceleration is driven by the continuously increasing volume and velocity of data generated on the Web and the limitations of storage, computation, and power consumption. Hardware solutions provide better performance and power consumption, but they are hindered by the high research and development costs and the long time to market. In this work, we propose our re-configurable str…
▽ More
Stream processing acceleration is driven by the continuously increasing volume and velocity of data generated on the Web and the limitations of storage, computation, and power consumption. Hardware solutions provide better performance and power consumption, but they are hindered by the high research and development costs and the long time to market. In this work, we propose our re-configurable stream processor (Diba), a complete rethinking of a previously proposed customized and flexible query processor that targets real-time stream processing. Diba uses a unidirectional dataflow not dedicated to any specific type of query (operator) on streams, allowing a straightforward placement of processing components on a general data path that facilitates query mapping. In Diba, the concepts of the distribution network and processing components are implemented as two separate entities connected using generic interfaces. This approach allows the adoption of a versatile architecture for a family of queries rather than forcing a rigid chain of processing components to implement such queries. Our experimental evaluations of representative queries from TPC-H yielded processing times of 300, 1220, and 3520 milliseconds for data streams with scale factor sizes of one, four, and ten gigabytes, respectively.
△ Less
Submitted 4 April, 2023;
originally announced April 2023.
-
V-Guard: An Efficient Permissioned Blockchain for Achieving Consensus under Dynamic Memberships in V2X
Authors:
Gengrui Zhang,
Yunhao Mao,
Shiquan Zhang,
Shashank Motepalli,
Fei Pan,
Hans-Arno Jacobsen
Abstract:
This paper presents V-Guard, a new permissioned blockchain that achieves consensus for vehicular data under changing memberships, targeting the problem in V2X networks where vehicles are often intermittently connected on the roads. To achieve this goal, V-Guard integrates membership management into the consensus process for agreeing on data entries. It binds a data entry with a membership configur…
▽ More
This paper presents V-Guard, a new permissioned blockchain that achieves consensus for vehicular data under changing memberships, targeting the problem in V2X networks where vehicles are often intermittently connected on the roads. To achieve this goal, V-Guard integrates membership management into the consensus process for agreeing on data entries. It binds a data entry with a membership configuration profile that describes responsible vehicles for achieving consensus for the data entry. As such, V-Guard produces chained consensus results of both data entries and their residing membership profiles, which enables consensus to be achieved seamlessly under changing memberships. In addition, V-Guard separates the ordering of transactions from consensus, allowing concurrent ordering instances and periodic consensus instances to order and commit data entries. These features make V-Guard efficient for achieving consensus under dynamic memberships with high throughput and latency performance.
△ Less
Submitted 3 April, 2023; v1 submitted 15 January, 2023;
originally announced January 2023.
-
How To Optimize My Blockchain? A Multi-Level Recommendation Approach
Authors:
Jeeta Ann Chacko,
Ruben Mayer,
Hans-Arno Jacobsen
Abstract:
Aside from the conception of new blockchain architectures, existing blockchain optimizations in the literature primarily focus on system or data-oriented optimizations within prevailing blockchains. However, since blockchains handle multiple aspects ranging from organizational governance to smart contract design, a holistic approach that encompasses all the different layers of a given blockchain s…
▽ More
Aside from the conception of new blockchain architectures, existing blockchain optimizations in the literature primarily focus on system or data-oriented optimizations within prevailing blockchains. However, since blockchains handle multiple aspects ranging from organizational governance to smart contract design, a holistic approach that encompasses all the different layers of a given blockchain system is required to ensure that all optimization opportunities are taken into consideration. In this vein, we define a multi-level optimization recommendation approach that identifies optimization opportunities within a blockchain at the system, data, and user level. Multiple metrics and attributes are derived from a blockchain log and nine optimization recommendations are formalized. We implement an automated optimization recommendation tool, BlockOptR, based on these concepts. The system is extensively evaluated with a wide range of workloads covering multiple real-world scenarios. After implementing the recommended optimizations, we observe an average of 20% improvement in the success rate of transactions and an average of 40% improvement in latency.
△ Less
Submitted 11 January, 2023;
originally announced January 2023.
-
A Serverless Publish/Subscribe System
Authors:
Pezhman Nasirifard,
Hans-Arno Jacobsen
Abstract:
Operating a scalable and reliable server application, such as publish/subscribe (pub/sub) systems, requires tremendous development efforts and resources. The emerging serverless paradigm simplifies the development and deployment of highly available applications by delegating most operational concerns to the cloud providers. The serverless paradigm describes a programming model where the developers…
▽ More
Operating a scalable and reliable server application, such as publish/subscribe (pub/sub) systems, requires tremendous development efforts and resources. The emerging serverless paradigm simplifies the development and deployment of highly available applications by delegating most operational concerns to the cloud providers. The serverless paradigm describes a programming model where the developers break the application downs into smaller microservices that run on the cloud in response to events. This paper proposes designing a serverless pub/sub system based on the IBM Bluemix cloud platform. Our pub/sub system performs topic-based, content-based, and function-based matchings. The function-based matching is a novel matching approach where the subscribers can define a highly customizable subscription function that the broker applies to the publications in the cloud. The evaluations of the designed system verify the practicality of the designed system. However, the vendor-specific constraints of the IBM Bluemix resources are a bottleneck to the scalability of the broker.
△ Less
Submitted 14 October, 2022;
originally announced October 2022.
-
i13DR: A Real-Time Demand Response Infrastructure for Integrating Renewable Energy Resources
Authors:
Pezhman Nasirifard,
Hans-Arno Jacobsen
Abstract:
With the ongoing integration of Renewable Energy Sources (RES), the complexity of power grids is increasing. Due to the fluctuating nature of RES, ensuring the reliability of power grids can be challenging. One possible approach for addressing these challenges is Demand Response (DR) which is described as matching the demand for electrical energy according to the changes and the availability of su…
▽ More
With the ongoing integration of Renewable Energy Sources (RES), the complexity of power grids is increasing. Due to the fluctuating nature of RES, ensuring the reliability of power grids can be challenging. One possible approach for addressing these challenges is Demand Response (DR) which is described as matching the demand for electrical energy according to the changes and the availability of supply. However, implementing a DR system to monitor and control a broad set of electrical appliances in real-time introduces several new complications, including ensuring the reliability and financial feasibility of the system. In this work, we address these issues by designing and implementing a distributed real-time DR infrastructure for laptops, which estimates and controls the power consumption of a network of connected laptops in response to the fast, irregular changes of RES. Furthermore, since our approach is entirely software-based, we dramatically reduce the initial costs of the demand side participants. The result of our field experiments confirms that our system successfully schedules and executes rapid and effective DR events. However, the accuracy of the estimated power consumption of all participating laptops is relatively low, directly caused by our software-based approach.
△ Less
Submitted 31 October, 2022; v1 submitted 14 October, 2022;
originally announced October 2022.
-
OrderlessChain: Do Permissioned Blockchains Need Total Global Order of Transactions?
Authors:
Pezhman Nasirifard,
Ruben Mayer,
Hans-Arno Jacobsen
Abstract:
Existing permissioned blockchains often rely on coordination-based consensus protocols to ensure the safe execution of applications in a Byzantine environment. Furthermore, these protocols serialize the transactions by ordering them into a total global order. The serializability preserves the correctness of the application's state stored on the blockchain. However, using coordination-based protoco…
▽ More
Existing permissioned blockchains often rely on coordination-based consensus protocols to ensure the safe execution of applications in a Byzantine environment. Furthermore, these protocols serialize the transactions by ordering them into a total global order. The serializability preserves the correctness of the application's state stored on the blockchain. However, using coordination-based protocols to attain the global order of transactions can limit the throughput and induce high latency. In contrast, application-level correctness requirements exist that are not dependent on the order of transactions, known as invariant-confluence (I-confluence). The I-confluent applications can execute in a coordination-free manner benefiting from the improved performance compared to the coordination-based approaches. The safety and liveness of I-confluent applications are studied in non-Byzantine environments, but the correct execution of such applications remains a challenge in Byzantine coordination-free environments. This work introduces OrderlessChain, a coordination-free permissioned blockchain for the safe and live execution of I-confluent applications in a Byzantine environment. We implemented a prototype of our system, and our evaluation results demonstrate that our coordination-free approach performs better than coordination-based blockchains.
△ Less
Submitted 24 October, 2023; v1 submitted 4 October, 2022;
originally announced October 2022.
-
Representation Learning for Appliance Recognition: A Comparison to Classical Machine Learning
Authors:
Matthias Kahl,
Daniel Jorde,
Hans-Arno Jacobsen
Abstract:
Non-intrusive load monitoring (NILM) aims at energy consumption and appliance state information retrieval from aggregated consumption measurements, with the help of signal processing and machine learning algorithms. Representation learning with deep neural networks is successfully applied to several related disciplines. The main advantage of representation learning lies in replacing an expert-driv…
▽ More
Non-intrusive load monitoring (NILM) aims at energy consumption and appliance state information retrieval from aggregated consumption measurements, with the help of signal processing and machine learning algorithms. Representation learning with deep neural networks is successfully applied to several related disciplines. The main advantage of representation learning lies in replacing an expert-driven, hand-crafted feature extraction with hierarchical learning from many representations in raw data format. In this paper, we show how the NILM processing-chain can be improved, reduced in complexity and alternatively designed with recent deep learning algorithms. On the basis of an event-based appliance recognition approach, we evaluate seven different classification models: a classical machine learning approach that is based on a hand-crafted feature extraction, three different deep neural network architectures for automated feature extraction on raw waveform data, as well as three baseline approaches for raw data processing. We evaluate all approaches on two large-scale energy consumption datasets with more than 50,000 events of 44 appliances. We show that with the use of deep learning, we are able to reach and surpass the performance of the state-of-the-art classical machine learning approach for appliance recognition with an F-Score of 0.75 and 0.86 compared to 0.69 and 0.87 of the classical approach.
△ Less
Submitted 26 August, 2022;
originally announced September 2022.
-
The DEBS 2022 Grand Challenge: Detecting Trading Trends in Financial Tick Data
Authors:
Sebastian Frischbier,
Jawad Tahir,
Christoph Doblander,
Arne Hormann,
Ruben Mayer,
Hans-Arno Jacobsen
Abstract:
The DEBS Grand Challenge (GC) is an annual programming competition open to practitioners from both academia and industry. The GC 2022 edition focuses on real-time complex event processing of high-volume tick data provided by Infront Financial Technology GmbH. The goal of the challenge is to efficiently compute specific trend indicators and detect patterns in these indicators like those used by rea…
▽ More
The DEBS Grand Challenge (GC) is an annual programming competition open to practitioners from both academia and industry. The GC 2022 edition focuses on real-time complex event processing of high-volume tick data provided by Infront Financial Technology GmbH. The goal of the challenge is to efficiently compute specific trend indicators and detect patterns in these indicators like those used by real-life traders to decide on buying or selling in financial markets. The data set Trading Data used for benchmarking contains 289 million tick events from approximately 5500+ financial instruments that had been traded on the three major exchanges Amsterdam (NL), Paris (FR), and Frankfurt am Main (GER) over the course of a full week in 2021. The data set is made publicly available. In addition to correctness and performance, submissions must explicitly focus on reusability and practicability. Hence, participants must address specific nonfunctional requirements and are asked to build upon open-source platforms. This paper describes the required scenario and the data set Trading Data, defines the queries of the problem statement, and explains the enhancements made to the evaluation platform Challenger that handles data distribution, dynamic subscriptions, and remote evaluation of the submissions.
△ Less
Submitted 23 June, 2022;
originally announced June 2022.
-
Towards Data-Driven Precision Agriculture using Open Data and Open Source Software
Authors:
Jacob Høxbroe Jeppesen,
Rune Hylsberg Jacobsen,
Rasmus Nyholm Jørgensen,
Thomas Skjødeberg Toftegaard
Abstract:
Information and communications technology (ICT) within the agricultural sector is characterized by a widespread use of proprietary data formats, a strong lack of interoperability standards, and a tight connection to specific hardware implementations resulting from vendor lock-in. This partly explains why ICT has not yet had its full impact within the domain. By utilizing the vast amount of publicl…
▽ More
Information and communications technology (ICT) within the agricultural sector is characterized by a widespread use of proprietary data formats, a strong lack of interoperability standards, and a tight connection to specific hardware implementations resulting from vendor lock-in. This partly explains why ICT has not yet had its full impact within the domain. By utilizing the vast amount of publicly available open data, ranging from topographic maps to multispectral satellite images, the economically and environmentally optimal farming practices can be advanced beyond state of the art. This paper addresses the potential of applying publicly available information sources to improve crop production, with emphasis on yield optimization. This potential is evaluated based on free public data for the growth season 2016 by examining winter wheat production for a selected region in Denmark. Data aggregation is performed by promoting opensource software tools as a foundation for decision support. That allows the farmer, or another domain expert, to query a certain crop type, merge this information with other data sets, and perform analysis on data ranging from sub-field analysis to statistics on national/regional scale. The registration of field polygons and sowed crop types for fields in Denmark, alongside with detailed geographic data and free satellite images, enable us to exploit publicly available data of high quality, which can be applied to perform further analysis.
△ Less
Submitted 12 April, 2022;
originally announced April 2022.
-
Reaching Consensus in the Byzantine Empire: A Comprehensive Review of BFT Consensus Algorithms
Authors:
Gengrui Zhang,
Fei Pan,
Yunhao Mao,
Sofia Tijanic,
Michael Dang'ana,
Shashank Motepalli,
Shiquan Zhang,
Hans-Arno Jacobsen
Abstract:
Byzantine fault-tolerant (BFT) consensus algorithms are at the core of providing safety and liveness guarantees for distributed systems that must operate in the presence of arbitrary failures. Recently, numerous new BFT algorithms have been proposed, not least due to the traction blockchain technologies have garnered in the search for consensus solutions that offer high throughput, low latency, an…
▽ More
Byzantine fault-tolerant (BFT) consensus algorithms are at the core of providing safety and liveness guarantees for distributed systems that must operate in the presence of arbitrary failures. Recently, numerous new BFT algorithms have been proposed, not least due to the traction blockchain technologies have garnered in the search for consensus solutions that offer high throughput, low latency, and robust system designs. In this paper, we conduct a systematic survey of selected and distinguished BFT algorithms that have received extensive attention in academia and industry alike. We perform a qualitative comparison among all algorithms we review considering message and time complexities. Furthermore, we decompose each consensus algorithm into its constituent subprotocols for replication and view change backed by intuitive figures to illustrate the message-passing pattern. We also elaborate on the strengths and weaknesses of each algorithm as compared to the state-of-the-art approaches.
△ Less
Submitted 5 December, 2023; v1 submitted 6 April, 2022;
originally announced April 2022.
-
Out-of-Core Edge Partitioning at Linear Run-Time
Authors:
Ruben Mayer,
Kamil Orujzade,
Hans-Arno Jacobsen
Abstract:
Graph edge partitioning is an important preprocessing step to optimize distributed computing jobs on graph-structured data. The edge set of a given graph is split into $k$ equally-sized partitions, such that the replication of vertices across partitions is minimized. Out-of-core edge partitioning algorithms are able to tackle the problem with low memory overhead. Exsisting out-of-core algorithms m…
▽ More
Graph edge partitioning is an important preprocessing step to optimize distributed computing jobs on graph-structured data. The edge set of a given graph is split into $k$ equally-sized partitions, such that the replication of vertices across partitions is minimized. Out-of-core edge partitioning algorithms are able to tackle the problem with low memory overhead. Exsisting out-of-core algorithms mainly work in a streaming manner and can be grouped into two types. While \emph{stateless} streaming edge partitioning is fast and yields low partitioning quality, stateful streaming edge partitioning yields better quality, but is expensive, as it requires a scoring function to be evaluated for every edge on every partition, leading to a time complexity of $\mathcal{O}(|E|*k)$. In this paper, we propose 2PS-L, a novel out-of-core edge partitioning algorithm that builds upon the stateful streaming model, but achieves linear run-time (i.e., $\mathcal{O}(|E|)$). 2PS-L consists of two phases. In the first phase, vertices are separated into clusters by a lightweight streaming clustering algorithm. In the second phase, the graph is re-streamed and vertex clustering from the first phase is exploited to reduce the search space of graph partitioning to only two target partitions for every edge. Our evaluations show that 2PS-L can achieve better partitioning quality than existing stateful streaming edge partitioners while having a much lower run-time. As a consequence, the total run-time of partitioning and subsequent distributed graph processing can be significantly reduced.
△ Less
Submitted 23 March, 2022;
originally announced March 2022.
-
Decentralizing Permissioned Blockchain with Delay Towers
Authors:
Shashank Motepalli,
Hans-Arno Jacobsen
Abstract:
Growing excitement around permissionless blockchains is uncovering its latent scalability concerns. Permissioned blockchains offer high transactional throughput and low latencies while compromising decentralization. In the quest for a decentralized, scalable blockchain fabric, i.e., to offer the scalability of permissioned blockchain in a permissionless setting, we present L4L to encourage decentr…
▽ More
Growing excitement around permissionless blockchains is uncovering its latent scalability concerns. Permissioned blockchains offer high transactional throughput and low latencies while compromising decentralization. In the quest for a decentralized, scalable blockchain fabric, i.e., to offer the scalability of permissioned blockchain in a permissionless setting, we present L4L to encourage decentralization over the permissioned Libra network without compromising its sustainability. L4L employs delay towers, -- puzzle towers that leverage verifiable delay functions -- for establishing identity in a permissionless setting. Delay towers cannot be parallelized due to their sequential execution, making them an eco-friendly alternative. We also discuss methodologies to replace validators participating in consensus to promote compliant behavior. Our evaluations found that the cost of enabling decentralization over permissioned networks is almost negligible. Furthermore, delay towers offer an alternative to existing permissionless consensus mechanisms without requiring airdrops or pre-sale of tokens.
△ Less
Submitted 17 March, 2022;
originally announced March 2022.
-
ESCAPE to Precaution against Leader Failures
Authors:
Gengrui Zhang,
Hans-Arno Jacobsen
Abstract:
Leader-based consensus protocols must undergo a view-change phase to elect a new leader when the current leader fails. The new leader is often decided upon a candidate server that collects votes from a quorum of servers. However, voting-based election mechanisms intrinsically cause competition in leadership candidacy when each candidate collects only partial votes. This split-vote scenario can res…
▽ More
Leader-based consensus protocols must undergo a view-change phase to elect a new leader when the current leader fails. The new leader is often decided upon a candidate server that collects votes from a quorum of servers. However, voting-based election mechanisms intrinsically cause competition in leadership candidacy when each candidate collects only partial votes. This split-vote scenario can result in no leadership winner and prolong the undesired view-change period. In this paper, we investigate a case study of Raft's leader election mechanism and propose a new leader election protocol, called ESCAPE, that fundamentally solves split votes by prioritizing servers based on their log responsiveness. ESCAPE dynamically assigns servers with a configuration that offers different priorities through Raft's periodic heartbeat. In each assignment, ESCAPE keeps track of server log responsiveness and assigns configurations that are inclined to win an election to more up-to-date servers, thereby preparing a pool of prioritized candidates. Consequently, when the next election takes place, the candidate with the highest priority will defeat its counterparts and becomes the next leader without competition. The evaluation results show that ESCAPE progressively reduces the leader election time when the cluster scales up, and the improvement becomes more significant under message loss.
△ Less
Submitted 18 February, 2022;
originally announced February 2022.
-
Where Is My Training Bottleneck? Hidden Trade-Offs in Deep Learning Preprocessing Pipelines
Authors:
Alexander Isenko,
Ruben Mayer,
Jeffrey Jedele,
Hans-Arno Jacobsen
Abstract:
Preprocessing pipelines in deep learning aim to provide sufficient data throughput to keep the training processes busy. Maximizing resource utilization is becoming more challenging as the throughput of training processes increases with hardware innovations (e.g., faster GPUs, TPUs, and inter-connects) and advanced parallelization techniques that yield better scalability. At the same time, the amou…
▽ More
Preprocessing pipelines in deep learning aim to provide sufficient data throughput to keep the training processes busy. Maximizing resource utilization is becoming more challenging as the throughput of training processes increases with hardware innovations (e.g., faster GPUs, TPUs, and inter-connects) and advanced parallelization techniques that yield better scalability. At the same time, the amount of training data needed in order to train increasingly complex models is growing. As a consequence of this development, data preprocessing and provisioning are becoming a severe bottleneck in end-to-end deep learning pipelines.
In this paper, we provide an in-depth analysis of data preprocessing pipelines from four different machine learning domains. We introduce a new perspective on efficiently preparing datasets for end-to-end deep learning pipelines and extract individual trade-offs to optimize throughput, preprocessing time, and storage consumption. Additionally, we provide an open-source profiling library that can automatically decide on a suitable preprocessing strategy to maximize throughput. By applying our generated insights to real-world use-cases, we obtain an increased throughput of 3x to 13x compared to an untuned system while keeping the pipeline functionally identical. These findings show the enormous potential of data pipeline tuning.
△ Less
Submitted 25 March, 2022; v1 submitted 17 February, 2022;
originally announced February 2022.
-
A Review on Communication Protocols for Autonomous Unmanned Aerial Vehicles for Inspection Application
Authors:
Liping Shi,
Néstor J. Hernández Marcano,
Rune Hylsberg Jacobsen
Abstract:
The communication system is a critical part of the system design for the autonomous UAV. It has to address different considerations, including efficiency, reliability and mobility of the UAV. In addition, a multi-UAV system requires a communication system to assist information sharing, task allocation and collaboration in a team of UAVs. In this paper, we review communication solutions for support…
▽ More
The communication system is a critical part of the system design for the autonomous UAV. It has to address different considerations, including efficiency, reliability and mobility of the UAV. In addition, a multi-UAV system requires a communication system to assist information sharing, task allocation and collaboration in a team of UAVs. In this paper, we review communication solutions for supporting a team of UAVs while considering an application in the power line inspection industry. We provide a review of candidate wireless communication technologies {for supporting communication in UAV applications. Performance measurements and UAV-related channel modeling of those candidate technologies are reviewed. A discussion of current technologies for building UAV mesh networks is presented. We then analyze the structure, interface and performance of robotic communication middleware, ROS and ROS2. Based on our review, the features and dependencies of candidate solutions in each layer of the communication system are presented.
△ Less
Submitted 12 November, 2021;
originally announced November 2021.
-
Reward Mechanism for Blockchains Using Evolutionary Game Theory
Authors:
Shashank Motepalli,
Hans-Arno Jacobsen
Abstract:
Blockchains have witnessed widespread adoption in the past decade in various fields. The growing demand makes their scalability and sustainability challenges more evident than ever. As a result, more and more blockchains have begun to adopt proof-of-stake (PoS) consensus protocols to address those challenges. One of the fundamental characteristics of any blockchain technology is its crypto-economi…
▽ More
Blockchains have witnessed widespread adoption in the past decade in various fields. The growing demand makes their scalability and sustainability challenges more evident than ever. As a result, more and more blockchains have begun to adopt proof-of-stake (PoS) consensus protocols to address those challenges. One of the fundamental characteristics of any blockchain technology is its crypto-economics and incentives. Lately, each PoS blockchain has designed a unique reward mechanism, yet, many of them are prone to free-rider and nothing-at-stake problems. To better understand the ad-hoc design of reward mechanisms, in this paper, we develop a reward mechanism framework that could apply to many PoS blockchains. We formulate the block validation game wherein the rewards are distributed for validating the blocks correctly. Using evolutionary game theory, we analyze how the participants' behaviour could potentially evolve with the reward mechanism. Also, penalties are found to play a central role in maintaining the integrity of blockchains.
△ Less
Submitted 8 July, 2021; v1 submitted 12 April, 2021;
originally announced April 2021.
-
Hybrid Edge Partitioner: Partitioning Large Power-Law Graphs under Memory Constraints
Authors:
Ruben Mayer,
Hans-Arno Jacobsen
Abstract:
Distributed systems that manage and process graph-structured data internally solve a graph partitioning problem to minimize their communication overhead and query run-time. Besides computational complexity -- optimal graph partitioning is NP-hard -- another important consideration is the memory overhead. Real-world graphs often have an immense size, such that loading the complete graph into memory…
▽ More
Distributed systems that manage and process graph-structured data internally solve a graph partitioning problem to minimize their communication overhead and query run-time. Besides computational complexity -- optimal graph partitioning is NP-hard -- another important consideration is the memory overhead. Real-world graphs often have an immense size, such that loading the complete graph into memory for partitioning is not economical or feasible. Currently, the common approach to reduce memory overhead is to rely on streaming partitioning algorithms. While the latest streaming algorithms lead to reasonable partitioning quality on some graphs, they are still not completely competitive to in-memory partitioners. In this paper, we propose a new system, Hybrid Edge Partitioner (HEP), that can partition graphs that fit partly into memory while yielding a high partitioning quality. HEP can flexibly adapt its memory overhead by separating the edge set of the graph into two sub-sets. One sub-set is partitioned by NE++, a novel, efficient in-memory algorithm, while the other sub-set is partitioned by a streaming approach. Our evaluations on large real-world graphs show that in many cases, HEP outperforms both in-memory partitioning and streaming partitioning at the same time. Hence, HEP is an attractive alternative to existing solutions that cannot fine-tune their memory overheads. Finally, we show that using HEP, we achieve a significant speedup of distributed graph processing jobs on Spark/GraphX compared to state-of-the-art partitioning algorithms.
△ Less
Submitted 23 March, 2021;
originally announced March 2021.
-
Why Do My Blockchain Transactions Fail? A Study of Hyperledger Fabric (Extended version)*
Authors:
Jeeta Ann Chacko,
Ruben Mayer,
Hans-Arno Jacobsen
Abstract:
Permissioned blockchain systems promise to provide both decentralized trust and privacy. Hyperledger Fabric is currently one of the most wide-spread permissioned blockchain systems and is heavily promoted both in industry and academia. Due to its optimistic concurrency model, the transaction failure rates in Fabric can become a bottleneck. While there is active research to reduce failures, there i…
▽ More
Permissioned blockchain systems promise to provide both decentralized trust and privacy. Hyperledger Fabric is currently one of the most wide-spread permissioned blockchain systems and is heavily promoted both in industry and academia. Due to its optimistic concurrency model, the transaction failure rates in Fabric can become a bottleneck. While there is active research to reduce failures, there is a lack of understanding on their root cause and, consequently, a lack of guidelines on how to configure Fabric optimally for different scenarios. To close this gap, in this paper, we first introduce a formal definition of the different types of transaction failures in Fabric. Then, we develop a comprehensive testbed and benchmarking system, HyperLedgerLab, along with four different chaincodes that represent realistic use cases and a chaincode/workload generator. Using HyperLedgerLab, we conduct exhaustive experiments to analyze the impact of different parameters of Fabric such as block size, endorsement policies, and others, on transaction failures. We further analyze three recently proposed optimizations from the literature, Fabric++, Streamchain and FabricSharp, and evaluate under which conditions they reduce the failure rates. Finally, based on our results, we provide recommendations for Fabric practitioners on how to configure the system and also propose new research directions.
△ Less
Submitted 8 March, 2021;
originally announced March 2021.
-
2PS: High-Quality Edge Partitioning with Two-Phase Streaming
Authors:
Ruben Mayer,
Kamil Orujzade,
Hans-Arno Jacobsen
Abstract:
Graph partitioning is an important preprocessing step to distributed graph processing. In edge partitioning, the edge set of a given graph is split into $k$ equally-sized partitions, such that the replication of vertices across partitions is minimized. Streaming is a viable approach to partition graphs that exceed the memory capacities of a single server. The graph is ingested as a stream of edges…
▽ More
Graph partitioning is an important preprocessing step to distributed graph processing. In edge partitioning, the edge set of a given graph is split into $k$ equally-sized partitions, such that the replication of vertices across partitions is minimized. Streaming is a viable approach to partition graphs that exceed the memory capacities of a single server. The graph is ingested as a stream of edges, and one edge at a time is immediately and irrevocably assigned to a partition based on a scoring function. However, streaming partitioning suffers from the uninformed assignment problem: At the time of partitioning early edges in the stream, there is no information available about the rest of the edges. As a consequence, edge assignments are often driven by balancing considerations, and the achieved replication factor is comparably high. In this paper, we propose 2PS, a novel two-phase streaming algorithm for high-quality edge partitioning. In the first phase, vertices are separated into clusters by a lightweight streaming clustering algorithm. In the second phase, the graph is re-streamed and edge partitioning is performed while taking into account the clustering of the vertices from the first phase. Our evaluations show that 2PS can achieve a replication factor that is comparable to heavy-weight random access partitioners while inducing orders of magnitude lower memory overhead.
△ Less
Submitted 20 January, 2020;
originally announced January 2020.
-
ParPaRaw: Massively Parallel Parsing of Delimiter-Separated Raw Data
Authors:
Elias Stehle,
Hans-Arno Jacobsen
Abstract:
Parsing is essential for a wide range of use cases, such as stream processing, bulk loading, and in-situ querying of raw data. Yet, the compute-intense step often constitutes a major bottleneck in the data ingestion pipeline, since parsing of inputs that require more involved parsing rules is challenging to parallelise. This work proposes a massively parallel algorithm for parsing delimiter-separa…
▽ More
Parsing is essential for a wide range of use cases, such as stream processing, bulk loading, and in-situ querying of raw data. Yet, the compute-intense step often constitutes a major bottleneck in the data ingestion pipeline, since parsing of inputs that require more involved parsing rules is challenging to parallelise. This work proposes a massively parallel algorithm for parsing delimiter-separated data formats on GPUs. Other than the state-of-the-art, the proposed approach does not require an initial sequential pass over the input to determine a thread's parsing context. That is, how a thread, beginning somewhere in the middle of the input, should interpret a certain symbol (e.g., whether to interpret a comma as a delimiter or as part of a larger string enclosed in double-quotes). Instead of tailoring the approach to a single format, we are able to perform a massively parallel FSM simulation, which is more flexible and powerful, supporting more expressive parsing rules with general applicability. Achieving a parsing rate of as much as 14.2 GB/s, our experimental evaluation on a GPU with 3584 cores shows that the presented approach is able to scale to thousands of cores and beyond. With an end-to-end streaming approach, we are able to exploit the full-duplex capabilities of the PCIe bus and hide latency from data transfers. Considering the end-to-end performance, the algorithm parses 4.8 GB in as little as 0.44 seconds, including data transfers.
△ Less
Submitted 15 April, 2020; v1 submitted 31 May, 2019;
originally announced May 2019.
-
Appliance Event Detection -- A Multivariate, Supervised Classification Approach
Authors:
Matthias Kahl,
Thomas Kriechbaumer,
Daniel Jorde,
Anwar Ul Haq,
Hans-Arno Jacobsen
Abstract:
Non-intrusive load monitoring (NILM) is a modern and still expanding technique, helping to understand fundamental energy consumption patterns and appliance characteristics. Appliance event detection is an elementary step in the NILM pipeline. Unfortunately, several types of appliances (e.g., switching mode power supply (SMPS) or multi-state) are known to challenge state-of-the-art event detection…
▽ More
Non-intrusive load monitoring (NILM) is a modern and still expanding technique, helping to understand fundamental energy consumption patterns and appliance characteristics. Appliance event detection is an elementary step in the NILM pipeline. Unfortunately, several types of appliances (e.g., switching mode power supply (SMPS) or multi-state) are known to challenge state-of-the-art event detection systems due to their noisy consumption profiles. Classical rule-based event detection system become infeasible and complex for these appliances. By stepping away from distinct event definitions, we can learn from a consumer-configured event model to differentiate between relevant and irrelevant event transients.
We introduce a boosting oriented adaptive training, that uses false positives from the initial training area to reduce the number of false positives on the test area substantially. The results show a false positive decrease by more than a factor of eight on a dataset that has a strong focus on SMPS-driven appliances. To obtain a stable event detection system, we applied several experiments on different parameters to measure its performance. These experiments include the evaluation of six event features from the spectral and time domain, different types of feature space normalization to eliminate undesired feature weighting, the conventional and adaptive training, and two common classifiers with its optimal parameter settings. The evaluations are performed on two publicly available energy datasets with high sampling rates: BLUED and BLOND-50.
△ Less
Submitted 24 April, 2019;
originally announced April 2019.
-
Uplink Grant-Free Random Access Solutions for URLLC services in 5G New Radio
Authors:
Nurul Huda Mahmood,
Renato Abreu,
Ronald Böhnke,
Martin Schubert,
Gilberto Berardinelli,
Thomas H. Jacobsen
Abstract:
The newly introduced ultra-reliable low latency communication service class in 5G New Radio depends on innovative low latency radio resource management solutions that can guarantee high reliability. Grant-free random access, where channel resources are accessed without undergoing assignment through a handshake process, is proposed in 5G New Radio as an important latency reducing solution. However,…
▽ More
The newly introduced ultra-reliable low latency communication service class in 5G New Radio depends on innovative low latency radio resource management solutions that can guarantee high reliability. Grant-free random access, where channel resources are accessed without undergoing assignment through a handshake process, is proposed in 5G New Radio as an important latency reducing solution. However, this comes at an increased likelihood of collisions resulting from uncontrolled channel access, when the same resources are preallocated to a group of users. Novel reliability enhancement techniques are therefore needed. This article provides an overview of grant-free random access in 5G New Radio focusing on the ultra-reliable low latency communication service class, and presents two reliability-enhancing solutions. The first proposes retransmissions over shared resources, whereas the second proposal incorporates grant-free transmission with non-orthogonal multiple access with overlapping transmissions being resolved through the use of advanced receivers. Both proposed solutions result in significant performance gains, in terms of reliability as well as resource efficiency. For example, the proposed non-orthogonal multiple access scheme can support a normalized load of more than 1.5 users/slot at packet loss rates of ~10^{-5} - a significant improvement over the maximum supported load with conventional grant-free schemes like slotted-ALOHA.
△ Less
Submitted 11 April, 2019;
originally announced April 2019.
-
Scalable Deep Learning on Distributed Infrastructures: Challenges, Techniques and Tools
Authors:
Ruben Mayer,
Hans-Arno Jacobsen
Abstract:
Deep Learning (DL) has had an immense success in the recent past, leading to state-of-the-art results in various domains such as image recognition and natural language processing. One of the reasons for this success is the increasing size of DL models and the proliferation of vast amounts of training data being available. To keep on improving the performance of DL, increasing the scalability of DL…
▽ More
Deep Learning (DL) has had an immense success in the recent past, leading to state-of-the-art results in various domains such as image recognition and natural language processing. One of the reasons for this success is the increasing size of DL models and the proliferation of vast amounts of training data being available. To keep on improving the performance of DL, increasing the scalability of DL systems is necessary. In this survey, we perform a broad and thorough investigation on challenges, techniques and tools for scalable DL on distributed infrastructures. This incorporates infrastructures for DL, methods for parallel DL training, multi-tenant resource scheduling and the management of training and model data. Further, we analyze and compare 11 current open-source DL frameworks and tools and investigate which of the techniques are commonly implemented in practice. Finally, we highlight future research trends in DL systems that deserve further research.
△ Less
Submitted 25 September, 2019; v1 submitted 27 March, 2019;
originally announced March 2019.
-
Parallel Index-based Stream Join on a Multicore CPU
Authors:
Amirhesam Shahvarani,
Hans-Arno Jacobsen
Abstract:
There is increasing interest in using multicore processors to accelerate stream processing. For example, indexing sliding window content to enhance the performance of streaming queries is greatly improved by utilizing the computational capabilities of a multicore processor. However, designing an effective concurrency control mechanism that addresses the problem of concurrent indexing in highly dyn…
▽ More
There is increasing interest in using multicore processors to accelerate stream processing. For example, indexing sliding window content to enhance the performance of streaming queries is greatly improved by utilizing the computational capabilities of a multicore processor. However, designing an effective concurrency control mechanism that addresses the problem of concurrent indexing in highly dynamic settings remains a challenge. In this paper, we introduce an index data structure, called the Partitioned In-memory Merge-Tree, to address the challenges that arise when indexing highly dynamic data, which are common in streaming settings. To complement the index, we design an algorithm to realize a parallel index-based stream join that exploits the computational power of multicore processors. Our experiments using an octa-core processor show that our parallel stream join achieves up to 5.5 times higher throughput than a single-threaded approach.
△ Less
Submitted 1 March, 2019;
originally announced March 2019.
-
PanJoin: A Partition-based Adaptive Stream Join
Authors:
Fei Pan,
Hans-Arno Jacobsen
Abstract:
In stream processing, stream join is one of the critical sources of performance bottlenecks. The sliding-window-based stream join provides a precise result but consumes considerable computational resources. The current solutions lack support for the join predicates on large windows. These algorithms and their hardware accelerators are either limited to equi-join or use a nested loop join to proces…
▽ More
In stream processing, stream join is one of the critical sources of performance bottlenecks. The sliding-window-based stream join provides a precise result but consumes considerable computational resources. The current solutions lack support for the join predicates on large windows. These algorithms and their hardware accelerators are either limited to equi-join or use a nested loop join to process all the requests.
In this paper, we present a new algorithm called PanJoin which has high throughput on large windows and supports both equi-join and non-equi-join. PanJoin implements three new data structures to reduce computations during the probing phase of stream join. We also implement the most hardware-friendly data structure, called BI-Sort, on FPGA. Our evaluation shows that PanJoin outperforms several recently proposed stream join methods by more than 1000x, and it also adapts well to highly skewed data.
△ Less
Submitted 12 November, 2018;
originally announced November 2018.
-
Waveform Signal Entropy and Compression Study of Whole-Building Energy Datasets
Authors:
Thomas Kriechbaumer,
Hans-Arno Jacobsen
Abstract:
Electrical energy consumption has been an ongoing research area since the coming of smart homes and Internet of Things devices. Consumption characteristics and usages profiles are directly influenced by building occupants and their interaction with electrical appliances. Extracted information from these data can be used to conserve energy and increase user comfort levels. Data analysis together wi…
▽ More
Electrical energy consumption has been an ongoing research area since the coming of smart homes and Internet of Things devices. Consumption characteristics and usages profiles are directly influenced by building occupants and their interaction with electrical appliances. Extracted information from these data can be used to conserve energy and increase user comfort levels. Data analysis together with machine learning models can be utilized to extract valuable information for the benefit of occupants themselves, power plants, and grid operators. Public energy datasets provide a scientific foundation to develop and benchmark these algorithms and techniques. With datasets exceeding tens of terabytes, we present a novel study of five whole-building energy datasets with high sampling rates, their signal entropy, and how a well-calibrated measurement can have a significant effect on the overall storage requirements. We show that some datasets do not fully utilize the available measurement precision, therefore leaving potential accuracy and space savings untapped. We benchmark a comprehensive list of 365 file formats, transparent data transformations, and lossless compression algorithms. The primary goal is to reduce the overall dataset size while maintaining an easy-to-use file format and access API. We show that with careful selection of file format and encoding scheme, we can reduce the size of some datasets by up to 73%.
△ Less
Submitted 25 October, 2018;
originally announced October 2018.
-
A Framework for Detecting and Translating User Behavior from Smart Meter Data
Authors:
Egon Kidmose,
Emad Ebeid,
Rune Hylsberg Jacobsen
Abstract:
The European adoption of smart electricity meters triggers the developments of new value-added service for smart energy and optimal consumption. Recently, several algorithms and tools have been built to analyze smart meter's data. This paper introduces an open framework and prototypes for detecting and presenting user behavior from its smart meter power consumption data. The framework aims at pres…
▽ More
The European adoption of smart electricity meters triggers the developments of new value-added service for smart energy and optimal consumption. Recently, several algorithms and tools have been built to analyze smart meter's data. This paper introduces an open framework and prototypes for detecting and presenting user behavior from its smart meter power consumption data. The framework aims at presenting the detected user behavior in natural language reports. In order to validate the proposed framework, an experiment has been performed and the results have been presented.
△ Less
Submitted 13 June, 2018;
originally announced July 2018.
-
Real-time Load Prediction with High Velocity Smart Home Data Stream
Authors:
Christoph Doblander,
Martin Strohbach,
Holger Ziekow,
Hans-Arno Jacobsen
Abstract:
This paper addresses the use of smart-home sensor streams for continuous prediction of energy loads of individual households which participate as an agent in local markets. We introduces a new device level energy consumption dataset recorded over three years wich includes high resolution energy measurements from electrical devices collected within a pilot program. Using data from that pilot, we an…
▽ More
This paper addresses the use of smart-home sensor streams for continuous prediction of energy loads of individual households which participate as an agent in local markets. We introduces a new device level energy consumption dataset recorded over three years wich includes high resolution energy measurements from electrical devices collected within a pilot program. Using data from that pilot, we analyze the applicability of various machine learning mechanisms for continuous load prediction. Specifically, we address short-term load prediction that is required for load balancing in electrical micro-grids. We report on the prediction performance and the computational requirements of a broad range of prediction mechanisms. Furthermore we present an architecture and experimental evaluation when this prediction is applied in the stream.
△ Less
Submitted 12 August, 2017;
originally announced August 2017.
-
A Memory Bandwidth-Efficient Hybrid Radix Sort on GPUs
Authors:
Elias Stehle,
Hans-Arno Jacobsen
Abstract:
Sorting is at the core of many database operations, such as index creation, sort-merge joins, and user-requested output sorting. As GPUs are emerging as a promising platform to accelerate various operations, sorting on GPUs becomes a viable endeavour. Over the past few years, several improvements have been proposed for sorting on GPUs, leading to the first radix sort implementations that achieve a…
▽ More
Sorting is at the core of many database operations, such as index creation, sort-merge joins, and user-requested output sorting. As GPUs are emerging as a promising platform to accelerate various operations, sorting on GPUs becomes a viable endeavour. Over the past few years, several improvements have been proposed for sorting on GPUs, leading to the first radix sort implementations that achieve a sorting rate of over one billion 32-bit keys per second. Yet, state-of-the-art approaches are heavily memory bandwidth-bound, as they require substantially more memory transfers than their CPU-based counterparts.
Our work proposes a novel approach that almost halves the amount of memory transfers and, therefore, considerably lifts the memory bandwidth limitation. Being able to sort two gigabytes of eight-byte records in as little as 50 milliseconds, our approach achieves a 2.32-fold improvement over the state-of-the-art GPU-based radix sort for uniform distributions, sustaining a minimum speed-up of no less than a factor of 1.66 for skewed distributions.
To address inputs that either do not reside on the GPU or exceed the available device memory, we build on our efficient GPU sorting approach with a pipelined heterogeneous sorting algorithm that mitigates the overhead associated with PCIe data transfers. Comparing the end-to-end sorting performance to the state-of-the-art CPU-based radix sort running 16 threads, our heterogeneous approach achieves a 2.06-fold and a 1.53-fold improvement for sorting 64 GB key-value pairs with a skewed and a uniform distribution, respectively.
△ Less
Submitted 19 May, 2017; v1 submitted 3 November, 2016;
originally announced November 2016.
-
A Step towards Advanced Metering for the Smart Grid: A Survey of Energy Monitors
Authors:
Anwar Ul Haq,
Hans-Arno Jacobsen
Abstract:
The smart grid initiative has encouraged utility companies worldwide to rollout new and smarter versions of energy meters. Before an extensive rollout, which is both labor-intensive and incurs high capital costs, consumers need to be incentivized to reap the long-term benefits of smart grid. Off-the-shelf energy monitors can provide consumers with an insight of such potential benefits. Since energ…
▽ More
The smart grid initiative has encouraged utility companies worldwide to rollout new and smarter versions of energy meters. Before an extensive rollout, which is both labor-intensive and incurs high capital costs, consumers need to be incentivized to reap the long-term benefits of smart grid. Off-the-shelf energy monitors can provide consumers with an insight of such potential benefits. Since energy monitors are owned by the consumer, the consumer has greater control over data which significantly reduces privacy and data confidentiality concerns. We evaluate several existing energy monitors using an online technical survey and online product literature. For consumers, the use of different off-the-shelf energy monitors can help demonstrate the potential gains of smart grid. Our survey indicates a trend towards incorporation of state-of-the-art capabilities, like appliance level monitoring through load disaggregation in energy monitors, which can encourage effective consumer participation. Multiple sensor types and ratings allow some monitors to operate in various configurations and environments.
△ Less
Submitted 26 July, 2016;
originally announced July 2016.
-
DualTable: A Hybrid Storage Model for Update Optimization in Hive
Authors:
Songlin Hu,
Wantao Liu,
Tilmann Rabl,
Shuo Huang,
Ying Liang,
Zheng Xiao,
Hans-Arno Jacobsen,
Xubin Pei,
Jiye Wang
Abstract:
Hive is the most mature and prevalent data warehouse tool providing SQL-like interface in the Hadoop ecosystem. It is successfully used in many Internet companies and shows its value for big data processing in traditional industries. However, enterprise big data processing systems as in Smart Grid applications usually require complicated business logics and involve many data manipulation operation…
▽ More
Hive is the most mature and prevalent data warehouse tool providing SQL-like interface in the Hadoop ecosystem. It is successfully used in many Internet companies and shows its value for big data processing in traditional industries. However, enterprise big data processing systems as in Smart Grid applications usually require complicated business logics and involve many data manipulation operations like updates and deletes. Hive cannot offer sufficient support for these while preserving high query performance. Hive using the Hadoop Distributed File System (HDFS) for storage cannot implement data manipulation efficiently and Hive on HBase suffers from poor query performance even though it can support faster data manipulation.There is a project based on Hive issue Hive-5317 to support update operations, but it has not been finished in Hive's latest version. Since this ACID compliant extension adopts same data storage format on HDFS, the update performance problem is not solved.
In this paper, we propose a hybrid storage model called DualTable, which combines the efficient streaming reads of HDFS and the random write capability of HBase. Hive on DualTable provides better data manipulation support and preserves query performance at the same time. Experiments on a TPC-H data set and on a real smart grid data set show that Hive on DualTable is up to 10 times faster than Hive when executing update and delete operations.
△ Less
Submitted 1 December, 2014; v1 submitted 28 April, 2014;
originally announced April 2014.