Local project leader
Prof. Hans P. Reiser
Research team members
Networked IT systems today face high demands on reliability, availability, and security. Replication of services is a fundamental mechanism for meeting these requirements. To achieve scalability at the same time, many approaches, especially for storage services, rely on weak consistency. Many services, however, require stronger consistency guarantees for replicated data. This applies, for example, to coordination services such as ZooKeeper, the HDFS NameNode, or identity management services. Under Byzantine fault models, weak consistency models are also unsuitable, because values that diverge as a result of weak consistency cannot be distinguished from faulty ones. This project therefore focuses on replication techniques suitable for services with strong consistency requirements and for Byzantine fault models.
State-machine replication (SMR) is an established method for replication. It combines distributed agreement (or a totally ordered multicast) with deterministic execution of all requests. These mechanisms are very complex and expose a wide range of configurable parameters, from the choice of protocol to the setting of timeout values. Today, deterministic execution is usually achieved by processing requests sequentially, which is unacceptable given the increasing prevalence of multi-core systems.
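The core idea of SMR described above can be illustrated with a minimal sketch. It assumes that the total order of requests has already been decided (the agreement protocol itself is out of scope here), and it uses a hypothetical key-value service; the names `Replica`, `apply`, and `replicate` are illustrative and not taken from the project's framework.

```python
from dataclasses import dataclass, field


@dataclass
class Replica:
    """A deterministic key-value state machine (hypothetical example service)."""
    state: dict = field(default_factory=dict)

    def apply(self, command):
        # The result depends only on the current state and the command,
        # so execution is deterministic.
        op, key, value = command
        if op == "put":
            self.state[key] = value
        elif op == "add":
            self.state[key] = self.state.get(key, 0) + value
        return self.state.get(key)


def replicate(ordered_log, replica_count=3):
    """Apply the same totally ordered log to every replica, one request at a time."""
    replicas = [Replica() for _ in range(replica_count)]
    for command in ordered_log:
        for replica in replicas:
            replica.apply(command)
    return replicas


# Assumed output of the ordering layer: one agreed sequence of requests.
log = [("put", "x", 1), ("add", "x", 4), ("put", "y", 7), ("add", "x", -2)]
replicas = replicate(log)

# Without a total order, two replicas may execute the same requests
# in different orders and end up in different states.
in_order, reordered = Replica(), Replica()
for command in [("put", "x", 1), ("add", "x", 4)]:
    in_order.apply(command)
for command in [("add", "x", 4), ("put", "x", 1)]:
    reordered.apply(command)
```

In this sketch all replicas end in the same state because they execute the same requests in the same order, whereas `in_order` and `reordered` diverge. The strictly sequential loop in `replicate` is the simple way to obtain determinism that the text identifies as a bottleneck on multi-core systems.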
In practice, the behavior of a system depends, among other things, on communication latencies, network throughput, the frequency of errors, the number of parallel CPUs, and the internal concurrency of the application. The overall goal of this project is to explore dynamically adaptable algorithms for group communication and deterministic multithreading, as well as strategies for self-configuration and self-optimization for both of these aspects. As a result, we expect fundamental insights into the relationships between environmental conditions, application behavior, and configuration parameters or the choice among the available algorithms.
A prototype implementation of a reconfigurable, self-adapting group communication system and of a self-optimizing deterministic scheduler is to be designed and integrated into a framework for replicated services, which will in turn enable practical evaluations. We expect self-adapting systems to perform better than rigidly configured or non-configurable systems. The project thus makes a fundamental contribution toward bringing SMR-based systems closer to practical use.
Resource-Efficient State-Machine Replication with Multithreading and Vertical Scaling
Proc. of the 14th European Dependable Computing Conference (EDCC)
WebBFT: Byzantine fault tolerance for resilient interactive web applications
Proc. of the 18th IFIP International Conference on Distributed Applications and Interoperable Systems (DAIS 2018)
Visualizing BFT SMR distributed systems -- example of BFT-SMaRt
DSN Workshop on Byzantine Consensus and Resilient Blockchains
Emusphere: Evaluating Planetary-Scale Distributed Systems in Automated Emulation Environments
The 35th International Symposium on Reliable Distributed Systems Workshops (SRDSW 2016)
Johannes Köstler, Hans P. Reiser
PEDSEWAN: Platform for the Evaluation of Distributed Systems in Emulated Wide-Area Networks
Presentation, Frühjahrstreffen der GI-Fachgruppe Betriebssysteme, Graz, Feb. 2016