OptSCORE 2

Local project leader

Prof. Hans P. Reiser

Research team members

Johannes Köstler

Project partners

Summary

The goals of OptSCORE's second funding phase include the finalization of previous goals. These include the adaptation of system parameters to the specific needs of arbitrary applications and system environments. In this way, we aim to achieve autonomous optimization of State Machine Replication (SMR) systems. In addition, we will extend our system with preventive as well as reactive fault handling measures and add new optimization dimensions. We will assess these aspects through extensive testing, using custom evaluation strategies in both artificial and realistic application scenarios. This research project aims to make SMR-based replication more practicable and to find efficient implementation measures.

SMR is a promising approach for providing resilience guarantees to IT systems. Replicated state machines are able to mask Byzantine failures and guarantee a strongly consistent view of the replicated data. The transition from a simple service to a replicated service typically implies much higher resource utilization as well as further performance penalties such as lower throughput. In the first funding phase, measures such as deterministic multithreading (DMT) and individually weighted replicas were studied to remedy these performance penalties. In addition, a variety of configuration parameters that are reconfigurable at runtime were identified and analyzed.
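
As an illustration of how replicated state machines mask Byzantine failures, the following is a minimal sketch and not part of the OptSCORE prototype. It assumes the commonly used bounds of n = 3f + 1 replicas and f + 1 matching replies on the client side; other protocols use different bounds. The names `min_replicas`, `vote` and `CounterReplica` are hypothetical.

```python
from collections import Counter


def min_replicas(f: int) -> int:
    """Group size under the common assumption n = 3f + 1 for f Byzantine faults."""
    return 3 * f + 1


def vote(replies: list, f: int):
    """Return a reply once at least f + 1 replicas agree on it, else None."""
    if not replies:
        return None
    value, count = Counter(replies).most_common(1)[0]
    return value if count >= f + 1 else None


class CounterReplica:
    """Hypothetical deterministic state machine: a running sum of requests."""

    def __init__(self, faulty: bool = False):
        self.state = 0
        self.faulty = faulty

    def execute(self, request: int) -> int:
        self.state += request
        # A faulty replica is modelled here as returning an arbitrary wrong value.
        return -1 if self.faulty else self.state


f = 1
replicas = [CounterReplica(faulty=(i == 0)) for i in range(min_replicas(f))]

# Every replica executes the same requests in the same (total) order.
ordered_requests = [5, 3, 2]
results = [vote([r.execute(req) for r in replicas], f) for req in ordered_requests]
print(results)
```

Because all correct replicas start in the same state and apply the same requests in the same order, their replies match, and the wrong reply of the single faulty replica is outvoted. The sketch also shows where the extra resource cost comes from: each request is executed four times instead of once.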

In the second funding phase, an autonomous and automatic adaptation of those parameters will be realized, so that throughput and request latency are optimized for given applications, usage scenarios and execution conditions such as network latencies and error rates. We further want to integrate machine learning to cope with the immense complexity introduced by those configurable parameters and to enable effective coordination in order to optimize every aspect of system performance. Here, the biggest challenges are the selection of a suitable machine learning approach and the creation of appropriate training data. Furthermore, the system components must be adapted in such a way that dynamic parameter changes are always possible at runtime and that the consistency and availability guarantees are not affected.

An additional goal is the development of a security concept that addresses the as yet unsolved problem that today's systems are content with merely tolerating faults. Such systems make no effort to detect faults or to recover from them. Doing so, however, should minimize the effect of faults on performance and increase resilience compared to current SMR systems. Apart from that, our prototype system from the first funding phase will be extended with further optimization strategies. Here, we want to find out whether replacing the total order of requests with a partial one always results in better performance in both group communication and deterministic scheduling. Moreover, we want to minimize the overhead introduced by periodic state checkpointing.

Funding

Deutsche Forschungsgemeinschaft

Project-related publications

2021

Network Federation for Inter-Cloud Operations

J. Köstler, S. Gebauer and H. P. Reiser, "Network Federation for Inter-Cloud Operations" in Proc. of the 21st IFIP International Conference on Distributed Applications and Interoperable Systems (DAIS 2021), 2021.

SmartStream: Towards Byzantine Resilient Data Streaming

H. P. Reiser, G. Habiger and F. J. Hauck, "SmartStream: Towards Byzantine Resilient Data Streaming" in Proc. of the 36th ACM/SIGAPP Symposium on Applied Computing (SAC '21), 2021.

SmartStream: Towards Efficient Byzantine Resilient Data Streaming through Speculation and Sharding

J. Köstler, H. P. Reiser, G. Habiger and F. J. Hauck, "SmartStream: Towards Efficient Byzantine Resilient Data Streaming through Speculation and Sharding", SIGAPP Appl. Comput. Rev., vol. 21, no. 3, pp. 19-32, 2021. Association for Computing Machinery.

DOI: 10.1145/3493499.3493501

File: https://doi.org/10.1145/3493499.3493501

2020

AWARE: Adaptive Wide-Area Replication for Fast and Resilient Byzantine Consensus

C. Berger, H. P. Reiser, J. Sousa and A. Bessani, "AWARE: Adaptive Wide-Area Replication for Fast and Resilient Byzantine Consensus", IEEE Transactions on Dependable and Secure Computing, 2020.

DOI: 10.1109/TDSC.2020.3030605

Self-optimising Application-agnostic Multithreading for Replicated State Machines

G. Habiger, F. J. Hauck, H. P. Reiser and J. Köstler, "Self-optimising Application-agnostic Multithreading for Replicated State Machines" in Proc. of the 39th IEEE Symposium on Reliable Distributed Systems (SRDS 2020), 2020.

2019

Resilient Wide-Area Byzantine Consensus Using Adaptive Weighted Replication

C. Berger, H. P. Reiser, J. Sousa and A. Bessani, "Resilient Wide-Area Byzantine Consensus Using Adaptive Weighted Replication" in Proc. of the 38th IEEE Symposium on Reliable Distributed Systems (SRDS'19), 2019.

2018

Resource-Efficient State-Machine Replication with Multithreading and Vertical Scaling

G. Habiger, F. J. Hauck, J. Köstler and H. P. Reiser, "Resource-Efficient State-Machine Replication with Multithreading and Vertical Scaling" in Proc. of the 14th European Dependable Computing Conference (EDCC), 2018.

Visualizing BFT SMR distributed systems -- example of BFT-SMaRt

N. Rakotondravony and H. P. Reiser, "Visualizing BFT SMR distributed systems -- example of BFT-SMaRt" in DSN Workshop on Byzantine Consensus and Resilient Blockchains, 2018.

WebBFT: Byzantine fault tolerance for resilient interactive web applications

C. Berger and H. P. Reiser, "WebBFT: Byzantine fault tolerance for resilient interactive web applications" in Proc. of the 18th IFIP International Conference on Distributed Applications and Interoperable Systems (DAIS 2018), 2018.

2016

Emusphere: Evaluating Planetary-Scale Distributed Systems in Automated Emulation Environments

J. Köstler, J. Seidemann and H. P. Reiser, "Emusphere: Evaluating Planetary-Scale Distributed Systems in Automated Emulation Environments" in The 35th International Symposium on Reliable Distributed Systems Workshops (SRDSW 2016), 2016.