ResumeNet - framework for network resilience
The ResumeNet project investigated a framework for network resilience. The framework comprises a number of components, including metric-based approaches for evaluating network resilience, and architectures for detecting challenges and mitigating them in real time. The project's outcomes on resilience metrics were heavily cited by the European Network and Information Security Agency (ENISA).
In addition to the ResumeNet framework, the project developed innovative resilience mechanisms, including novel multipath routing schemes and resilient Session Initiation Protocol (SIP) schemes. The ResumeNet framework has been evaluated experimentally in four future Internet scenarios: wireless mesh networks, disruption-tolerant networks, multimedia service provisioning, and an Internet of Things environment. The ResumeNet project was rated as "excellent" at its final review.
|Principal Investigator(s) at the University|Prof. Dr. Hermann de Meer (Chair of Computer Science with a focus on Computer Networks and Computer Communications)|
|Project period|01.09.2008 - 31.12.2011|
|Source of funding|European Union (EU) > EU Seventh Framework Programme (FP7)|
Society increasingly depends on networks in general, and the Internet in particular, for just about every aspect of daily life. Consumers use the Internet to access information, obtain products and services, manage finances, and communicate with one another. Businesses use the Internet to deal with consumers and other businesses. Nations rely on the Internet to conduct the affairs of government, deliver services to their citizens, and, to some extent, manage homeland security and conduct military operations.
As the Internet extends its global reach, services traditionally implemented on separate networks are increasingly subsumed by it, whether as overlays, through gateway access, or as replacements for the legacy networks. These include the PSTN (public switched telephone network, both wired and wireless), SCADA (supervisory control and data acquisition) networks for managing the power grid and other critical infrastructure, sensor networks, mobile ad hoc networks, and military networks.
With this increasing dependence on the Internet and the integration of services into it, disruptions of networked services have increasingly severe consequences. The lives and quality of life of individuals, the economic viability of businesses and organizations, and the security of nations are directly linked to the resilience, survivability, and dependability of the global Internet.
Ironically, the increased dependence on the Internet and the growing sophistication of its services make it more vulnerable to problems. Mobile wireless Internet access is more susceptible to the challenges of dynamicity, weakly connected channels, and unpredictable delay. The Internet is also an increasingly attractive target for recreational crackers, industrial spies, terrorists, and those waging information warfare.
It is also generally recognized that the Internet has evolved over many years without the resilience, manageability, and security needed for the future. Enhancements to the existing Internet infrastructure are hampered by the need for backward compatibility, which has resulted in important, yet isolated, tweaks to particular parts of the infrastructure, such as optical ring restoration mechanisms. There has been very little research on a systemic and systematic approach to Internet resilience.
We propose a fundamentally new architectural approach to Internet resilience that is multilevel, systemic, and systematic. At the same time, we aim to maximize interoperability with legacy network components.
We believe in a green-field approach that asks: "How should the architecture, design, deployment, and operation of the Internet be approached so that resilience emerges as a fundamental property?" By multilevel, we mean that the approach covers all levels of the network architecture and design along three dimensions:
- protocol layers, from physical links through network organization and end-to-end transport up to communicating applications, including overlays;
- data, control, and management planes; and
- network elements, from fault-tolerant network components, through survivable network topologies, to the global Internet including end systems.
We define resilience as the ability of the network to provide and maintain an acceptable level of service in the face of various faults and challenges to normal operation. This service includes the ability of users and applications to access information when needed (e.g., Web browsing and sensor monitoring), the maintenance of end-to-end communication associations (e.g., teleconferences and videoconferences), and the operation of distributed processing and networked storage. The challenges that may impact normal operation include:
- unintentional misconfiguration or operational mistakes
- large-scale natural disasters (e.g., hurricanes, earthquakes, ice storms, tsunamis, floods)
- malicious attacks from intelligent adversaries against the network hardware, software, or protocol infrastructure, including DDoS (distributed denial of service) attacks
- environmental challenges of mobility, weak channels, and unpredictably long delay
- unusual but legitimate traffic load, such as flash crowds
Our definition of resilience is therefore a superset of the commonly used definitions of survivability, dependability, and fault tolerance.
We summarize our architectural approach as follows:
First, we develop a set of architectural principles on which resilient systems in general, and the Internet in particular, should be based. Examples of such principles include:
- resource tradeoffs
We also define the relationships among these principles. For example, while redundancy can clearly increase the resilience of a system, a key question is how and where to apply it. Unlimited redundancy might maximize resilience, but at unlimited cost. The question therefore becomes how to place redundancy in the right parts of the system and at the proper levels, given the requirements of users and applications and given real cost constraints.
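The question of where to spend redundancy can be made concrete with a toy model. The sketch below assumes independent failures, a service path made of stages in series (every stage must work), parallel replicas within a stage (any one replica suffices), and a fixed cost per replica. It spends a budget greedily wherever the availability gain per unit cost is largest. The greedy rule is a common heuristic chosen here for illustration, not a method taken from the project, and it is not guaranteed to be optimal; the function names are hypothetical.

```python
from math import prod


def path_availability(avail, replicas):
    """Series path: every stage must be up; a stage is up if any replica is up.

    Assumes replica failures are independent.
    """
    return prod(1.0 - (1.0 - a) ** k for a, k in zip(avail, replicas))


def allocate_redundancy(avail, cost, budget):
    """Greedily add replicas where availability gain per unit cost is largest.

    avail[i]: availability of one replica of stage i
    cost[i]:  cost of one replica of stage i (assumed > 0)
    budget:   total spend allowed, including one mandatory replica per stage
    """
    replicas = [1] * len(avail)
    spent = sum(cost)  # one copy of every stage is always required
    while True:
        current = path_availability(avail, replicas)
        best, best_ratio = None, 0.0
        for i, c in enumerate(cost):
            if spent + c > budget:
                continue  # this replica would exceed the cost constraint
            replicas[i] += 1  # try one more replica at stage i
            gain = path_availability(avail, replicas) - current
            replicas[i] -= 1
            if gain / c > best_ratio:
                best, best_ratio = i, gain / c
        if best is None:
            return replicas, current
        replicas[best] += 1
        spent += cost[best]
```

For example, with per-stage availabilities 0.999, 0.95, and 0.99, replica costs 10, 2, and 5, and a budget of 25, the sketch adds one replica to each of the two cheaper, weaker stages and none to the expensive, already reliable one, raising path availability from about 0.940 to about 0.996.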
Second, we characterize the challenges to network operation in order to understand the threats against which the network must be resilient. Resilience can generally be achieved through a six-step strategy, which can be neatly illustrated with a castle analogy:
- Defense, by which the Internet is made robust against challenges and attacks (analogy: a strong castle wall);
- Detection of an adverse event or challenge that has impaired normal operation of the Internet and degraded services (analogy: guards on the castle wall);
- Remediation in which action is autonomously taken to continue operations as much as possible and to mitigate the damage (analogy: boiling oil and fortification of internal walls when the castle wall is breached by a trebuchet);
- Recovery to original normal operations once the adverse event has ended or the attacker has been repelled (analogy: cleaning up the oil and repairing the hole in the castle wall);
- Diagnosis of the root cause of the challenge that impaired normal operation, which can be used to improve the system design and to effect recovery to a better state (analogy: determining how enemy soldiers entered the inner walls of the castle); and
- Refinement of future behavior based on reflection on the previous cycle (analogy: constructing a thicker wall that will defend against current and predicted trebuchet technology).
This high-level model can then be applied to particular contexts of network design, such as routing and end-to-end protocols, resulting in specific mechanisms that address specific challenges; both the mechanisms and the challenges are subsets of the categories described above.
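The six steps of the strategy can be read as a loop driven by measurements of service quality. The sketch below is one illustrative way to map a trace of normalized service levels (0 to 1) onto the phases. It assumes that service counts as acceptable at or above a fixed level, and its refinement rule (raising an early-warning threshold after each episode so that later challenges are detected sooner) is a hypothetical example, not a mechanism taken from the project.

```python
from enum import Enum


class Phase(Enum):
    DEFEND = "defend"
    DETECT = "detect"
    REMEDIATE = "remediate"
    RECOVER = "recover"
    DIAGNOSE = "diagnose"
    REFINE = "refine"


def resilience_cycle(trace, acceptable, refine_step=0.05):
    """Walk a trace of measured service levels through the six phases.

    Returns the phases visited, the diagnosed episodes, and the refined
    early-warning threshold. The refinement rule is hypothetical.
    """
    threshold = acceptable  # early-warning level used by detection
    phases, episodes = [], []
    start = None            # sample index at which the current challenge was detected
    worst = 1.0
    for t, level in enumerate(trace):
        if start is None:
            if level < threshold:
                start, worst = t, level
                phases.append(Phase.DETECT)
            else:
                phases.append(Phase.DEFEND)     # normal operation, defenses in place
        elif level < acceptable:
            worst = min(worst, level)
            phases.append(Phase.REMEDIATE)      # mitigation actions would run here
        else:
            phases.append(Phase.RECOVER)        # service back at an acceptable level
            episodes.append({"start": start, "end": t, "worst": worst})
            phases.append(Phase.DIAGNOSE)       # record what happened in this episode
            threshold = min(1.0, threshold + refine_step)
            phases.append(Phase.REFINE)         # detect earlier next time
            start = None
    return phases, episodes, threshold
```

For the trace 0.95, 0.6, 0.7, 0.95 with an acceptable level of 0.8, the sketch visits defend, detect, remediate, recover, diagnose, and refine in order, records one episode spanning samples 1 to 3 with a worst level of 0.6, and raises the early-warning threshold to 0.85.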
In ResumeNet, besides detailing and quantifying this framework, we will also investigate particular mechanisms that can be viewed as its building blocks: monitoring, learning processes, and decision engines. It is the synthesis of these blocks that will enforce resilience across the various network layers, and one question the project pursues is to what extent a systematic definition of these blocks could ease their reuse and lead to scalable solutions.
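To illustrate how monitoring, a learning process, and a decision engine might compose, the sketch below learns an exponentially weighted baseline (mean and variance) of a monitored metric, such as a link's traffic rate, flags samples more than three standard deviations from that baseline, and maps the resulting challenge class to a remediation action through a policy table. The metric, the challenge labels, the policy, and all names are hypothetical assumptions made for illustration; this text does not describe ResumeNet's actual building blocks at this level of detail.

```python
import math
from dataclasses import dataclass


@dataclass
class EwmaBaseline:
    """Learning process: exponentially weighted mean and variance of a metric."""
    alpha: float = 0.2   # weight given to the newest sample
    warmup: int = 5      # samples to learn before any anomaly is flagged
    mean: float = 0.0
    var: float = 0.0
    n: int = 0

    def update(self, x: float) -> None:
        self.n += 1
        if self.n == 1:
            self.mean = x
            return
        diff = x - self.mean
        incr = self.alpha * diff
        self.mean += incr
        self.var = (1 - self.alpha) * (self.var + diff * incr)

    def is_anomalous(self, x: float, k: float = 3.0) -> bool:
        """Monitoring check: is x more than k standard deviations from the baseline?"""
        if self.n < self.warmup:
            return False
        return abs(x - self.mean) > k * math.sqrt(self.var)


# Decision engine: hypothetical mapping from challenge class to remediation action.
POLICY = {"load_surge": "rate_limit", "traffic_drop": "reroute"}


def decide(baseline: EwmaBaseline, x: float):
    """Return a remediation action for sample x, or None if x looks normal."""
    if not baseline.is_anomalous(x):
        baseline.update(x)  # learn only from samples judged normal
        return None
    challenge = "load_surge" if x > baseline.mean else "traffic_drop"
    return POLICY[challenge]
```

The baseline is updated only with samples judged normal, so a sustained attack does not gradually become the new normal; the price is that a legitimate, lasting shift in load keeps being flagged until the baseline is reset.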
Finally, the proposed project selects particular network-level and service-provisioning scenarios in which to deepen the mechanism-level analysis and carry out experimental evaluation. The scenarios, to be implemented on top of existing testbeds that will be enhanced for this purpose, are a well-balanced mix of networking scenarios with both short-term and longer-term potential for commercial exploitation.