.. image:: opnfv-logo.png
  :height: 40
  :width: 200
  :alt: OPNFV
  :align: left

******************
Introduction
******************
This High Availability Requirement Analysis Document elicits the High Availability (HA)
requirements of OPNFV. It refines high-level HA goals into detailed HA mechanism designs and
relates each HA mechanism to the potential failures on the different layers of OPNFV. The
document can also be used as a reference for designing HA test scenarios.

The requirements engineering model KAOS is used in this document.
*************************
Terminologies and Symbols
*************************
The following concepts in KAOS will be used in the diagrams of this document.
- **Goal**: The objective to be met by the target system.
- **Obstacle**: Condition whose satisfaction may prevent some goals from being achieved.
- **Agent**: Active Object performing operations to achieve goals.
- **Requirement**: Goal assigned to an agent of the software being studied.
- **Domain Property**: Descriptive assertion about objects in the environment of the software.
- **Refinement**: Relationship linking a goal to other goals that are called its subgoals.
  Each subgoal contributes to the satisfaction of the goal it refines. There are two types of
  refinement: with an AND refinement, the goal is achieved by satisfying all of its subgoals;
  with an OR refinement, the goal is achieved by satisfying any one of its subgoals.
- **Conflict**: Relationship linking an obstacle to a goal if the obstacle obstructs the goal
  from being satisfied.
- **Resolution**: Relationship linking a goal to an obstacle if the goal can resolve the
  obstacle.
- **Responsibility**: Relationship between an agent and a requirement. Holds when an agent is
  assigned the responsibility of achieving the linked requirement.
Figure 1 shows how these concepts are displayed in a KAOS diagram.
.. figure:: images/fig1_KAOS_Sample.png
  :alt: KAOS Sample
  :figclass: align-center

  Fig 1. A KAOS Sample Diagram
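The AND/OR refinement semantics described above can be sketched in a few lines of Python. This is only an illustration, not part of KAOS or of any OPNFV tooling: the ``Goal`` class, its fields and the example goal names are assumptions made for this sketch.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Goal:
    """Hypothetical KAOS goal; a leaf goal carries its own satisfaction flag."""
    name: str
    refinement: str = "AND"            # "AND" or "OR"
    subgoals: List["Goal"] = field(default_factory=list)
    satisfied: bool = False            # only consulted for leaf goals

    def is_satisfied(self) -> bool:
        if not self.subgoals:
            return self.satisfied
        results = [g.is_satisfied() for g in self.subgoals]
        # AND: every subgoal must hold; OR: any single subgoal is enough.
        return all(results) if self.refinement == "AND" else any(results)


# Illustrative goal tree (names and truth values are made up).
instance_ha = Goal("Virtual instance HA", "OR", [
    Goal("VM HA", satisfied=False),
    Goal("Container HA", satisfied=True),
])
nfvi_ha = Goal("NFVI HA", "AND", [
    instance_ha,
    Goal("Virtual storage HA", satisfied=True),
    Goal("Virtual network HA", satisfied=False),
])

print(instance_ha.is_satisfied(), nfvi_ha.is_satisfied())
```

In this example the OR-refined goal holds because one alternative is satisfied, while the AND-refined parent does not hold because one of its subgoals is unsatisfied.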
********************************
High Availability Goals of OPNFV
********************************
Overall Goals
>>>>>>>>>>>>>>>>>>
The final goal of OPNFV High Availability is to provide highly available VNF services. The
following objectives must be met:

- There should be no single point of failure in the NFV framework.
- All resiliency mechanisms shall be designed for a multi-vendor environment, where, for example,
  the NFVI, NFV-MANO, and VNFs may be supplied by different vendors.
- Resiliency-related information shall always be explicitly specified and communicated using
  the reference interfaces (including policies/templates) of the NFV framework.

Service Level Agreements of OPNFV HA
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
The Service Level Agreements of OPNFV HA focus mainly on time constraints for service outage,
failure detection and failure recovery. Table 1 lists the time constraints for the different
Service Availability Levels (SALs) described in ETSI GS NFV-REL 001 V1.1.1 (2015-01). In this
document, SAL1 is the default benchmark that must be met.
*Table 1. Time Constraints for Different Service Availability Levels*
+----------------------------+------------------------+-----------------------+
| Service Availability Level | Failure Detection Time | Failure Recovery Time |
+============================+========================+=======================+
| SAL1                       | <1s                    | 5-6s                  |
+----------------------------+------------------------+-----------------------+
| SAL2                       | <5s                    | 10-15s                |
+----------------------------+------------------------+-----------------------+
| SAL3                       | <10s                   | 20-25s                |
+----------------------------+------------------------+-----------------------+
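As a rough illustration of how measured values could be checked against Table 1, the sketch below encodes the table as a lookup. The function names are hypothetical, and treating the upper end of each recovery range (for example 6s for SAL1) as the limit is an assumption of this sketch, not something stated in the ETSI document.

```python
# Limits in seconds from Table 1. Assumption: the upper end of each
# recovery-time range is used as the recovery limit.
SAL_LIMITS = {
    "SAL1": {"detection": 1.0, "recovery": 6.0},
    "SAL2": {"detection": 5.0, "recovery": 15.0},
    "SAL3": {"detection": 10.0, "recovery": 25.0},
}


def meets_sal(level, detection_s, recovery_s):
    """True when the measured times satisfy the given availability level."""
    limits = SAL_LIMITS[level]
    return detection_s < limits["detection"] and recovery_s <= limits["recovery"]


def strictest_level(detection_s, recovery_s):
    """Return the strictest level that is met, or None when none is met."""
    for level in ("SAL1", "SAL2", "SAL3"):
        if meets_sal(level, detection_s, recovery_s):
            return level
    return None


print(strictest_level(0.4, 5.5))
print(strictest_level(3.0, 12.0))
```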
******************
Overall Analysis
******************
Figure 2 shows the overall decomposition of high availability goals. The high availability of
VNF services can be refined into the high availability of VNFs, MANO, and the NFVI where VNFs
are deployed. The high availability of the NFVI service can be refined into the high
availability of virtual compute instances, virtual storage and virtual network services. The
high availability of a virtual compute instance is either the high availability of containers
or the high availability of VMs, and these goals can be further decomposed according to how the
NFV environment is deployed.
.. figure:: images/fig2_Total_Framework.png
  :alt: Overall HA Analysis of OPNFV
  :figclass: align-center

  Fig 2. Overall HA Analysis of OPNFV

Thus the high availability requirement of VNF services can be classified into high availability
requirements on different layers in OPNFV. The following layers are mainly discussed in this
document:
- VNF HA
- MANO HA
- Virtual Infrastructure HA (container HA or VM HA)
- VIM HA
- SDN HA
- Hypervisor HA
- Host OS HA
- Hardware HA
The next section analyzes the HA requirements on these layers in detail.
******************
Detailed Analysis
******************
VNF HA
>>>>>>>>>>>>>>>>>>
.. TBD
MANO HA
>>>>>>>>>>>>>>>>>>
.. TBD
Virtual Infrastructure HA
>>>>>>>>>>>>>>>>>>
The Virtual Infrastructure HA in OPNFV includes container HA and VM HA.
VM HA
::::::::::::::::::::::::::::::::::::::
This part describes a set of new optional capabilities in which the OpenStack cloud exchanges
messages with the Guest VMs in order to improve the availability of the hosted VMs.

Table 2 shows the potential faults of VMs and the corresponding initial solution capabilities
or methods.
*Table 2. Potential Faults of VMs and the initial solution capabilities*
+------------------------+----------------------------+------------------------------------------+
| Fault                  | Description                | Solution Capabilities                    |
+========================+============================+==========================================+
| VM faults              | General internal VM faults | VM Heartbeating and Health Checking      |
+------------------------+----------------------------+------------------------------------------+
| VM Server Group faults | Such as split brain        | VM Peer State Notification and Messaging |
+------------------------+----------------------------+------------------------------------------+
.. figure:: images/fig3_VM_HA_Analysis.png
  :alt: VM HA
  :figclass: align-center

  Fig 3. VM HA Analysis

NOTE: A Server Group here is the OpenStack Nova Server Group concept where VMs
are grouped together for purposes of scheduling. E.g. A specific Server Group
instance can specify whether the VMs within the group should be scheduled to
run on the same compute host or different compute hosts. A 'peer' VM in the
context of this section refers to a VM within the same Nova Server Group.
The initial set of new capabilities includes enabling the
detection of and recovery from internal VM faults, and providing
a simple out-of-band messaging service to prevent scenarios such
as split brain.

A more detailed description is available in ``R5_HA_API/OPNFV_HA_Guest_APIs-Overview_HLD.rst``
in this project.
The Host-to-Guest messaging APIs used by the services discussed
in this Virtual Infrastructure HA part use a JSON-formatted application messaging layer
on top of a virtio serial device between QEMU on the OpenStack Host
and the Guest VM. Use of the virtio serial device provides a
simple, direct communication channel between host and guest which is
independent of the Guest's L2/L3 networking.
The upper-layer JSON messaging format is hierarchical, consisting of a Base JSON Message Layer
and an Application JSON Message Layer:

- the Base Layer provides the ability to multiplex different groups of message types on top of
  a single virtio serial device, e.g.

  + heartbeating and healthchecks,
  + server group messaging,

- the Application Layer provides the specific message types and fields of a particular group of
  message types.
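The two-layer structure can be pictured with a small sketch. All field names here (``version``, ``msg_group``, ``data``, ``type``) are hypothetical and chosen only for illustration; the actual message format is defined in the HLD document referenced above.

```python
import json


def wrap(msg_group, app_payload):
    """Base layer: tag the application payload with its message group."""
    return json.dumps({"version": 1, "msg_group": msg_group, "data": app_payload})


def dispatch(raw, handlers):
    """Demultiplex a received message to the handler registered for its group."""
    base = json.loads(raw)
    return handlers[base["msg_group"]](base["data"])


# One handler per group of message types sharing the same serial channel.
handlers = {
    "heartbeat": lambda data: "heartbeat:" + data["type"],
    "server_group": lambda data: "server_group:" + data["type"],
}

raw = wrap("heartbeat", {"type": "challenge-response", "healthy": True})
print(dispatch(raw, handlers))
```

The base layer only needs to know the group name; the fields inside ``data`` are interpreted solely by the handler for that group.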
A) VM Heartbeating and Health Checking
.. figure:: images/fig4_Heartbeating_and_Healthchecks.png
  :alt: Heartbeating and Healthchecks
  :figclass: align-center

  Fig 4. Heartbeating and Healthchecks

VM Heartbeating and Health Checking provides a heartbeat service to enhance
the monitoring of the health of guest application(s) within a VM running
under the OpenStack Cloud. Loss of heartbeat or a failed health check status
will result in a fault event being reported to OPNFV's DOCTOR infrastructure
for alarm identification, impact analysis and reporting. This would then enable
VNF Managers (VNFMs) listening to OPNFV's DOCTOR External Alarm Reporting through
Telemetry's AODH, to initiate any required fault recovery actions.
Guest heartbeating works on a challenge-response model. The OpenStack Guest Heartbeat
Service on the compute node challenges the registered Guest VM daemon with a
message each interval. The registered Guest VM daemon must respond prior to the
next interval with a message indicating good health. If the OpenStack Host does
not receive a valid response, or if the response specifies that the VM is in ill
health, then a fault event for the Guest VM is reported to the OpenStack Guest
Heartbeat Service on the controller node, which reports the event to OPNFV's
DOCTOR (i.e. through the Doctor SouthBound (SB) APIs).

In summary, the Guest Heartbeating Messaging Specification is quite simple and
includes the following PDUs: Init, Init-Ack, Challenge-Request,
Challenge-Response and Exit. The Challenge-Response returns a healthy /
not-healthy boolean.
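A minimal sketch of one heartbeat interval, seen from the host side, is given below. It is not the Guest Heartbeat Service implementation: the class and function names, the ``token`` field used to match a response to its challenge, and the fault reason strings are all assumptions of this sketch.

```python
import random


class GuestDaemon:
    """Hypothetical registered Guest VM daemon."""

    def __init__(self, health_check):
        self.health_check = health_check   # callable returning True when healthy

    def on_challenge(self, challenge):
        # Echo the token so the host can match the response to its challenge.
        return {"pdu": "Challenge-Response",
                "token": challenge["token"],
                "healthy": self.health_check()}


def run_interval(daemon):
    """Run one interval; return None when healthy, else a fault reason."""
    challenge = {"pdu": "Challenge-Request", "token": random.getrandbits(32)}
    try:
        response = daemon.on_challenge(challenge)
    except Exception:
        return "no-response"               # models a missed or failed reply
    if response is None or response.get("token") != challenge["token"]:
        return "invalid-response"
    return None if response["healthy"] else "unhealthy"


print(run_interval(GuestDaemon(lambda: True)))
print(run_interval(GuestDaemon(lambda: False)))
```

Any non-``None`` result corresponds to the fault event that would be reported towards the controller node.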
The registered Guest VM daemon's response to the challenge can be as simple
as just immediately responding with OK. This alone allows for detection of
a failed or hung QEMU/KVM instance, or a failure of the OS within the VM to
schedule the registered Guest VM's daemon or failure to route basic IO within
the Guest VM.
However, the registered Guest VM daemon's response to the challenge can be more
complex, running anything from a quick, simple sanity check of the health of
applications running in the Guest VM to a more thorough audit of the
application state and data. In either case, returning the status of the
health check enables the OpenStack host to detect and report the event in order
to initiate recovery from application-level errors or failures within the Guest VM.
B) VM Peer State Notification and Messaging
.. figure:: images/fig5_VM_Peer_State_Notification_and_Messaging.png
  :alt: VM Peer State Notification and Messaging
  :figclass: align-center

  Fig 5. VM Peer State Notification and Messaging

Server Group State Notification and Messaging is a service to provide
simple low-bandwidth datagram messaging and notifications for servers that
are part of the same server group. This messaging channel is available
regardless of whether IP networking is functional within the server, and
it requires no knowledge within the server about the other members of the group.
This Server Group Messaging service provides three types of messaging:
- Broadcast: this allows a server to send a datagram (size of up to 3050 bytes)
to all other servers within the server group.
- Notification: this provides servers with information about changes to the
(Nova) state of other servers within the server group.
- Status: this allows a server to query the current (Nova) state of all servers within
the server group (including itself).
A Server Group Messaging entity on the controller node and on each compute node manages
the routing of VM-to-VM messages through the platform, leveraging Nova to determine
Server Group membership and the compute node locations of VMs. The Server Group Messaging
entity on the controller also listens to Nova VM state change notifications and queries
VM state data from Nova, in order to provide the VM query and notification functionality
of this service.
This service is not intended for high bandwidth or low-latency operations. It is best-effort,
not reliable. Applications should do end-to-end acks and retries if they care about reliability.
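Because the channel is best-effort, an application that needs reliability has to add its own acknowledgements and retries. The sketch below shows one way this could look; ``send`` and ``wait_for_ack`` are hypothetical callables standing in for whatever transport the guest application uses, and only the 3050-byte limit comes from the description above.

```python
MAX_DATAGRAM = 3050  # bytes, the broadcast size limit stated above


def send_reliably(send, wait_for_ack, payload, retries=3):
    """Send over a best-effort channel until acked; return attempts used or None."""
    if len(payload) > MAX_DATAGRAM:
        raise ValueError("payload exceeds the broadcast datagram limit")
    for attempt in range(1, retries + 1):
        send(payload)
        if wait_for_ack():
            return attempt
    return None  # caller decides how to handle an unconfirmed message


# Simulated lossy channel: the first two sends are never acknowledged.
sent = []
acks = iter([False, False, True])
attempts = send_reliably(sent.append, lambda: next(acks), b"state=active")
print(attempts, len(sent))
```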
This service provides building-block capabilities for the Guest VMs that
contribute to higher availability of the VMs in the Guest VM Server Group. Notifications
of VM status changes potentially provide a faster and more accurate notification
of failed peer VMs than traditional peer VM monitoring over Tenant Networks, while
the Broadcast Messaging mechanism provides an out-of-band messaging mechanism to
monitor and control a peer VM under fault conditions, e.g. providing the ability to
avoid potential split brain scenarios between 1:1 VMs when faults in Tenant
Networking occur.
Container HA
::::::::::::::::::::::::::::
Container HA in OPNFV focuses mainly on the Kubernetes (K8s) platform. Because K8s uses the Pod
as the smallest unit of management, creation, and planning, container HA in K8s effectively
means the high availability of running Pods.

Table 3 shows the potential faults of running pods in K8s. When such a fault occurs, the
ReplicationController or ReplicaSet can prevent the services provided by the pod from becoming
unavailable, as shown in Figure 6.
*Table 3. Potential Faults of Running Pods in K8s*

+-------------+-------------+----------------------------------------------------+----------+
| Service     | Fault       | Description                                        | Severity |
+=============+=============+====================================================+==========+
| Run by pods | Pod failure | All Containers in the Pod have terminated, and     | Critical |
|             |             | at least one Container has terminated in failure.  |          |
|             |             | That is, the Container either exited with non-zero |          |
|             |             | status or was terminated by the system.            |          |
+-------------+-------------+----------------------------------------------------+----------+
.. figure:: images/fig6_Container_HA_analysis_in_K8s.png
  :alt: Container HA Analysis in K8s
  :figclass: align-center

  Fig 6. Container HA analysis in K8s

The ReplicationController or ReplicaSet (ReplicaSet is the next-generation
ReplicationController) belongs to the K8s Master Components and ensures that a specified number
of pod replicas are running at any one time.
The following requirements are elicited for Pod HA:
**[Req 5.3.1]** A pod or a homogeneous set of pods should always be up and available until it
is terminated properly.

**[Req 5.3.2]** The ReplicationController or ReplicaSet should terminate the extra pods if
there are more pods than the specified number.

**[Req 5.3.3]** The ReplicationController or ReplicaSet should start more pods if there are
fewer pods than the specified number.

**[Req 5.3.4]** A new Pod should be scheduled to another Node when a failure of the host or
container is detected.
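Requirements 5.3.2 and 5.3.3 describe a reconciliation between the desired and the observed number of pods. The sketch below illustrates that idea only; it does not use the Kubernetes API, and the ``reconcile`` function, the pod names and the choice of which surplus pods to terminate are assumptions of this sketch.

```python
def reconcile(desired, pods):
    """Return (number_of_pods_to_start, pods_to_terminate) for one pass.

    ``pods`` maps a pod name to its phase, e.g. "Running" or "Failed".
    """
    running = sorted(name for name, phase in pods.items() if phase == "Running")
    if len(running) < desired:
        # Too few replicas: start the missing ones (Req 5.3.3).
        return desired - len(running), []
    # Enough or too many replicas: terminate the surplus (Req 5.3.2).
    return 0, running[desired:]


print(reconcile(3, {"web-a": "Running", "web-b": "Failed", "web-c": "Running"}))
print(reconcile(2, {"web-a": "Running", "web-b": "Running", "web-c": "Running"}))
```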
VIM HA
>>>>>>>>>>>>>>>>>>
OpenStack High Availability
::::::::::::::::::::::::::::
The VIM in the NFV reference architecture contains different components of OpenStack, SDN
controllers and other virtual resource controllers. VIM components can be classified into three
types:

- **Entry Point Components**: Components that give VIM service interfaces to users, like
  nova-api and neutron-server.
- **Middlewares**: Components that provide load balancer services, messaging queues, cluster
  management services, etc.
- **Subcomponents**: Components that implement VIM functions, which are called by Entry Point
  Components but not by users directly.

Table 4 shows the potential faults that may happen on the VIM layer. Currently the main focus
of VIM HA is the service crash of VIM components, which may occur on all types of VIM
components. To prevent VIM services from becoming unavailable, Active/Active Redundancy,
Active/Passive Redundancy and Message Queue are used for different types of VIM components, as
shown in Figure 7.
*Table 4. Potential Faults in VIM level*

+---------+---------------+--------------------------------------------------+----------+
| Service | Fault         | Description                                      | Severity |
+=========+===============+==================================================+==========+
| General | Service Crash | The processes of a service terminate abnormally. | Critical |
+---------+---------------+--------------------------------------------------+----------+
.. figure:: images/fig7_VIM_Analysis.png
  :alt: VIM HA Analysis
  :figclass: align-center

  Fig 7. VIM HA Analysis

A) Active/Active Redundancy
Active/Active Redundancy manages both the main and redundant systems concurrently. If a
failure happens on a component, the backups are already online and users are unlikely to
notice that the failed VIM component is being repaired. A typical Active/Active Redundancy
setup has redundant instances that are load balanced via a virtual IP address and a load
balancer such as HAProxy.

When one of the redundant VIM component instances fails, the load balancer should detect the
instance failure and isolate the failed instance from being called until it is recovered.
The requirement decomposition of Active/Active Redundancy is shown in Figure 8.
.. figure:: images/fig8_Active_Active_Redundancy.png
  :alt: Active/Active Redundancy Requirement Decomposition
  :figclass: align-center

  Fig 8. Active/Active Redundancy Requirement Decomposition

The following requirements are elicited for VIM Active/Active Redundancy:
**[Req 5.4.1]** Redundant VIM components should be load balanced by a load balancer.
**[Req 5.4.2]** The load balancer should check the health status of VIM component instances.
**[Req 5.4.3]** The load balancer should isolate the failed VIM component instance until it is
recovered.
**[Req 5.4.4]** The alarm information of VIM component failure should be reported.
**[Req 5.4.5]** Failed VIM component instances should be recovered by a cluster manager.
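The interplay of Req 5.4.1 to 5.4.3 can be sketched as a round-robin balancer that skips instances failing their health check. This is a simplified model, not HAProxy's behaviour or configuration; the ``LoadBalancer`` class and the instance names are hypothetical.

```python
import itertools


class LoadBalancer:
    """Hypothetical round-robin balancer with health-check based isolation."""

    def __init__(self, backends, check):
        self.backends = list(backends)
        self.check = check                 # callable: backend -> True when healthy
        self.healthy = set(self.backends)
        self._rr = itertools.cycle(self.backends)

    def run_health_checks(self):
        """Refresh the healthy set; return backends that just failed (for alarms)."""
        now_healthy = {b for b in self.backends if self.check(b)}
        newly_failed = self.healthy - now_healthy
        self.healthy = now_healthy
        return newly_failed

    def pick(self):
        """Return the next healthy backend, skipping isolated ones."""
        for _ in range(len(self.backends)):
            backend = next(self._rr)
            if backend in self.healthy:
                return backend
        raise RuntimeError("no healthy backend available")


status = {"nova-api-1": True, "nova-api-2": True}
lb = LoadBalancer(status, status.get)
status["nova-api-1"] = False               # simulate a service crash
failed = lb.run_health_checks()
print(failed, [lb.pick() for _ in range(3)])
```

Once the failed instance passes its health check again, the next call to ``run_health_checks`` puts it back into rotation, which corresponds to the "until it is recovered" part of Req 5.4.3.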
Table 5 shows the current VIM components using Active/Active Redundancy and the corresponding
HA test cases to verify them.
*Table 5. VIM Components using Active/Active Redundancy*
+-----------------+-------------------------------------------------------+----------------------+
| Component       | Description                                           | Related HA Test Case |
+=================+=======================================================+======================+
| nova-api        | endpoint component of OpenStack Compute Service Nova  | yardstick_tc019      |
+-----------------+-------------------------------------------------------+----------------------+
| nova-novncproxy | server daemon that serves the Nova noVNC Websocket    |                      |
|                 | Proxy service, which provides a websocket proxy that  |                      |
|                 | is compatible with OpenStack Nova noVNC consoles.     |                      |
+-----------------+-------------------------------------------------------+----------------------+
| neutron-server  | endpoint component of OpenStack Networking Service    | yardstick_tc045      |
|                 | Neutron                                               |                      |
+-----------------+-------------------------------------------------------+----------------------+
| keystone        | component of OpenStack Identity Service Keystone      | yardstick_tc046      |
+-----------------+-------------------------------------------------------+----------------------+
| glance-api      | endpoint component of OpenStack Image Service Glance  | yardstick_tc047      |
+-----------------+-------------------------------------------------------+----------------------+
| glance-registry | server daemon that serves image metadata through a    |                      |
|                 | REST-like API.                                        |                      |
+-----------------+-------------------------------------------------------+----------------------+
| cinder-api      | endpoint component of OpenStack Block Storage Service | yardstick_tc048      |
|                 | Cinder                                                |                      |
+-----------------+-------------------------------------------------------+----------------------+
| swift-proxy     | endpoint component of OpenStack Object Storage        | yardstick_tc049      |
|                 | Service Swift                                         |                      |
+-----------------+-------------------------------------------------------+----------------------+
| horizon         | component of OpenStack Dashboard Service Horizon      |                      |
+-----------------+-------------------------------------------------------+----------------------+
| heat-api        | endpoint component of OpenStack Stack Service Heat    | yardstick_tc091      |
+-----------------+-------------------------------------------------------+----------------------+
| mysqld          | database service of VIM components                    | yardstick_tc090      |
+-----------------+-------------------------------------------------------+----------------------+
B) Active/Passive Redundancy

Active/Passive Redundancy maintains a redundant instance, or replacement resources, that can
be brought online when the active service fails. Requests are handled using a virtual IP
address (VIP), which facilitates returning to service with minimal reconfiguration. A cluster
manager (such as Pacemaker or Corosync) monitors these components, bringing the backup online
as necessary.

When the main instance of a VIM component fails, the cluster manager should detect the failure
and bring the backup instance online. The failed instance should then be recovered so that it
can serve as another backup instance. The requirement decomposition of Active/Passive
Redundancy is shown in Figure 9.
.. figure:: images/fig9_Active_Passive_Redundancy.png
  :alt: Active/Passive Redundancy Requirement Decomposition
  :figclass: align-center

  Fig 9. Active/Passive Redundancy Requirement Decomposition

The following requirements are elicited for VIM Active/Passive Redundancy:
**[Req 5.4.6]** The cluster manager should replace the failed main VIM component instance with
a backup instance.
**[Req 5.4.7]** The cluster manager should check the health status of VIM component instances.
**[Req 5.4.8]** Failed VIM component instances should be recovered by the cluster manager.
**[Req 5.4.9]** The alarm information of VIM component failure should be reported.
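The failover behaviour required by Req 5.4.6 to 5.4.9 can be modelled as follows. This is a conceptual sketch rather than Pacemaker or Corosync behaviour; the ``ClusterManager`` class, the instance names and the assumption that a failed instance is restarted and re-queued as a standby are all made up for illustration.

```python
class ClusterManager:
    """Hypothetical cluster manager for one active/passive service."""

    def __init__(self, active, standbys):
        self.active = active               # instance currently holding the VIP
        self.standbys = list(standbys)
        self.alarms = []

    def monitor(self, is_healthy):
        """One monitoring pass; fail over when the active instance is unhealthy."""
        if is_healthy(self.active):
            return self.active
        failed = self.active
        self.alarms.append("failure: " + failed)       # alarm reporting
        if not self.standbys:
            raise RuntimeError("no standby instance available")
        self.active = self.standbys.pop(0)             # VIP moves to the standby
        self.standbys.append(failed)                   # assumed restarted as a standby
        return self.active


cm = ClusterManager("haproxy-1", ["haproxy-2"])
print(cm.monitor(lambda name: name != "haproxy-1"), cm.standbys, cm.alarms)
```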
Table 6 shows the current VIM components using Active/Passive Redundancy and the corresponding
HA test cases to verify them.
*Table 6. VIM Components using Active/Passive Redundancy*
+-----------------+------------------------------------------------+----------------------+
| Component       | Description                                    | Related HA Test Case |
+=================+================================================+======================+
| haproxy         | load balancer component of VIM components      | yardstick_tc053      |
+-----------------+------------------------------------------------+----------------------+
| rabbitmq-server | messaging queue service of VIM components      | yardstick_tc056      |
+-----------------+------------------------------------------------+----------------------+
| corosync        | cluster management component of VIM components | yardstick_tc057      |
+-----------------+------------------------------------------------+----------------------+
C) Message Queue
Message Queue provides an asynchronous communication protocol. In OpenStack, some projects
(like Nova and Cinder) use Message Queue to call their subcomponents. Although Message Queue
itself is not an HA mechanism, the way it works ensures high availability when redundant
components subscribe to the Message Queue. When a VIM subcomponent fails, requests can still
be processed because other redundant components are subscribed to the Message Queue.
Fault isolation is also achieved, since failed components do not actively fetch requests.
The recovery of failed components is still required. Figure 10 shows the requirement
decomposition of Message Queue.
.. figure:: images/fig10_Message_Queue.png
  :alt: Message Queue Requirement Decomposition
  :figclass: align-center

  Fig 10. Message Queue Redundancy Requirement Decomposition

The following requirements are elicited for Message Queue:
**[Req 5.4.10]** Redundant component instances should subscribe to the Message Queue, which is
implemented by the installer.
**[Req 5.4.11]** Failed VIM component instances should be recovered by the cluster manager.
**[Req 5.4.12]** The alarm information of VIM component failure should be reported.
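The way a shared queue isolates a failed subscriber can be illustrated with Python's standard ``queue`` module. This is a toy, single-process model and not how RabbitMQ or oslo.messaging are used in OpenStack; the ``drain`` function and the worker names are hypothetical.

```python
import queue


def drain(requests, workers):
    """Let live workers pull requests from a shared queue.

    ``workers`` maps a worker name to True (alive) or False (crashed).
    Returns a dict of request -> worker that handled it.
    """
    handled = {}
    alive = [name for name, up in workers.items() if up]
    if not alive:
        return handled     # requests stay queued until a worker is recovered
    turn = 0
    while not requests.empty():
        # Only live workers fetch; a crashed worker never receives a request.
        handled[requests.get()] = alive[turn % len(alive)]
        turn += 1
    return handled


q = queue.Queue()
for req in ("boot-vm-1", "boot-vm-2", "boot-vm-3"):
    q.put(req)
result = drain(q, {"nova-scheduler-1": False, "nova-scheduler-2": True})
print(result)
```

If every subscriber is down, the requests simply remain in the queue, which is why the recovery requirement (Req 5.4.11) is still needed.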
Table 7 shows the current VIM components using Message Queue and the corresponding HA test cases
to verify them.
*Table 7. VIM Components using Message Queue*

+------------------+-------------------------------------------------------+----------------------+
| Component        | Description                                           | Related HA Test Case |
+==================+=======================================================+======================+
| nova-scheduler   | OpenStack compute component that determines how to    | yardstick_tc088      |
|                  | dispatch compute requests                             |                      |
+------------------+-------------------------------------------------------+----------------------+
| nova-cert        | OpenStack compute component that serves the Nova Cert |                      |
|                  | service for X509 certificates. Used to generate       |                      |
|                  | certificates for euca-bundle-image.                   |                      |
+------------------+-------------------------------------------------------+----------------------+
| nova-conductor   | server daemon that serves the Nova Conductor service, | yardstick_tc089      |
|                  | which provides coordination and database query        |                      |
|                  | support for Nova.                                     |                      |
+------------------+-------------------------------------------------------+----------------------+
| nova-compute     | Handles all processes relating to instances (guest    |                      |
|                  | VMs). nova-compute is responsible for building a disk |                      |
|                  | image, launching it via the underlying virtualization |                      |
|                  | driver, responding to calls to check its state,       |                      |
|                  | attaching persistent storage, and terminating it.     |                      |
+------------------+-------------------------------------------------------+----------------------+
| nova-consoleauth | OpenStack compute component for authentication of     |                      |
|                  | nova consoles.                                        |                      |
+------------------+-------------------------------------------------------+----------------------+
| cinder-scheduler | OpenStack volume storage component that decides on    |                      |
|                  | placement for newly created volumes and forwards the  |                      |
|                  | request to cinder-volume.                             |                      |
+------------------+-------------------------------------------------------+----------------------+
| cinder-volume    | OpenStack volume storage component that receives      |                      |
|                  | volume management requests from cinder-api and        |                      |
|                  | cinder-scheduler, and routes them to storage backends |                      |
|                  | using vendor-supplied drivers.                        |                      |
+------------------+-------------------------------------------------------+----------------------+
| heat-engine      | OpenStack Heat project server with an internal RPC    |                      |
|                  | API called by the heat-api server.                    |                      |
+------------------+-------------------------------------------------------+----------------------+
VIM HA in K8s
::::::::::::::::::::::::::::
The VIM HA in K8s can be analyzed from the following two aspects:

- **Master Components HA**: the HA of the K8s components on the Master (for example,
  kube-apiserver, kube-scheduler and kube-controller-manager).
- **Data Storage HA**: the HA of the etcd cluster. etcd is a master component used as
  Kubernetes' backing store for all cluster data. Considering that etcd is the only stateful
  service in K8s and that its HA policy can be deployed independently of K8s, the HA of etcd
  is discussed separately.

Table 8 shows the potential faults that may happen in K8s.
*Table 8. Potential Faults in K8s*
+--------------------+-----------------+--------------------------------------+----------+
| Service            | Fault           | Description                          | Severity |
+====================+=================+======================================+==========+
| Provided by Master | Master          | A Master component crashed and can't | Critical |
| Components         | Component crash | provide normal service.              |          |
+--------------------+-----------------+--------------------------------------+----------+
| Data storage       | Etcd Crash      | The Etcd cluster crashed abnormally. | Critical |
+--------------------+-----------------+--------------------------------------+----------+
.. figure:: images/fig11_VIM_HA_analysis_in_K8s.png
  :alt: VIM HA Analysis in K8s
  :figclass: align-center

  Fig 11. VIM HA analysis in K8s

Master components can be run on any machine in the cluster. However, for simplicity, all master
components are typically started on the same machine, and user containers are not run on this
machine. In this case, K8s is based on a single Master and only has container HA on the
application layer, realized by the ReplicationController or ReplicaSet as mentioned in the
Container HA part above.

The HA of the Master and its components in K8s depends on a multi-master setup.

The Data Storage HA can be realized with an existing etcd HA cluster, or with etcd running as
a master component in a multi-master setup.
.. figure:: images/fig12_VIM_HA_analysis_in_K8s_2.png
  :alt: VIM HA Analysis in Multi-Master K8s
  :figclass: align-center

  Fig 12. VIM HA analysis in K8s(2)

In Multi-Master K8s, the Master Components HA is mainly based on the Leader Election function
of the etcd cluster, and a load balancer is used to realize the HA of the kube-apiserver Master
component.

The following requirements are elicited for Master components HA:

**[Req 5.4.13]** The Load Balancer should always forward the request to an available
kube-apiserver instance.

**[Req 5.4.14]** The Master Component in the Leader state should confirm its Leader state to
all follower Components regularly through Heartbeat.

**[Req 5.4.15]** When a Master Component in the Leader state crashes, an available Master
Component should be elected as Leader.
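Req 5.4.14 and 5.4.15 can be pictured with a lease-style election, in which the leader must renew its lease regularly and a follower takes over once the lease has expired. The sketch below is a conceptual model only and not the Kubernetes or etcd implementation; the ``LeaderLease`` class, the TTL value and the master names are assumptions.

```python
class LeaderLease:
    """Hypothetical lease: the leader must renew before the lease expires."""

    def __init__(self, ttl):
        self.ttl = ttl                 # lease duration in seconds
        self.holder = None
        self.expires_at = 0.0

    def try_acquire(self, candidate, now):
        """Renew if ``candidate`` holds the lease; take over if it has expired."""
        if self.holder == candidate or now >= self.expires_at:
            self.holder = candidate
            self.expires_at = now + self.ttl
        return self.holder


lease = LeaderLease(ttl=10.0)
print(lease.try_acquire("master-1", now=0.0))    # first candidate becomes leader
print(lease.try_acquire("master-2", now=5.0))    # lease still valid, no takeover
print(lease.try_acquire("master-1", now=8.0))    # heartbeat: leader renews
print(lease.try_acquire("master-2", now=18.0))   # leader stopped renewing, takeover
```

Time is passed in explicitly so the behaviour is deterministic; a real deployment would use a clock and a shared, consistent store for the lease record.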
Hypervisor HA
>>>>>>>>>>>>>>>>>>
.. TBD
Host OS HA
>>>>>>>>>>>>>>>>>>
.. TBD
Hardware HA
>>>>>>>>>>>>>>>>>>
.. TBD
******************
References
******************
- A KAOS Tutorial: http://www.objectiver.com/fileadmin/download/documents/KaosTutorial.pdf
- ETSI GS NFV-REL 001 V1.1.1 (2015-01):
  http://www.etsi.org/deliver/etsi_gs/NFV-REL/001_099/001/01.01.01_60/gs_NFV-REL001v010101p.pdf
- OpenStack High Availability Guide: https://docs.openstack.org/ha-guide/
- Highly Available (Mirrored) Queues: https://www.rabbitmq.com/ha.html