Overview
=====================================================================

:author: Greg Waines
:organization: Wind River Systems
:organization: OPNFV - High Availability
:status: Draft - PROPOSAL
:date: April 2017
:revision: 1.6

:abstract: This document describes a set of new optional
   capabilities where the OpenStack Cloud messages into the Guest
   VMs in order to provide improved Availability of the hosted VMs.
   The initial set of new capabilities includes: enabling the
   detection of and recovery from internal VM faults and providing
   a simple out-of-band messaging service to prevent scenarios such
   as split brain.


.. sectnum::

.. contents:: Table of Contents



Introduction
=====================================================================

   This document provides an overview and rationale for a
   set of new capabilities where the OpenStack Cloud messages
   into the Guest VMs in order to provide improved Availability
   of the hosted VMs.

   The initial set of new capabilities specifically includes:

        - VM Heartbeating and Health Checking
        - VM Peer State Notification and Messaging

   All of these capabilities leverage Host-to-Guest Messaging
   Interfaces / APIs built on a messaging service between the
   OpenStack Host and the Guest VM.  This service uses a simple
   low-bandwidth datagram messaging capability in the hypervisor; it
   therefore places no requirements on OpenStack Networking and is
   available very early after the VM is spawned.

   For each capability, the document outlines the interaction with
   the Guest VM, any key technologies involved, the integration into
   the larger OpenStack and OPNFV Architectures (e.g. interactions
   with VNFM), specific OPNFV HA Team deliverables, and the use cases
   for how availability of the hosted VM is improved.




Messaging Layer
========================================================================

   The Host-to-Guest messaging APIs used by the services discussed
   in this document use a JSON-formatted application messaging layer
   on top of a 'virtio serial device' between QEMU on the OpenStack Host
   and the Guest VM.  JSON provides a simple, human-readable messaging
   format that can be easily parsed and generated in any high-level
   programming language used in the Guest VM (e.g. C/C++, Python, Java,
   etc.).  The 'virtio serial device' provides a simple, direct
   communication channel between host and guest that is independent of
   the Guest's L2/L3 networking.

   The upper-layer JSON messaging format is hierarchical, consisting of
   a Base JSON Message Layer and an Application JSON Message Layer:

        - the Base Layer provides the ability to multiplex different groups
          of message types on top of a single 'virtio serial device',
          e.g.

           + heartbeating and healthchecks,
           + server group messaging,

          and

        - the Application Layer provides the specific message types and
          fields of a particular group of message types.
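
   As a minimal sketch of the two-layer structure described above, the
   following Python fragment wraps an Application Layer message in a Base
   Layer envelope and dispatches incoming envelopes by message group.  The
   field names (``msg_group``, ``data``), the group names and the newline
   framing are illustrative assumptions, not the actual protocol definition.

   .. code-block:: python

      import json

      # Hypothetical group names; the real specification may differ.
      HEARTBEAT_GROUP = "heartbeat"
      SERVER_GROUP = "server_group"


      def encode_message(group, app_message):
          """Wrap an Application Layer dict in a Base Layer envelope.

          Returns newline-terminated bytes (the framing is an assumption).
          """
          envelope = {"msg_group": group, "data": app_message}
          return (json.dumps(envelope) + "\n").encode("utf-8")


      def decode_message(raw):
          """Parse one envelope and return (group, app_message)."""
          envelope = json.loads(raw.decode("utf-8"))
          return envelope["msg_group"], envelope["data"]


      class Demultiplexer:
          """Route decoded messages to the handler registered for their group."""

          def __init__(self):
              self._handlers = {}

          def register(self, group, handler):
              self._handlers[group] = handler

          def dispatch(self, raw):
              group, app_message = decode_message(raw)
              handler = self._handlers.get(group)
              if handler is None:
                  return None  # unknown group: ignore the message
              return handler(app_message)

   With this shape, a heartbeat agent and a server group messaging agent
   could each register one handler and share a single serial channel.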



VM Heartbeating and Health Checking
============================================================================

   Normally, OpenStack monitoring of the health of a Guest VM is limited
   to a black-box approach: monitoring the presence of the QEMU/KVM
   process (PID) containing the VM, and/or enabling libvirt's emulated
   hardware watchdog.

   VM Heartbeating and Health Checking provides a heartbeat service to enhance
   the monitoring of the health of guest application(s) within a VM running
   under the OpenStack Cloud.  Loss of heartbeat or a failed health check status
   will result in a fault event being reported to OPNFV's DOCTOR infrastructure
   for alarm identification, impact analysis and reporting.  This in turn enables
   VNF Managers (VNFMs) that listen to OPNFV's DOCTOR External Alarm Reporting
   through Telemetry's AODH to initiate any required fault recovery actions.

   .. image:: OPNFV_HA_Guest_APIs-Overview_HLD-Guest_Heartbeat-FIGURE-1.png

   Or, in the context of the OPNFV DOCTOR's Fault Management Architecture:

   .. image:: OPNFV_HA_Guest_APIs-Overview_HLD-Guest_Heartbeat-FIGURE-1b.png

   The VM Heartbeating and Health Checking functionality is enabled on
   a VM through a new flavor extraspec indicating that the VM supports
   and wants to enable Guest Heartbeating.  An extension to Nova Compute uses
   this extraspec to set up the required 'virtio serial device' for
   Host-to-Guest messaging on the QEMU/KVM instance created for the VM.

   A daemon within the Guest VM will register with the OpenStack Guest
   Heartbeat Service on the compute node to initiate heartbeating of itself
   (i.e. the Guest VM).  The OpenStack Compute Node will then start
   heartbeating the Guest VM.  If the heartbeat fails, the OpenStack Compute
   Node will report the VM fault through DOCTOR, and the VNFM will ultimately
   see it as a Nova VM State Change Notification through AODH.  In other
   words, the VNFM would see VM Heartbeat Failure events in the same way it
   sees all other VM faults: through DOCTOR-initiated VM state changes.

   As part of its registration, the Guest VM specifies the heartbeat
   interval in milliseconds.

   Guest heartbeat works on a challenge-response model.  The OpenStack
   Guest Heartbeat Service on the compute node challenges the registered
   Guest VM daemon with a message each interval.  The registered Guest VM daemon
   must respond prior to the next interval with a message indicating good health.
   If the OpenStack Host does not receive a valid response, or if the response
   specifies that the VM is in ill health, then a fault event for the Guest VM
   is reported to the OpenStack Guest Heartbeat Service on the controller node,
   which reports the event to OPNFV's DOCTOR (i.e. through the Doctor
   SouthBound (SB) APIs).
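
   The host-side behaviour described above can be sketched as follows.
   This is an illustration under stated assumptions rather than the
   proposed implementation: the class name, the ``challenge`` token used
   to match responses to requests, and the ``report_fault`` callback are
   all hypothetical.

   .. code-block:: python

      class HeartbeatMonitor:
          """Host-side challenge-response bookkeeping for one Guest VM."""

          def __init__(self, interval_ms, report_fault):
              self.interval_ms = interval_ms    # chosen by the guest at registration
              self.report_fault = report_fault  # e.g. forwards to the controller
              self._pending = None              # challenge awaiting a response
              self._next_id = 0

          def on_interval(self):
              """Call once per interval; returns the next challenge to send."""
              if self._pending is not None:
                  # The previous challenge was never answered in time.
                  self.report_fault("no response")
              self._next_id += 1
              self._pending = self._next_id
              return {"type": "challenge_request", "challenge": self._pending}

          def on_response(self, message):
              """Process a response received from the guest daemon."""
              if message.get("challenge") != self._pending:
                  return  # stale or invalid; the next interval will flag it
              self._pending = None
              if not message.get("healthy", False):
                  self.report_fault("unhealthy")

   A caller would invoke ``on_interval()`` from a timer that fires every
   ``interval_ms`` milliseconds and ``on_response()`` whenever a message
   arrives from the guest.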

   The Guest Heartbeating Messaging Specification is quite simple,
   comprising the following PDUs: Init, Init-Ack, Challenge-Request,
   Challenge-Response and Exit.  The Challenge-Response carries a
   healthy / not-healthy boolean.
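
   To make the PDU exchange concrete, here is a minimal guest-side sketch.
   The PDU names come from the list above, but their JSON spelling
   (``init``, ``init_ack``, and so on), the ``interval_ms`` and
   ``challenge`` fields, and the class itself are assumptions made for
   illustration only.

   .. code-block:: python

      class GuestHeartbeatClient:
          """Guest-side protocol state for the five heartbeat PDUs."""

          def __init__(self, interval_ms, health_check):
              self.interval_ms = interval_ms
              self.health_check = health_check  # callable returning True/False
              self.registered = False

          def init_message(self):
              # The guest chooses the heartbeat interval when it registers.
              return {"type": "init", "interval_ms": self.interval_ms}

          def exit_message(self):
              self.registered = False
              return {"type": "exit"}

          def handle(self, message):
              """Return the reply to a host message, or None if none is due."""
              kind = message.get("type")
              if kind == "init_ack":
                  self.registered = True
                  return None
              if kind == "challenge_request" and self.registered:
                  return {
                      "type": "challenge_response",
                      # Echoing a challenge token is an assumption.
                      "challenge": message.get("challenge"),
                      "healthy": bool(self.health_check()),
                  }
              return None  # ignore anything else

   The simplest possible daemon passes ``lambda: True`` as ``health_check``;
   a more thorough one can plug in application-specific checks.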

   The registered Guest VM daemon's response to the challenge can be as simple
   as just immediately responding with OK.  This alone allows for detection of
   a failed or hung QEMU/KVM instance, or a failure of the OS within the VM to
   schedule the registered Guest VM's daemon or failure to route basic IO within
   the Guest VM.

   However, the registered Guest VM daemon's response to the challenge can be more
   complex, running anything from a quick, simple sanity check of the health of
   applications running in the Guest VM to a more thorough audit of the
   application state and data.  In either case, returning the status of the
   health check enables the OpenStack host to detect and report the event in order
   to initiate recovery from application-level errors or failures within the Guest VM.
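
   One possible way for a guest daemon to combine several application
   checks into the single healthy / not-healthy result is sketched below.
   The function name and the convention that a check raising an exception
   counts as a failure are assumptions of this sketch.

   .. code-block:: python

      def run_health_checks(checks):
          """Run each named check and return (healthy, failed_names).

          `checks` maps a name to a zero-argument callable returning a
          truthy value when healthy.  A check that raises is treated as
          failed so that one broken check cannot crash the daemon.
          """
          failed = []
          for name, check in checks.items():
              try:
                  ok = bool(check())
              except Exception:
                  ok = False
              if not ok:
                  failed.append(name)
          return (not failed, failed)

   The boolean would go into the Challenge-Response, while the list of
   failed check names could be logged inside the guest for diagnosis.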

   In summary, the deliverables of this activity would be:

   - Host Deliverables:    (OpenStack and OPNFV blueprints and implementation)

   + an OpenStack Nova or libvirt extension to interpret the new flavor extraspec and,
     if present, set up the required 'virtio serial device' for Host-to-Guest
     heartbeat / health-check messaging on the QEMU/KVM instance created
     for the VM,
   + an OPNFV Base Host-to-Guest Messaging Layer Agent for multiplexing of Application
     Layer messaging over the 'virtio serial device' to the VM,
   + an OPNFV Heartbeat / Health-Check Compute Agent for local heartbeating of the VM
     and reporting of failures to the OpenStack Controller,
   + an OPNFV Heartbeat / Health-Check Server on the OpenStack Controller for
     receiving VM failure notifications and reporting these to Vitrage through
     Vitrage's Data Source API,

   - Guest Deliverables:

   + a Heartbeat / Health-Check Message Specification covering

      - Heartbeat / Health-Check Application Layer JSON Protocol,
      - Base Host-to-Guest JSON Protocol,
      - Details on the use of the underlying 'virtio serial device',

   + a Reference Implementation of the Guest-side support of
     Heartbeat / Health-check containing the peer protocol layers
     within the Guest.

      - will provide code and compile instructions,
      - Guest will compile based on its specific OS.

   NOTE that the described VM Heartbeating and Health Checking functionality provides
   enhanced monitoring over and above libvirt's emulated hardware watchdog.  VM
   Heartbeating and Health Checking can detect a wider range of issues than simply
   a lack of CPU time for a lower-priority process feeding the hardware
   watchdog.  It can ensure that specific key processes
   within the application are not blocked, that kernel resources for basic IO within
   the Guest VM are available, and/or that the application-specific health of the VM
   is good.

   This proposal has been reviewed with both OPNFV's Doctor and Management
   and Orchestration teams.  The general agreement was that the proposal
   integrates and inter-works correctly with OPNFV DOCTOR's Vitrage and Congress
   and with the overall OPNFV fault reporting architecture.



VM Peer State Notification and Messaging
===================================================================================

   Server Group State Notification and Messaging is a service to provide
   simple low-bandwidth datagram messaging and notifications for servers that
   are part of the same server group.  This messaging channel is available
   regardless of whether IP networking is functional within the server, and
   it requires no knowledge within the server about the other members of the group.

   NOTE: A Server Group here is the OpenStack Nova Server Group concept, where VMs
   are grouped together for purposes of scheduling.  For example, a specific Server Group
   instance can specify whether the VMs within the group should be scheduled to
   run on the same compute host or on different compute hosts.  A 'peer' VM in the
   context of this section refers to a VM within the same Nova Server Group.

   This Server Group Messaging service provides three types of messaging:

        - Broadcast: this allows a server to send a datagram (size of up to 3050 bytes)
          to all other servers within the server group.
        - Notification: this provides servers with information about changes to the
          (Nova) state of other servers within the server group.
        - Status: this allows a server to query the current (Nova) state of all servers within
          the server group (including itself).
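
   The three message types above might be represented in a guest
   application roughly as follows.  Only the 3050-byte limit comes from
   this document; the ``type`` values, the field names, and the choice to
   apply the limit to the UTF-8 encoded payload are assumptions.

   .. code-block:: python

      MAX_BROADCAST_BYTES = 3050  # broadcast datagram limit stated above


      def build_broadcast(payload):
          """Build a broadcast message for all other servers in the group."""
          size = len(payload.encode("utf-8"))
          if size > MAX_BROADCAST_BYTES:
              raise ValueError("payload is %d bytes, limit is %d"
                               % (size, MAX_BROADCAST_BYTES))
          return {"type": "broadcast", "data": payload}


      def build_status_query():
          """Build a query for the Nova state of every server in the group."""
          return {"type": "status_query"}


      def handle_incoming(message, handlers):
          """Call the handler for the message type; ignore unknown types.

          `handlers` maps a type name such as "broadcast", "notification"
          or "status_response" to a callable taking the decoded message.
          """
          handler = handlers.get(message.get("type"))
          return handler(message) if handler else None

   Rejecting oversized payloads before sending gives the application a
   clear error instead of a silently dropped datagram.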

   Server Group Messaging entities on the controller node and the compute nodes
   manage the routing of VM-to-VM messages through the platform, leveraging Nova
   to determine Server Group membership and the compute node locations of VMs.  The Server
   Group Messaging entity on the controller also listens to Nova VM state change notifications
   and queries VM state data from Nova in order to provide the VM query and notification
   functionality of this service.

   .. image:: OPNFV_HA_Guest_APIs-Overview_HLD-Peer_Messaging-FIGURE-2.png

   This service is not intended for high-bandwidth or low-latency operations.  It
   is best-effort, not reliable.  Applications should use end-to-end acks and
   retries if they care about reliability.
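
   A guest application that needs reliability on top of this best-effort
   channel could layer acks and retries along these lines.  The ``seq``
   field, the ``send`` and ``wait_for_ack`` callables, and the default
   retry count and timeout are all hypothetical.

   .. code-block:: python

      def send_with_retry(send, wait_for_ack, message, retries=3, timeout_s=1.0):
          """Send `message` until it is acknowledged or retries run out.

          send(message) transmits one datagram.  wait_for_ack(seq, timeout_s)
          returns True if a matching ack arrived within the timeout.
          Returns the number of attempts used, or None if never acknowledged.
          """
          for attempt in range(1, retries + 1):
              send(message)
              if wait_for_ack(message["seq"], timeout_s):
                  return attempt
          return None

   Because a retry can deliver the same datagram twice, receivers would
   also need to discard messages whose sequence number they have already
   processed.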

   This service provides building-block capabilities for the Guest VMs that
   contribute to higher availability of the VMs in the Guest VM Server Group.  Notifications
   of VM Status changes potentially provide a faster and more accurate notification
   of failed peer VMs than traditional peer VM monitoring over Tenant Networks.  The
   Broadcast Messaging mechanism, in turn, provides an out-of-band channel to
   monitor and control a peer VM under fault conditions, e.g. providing the ability to
   avoid potential split brain scenarios between 1:1 VMs when faults in Tenant
   Networking occur.

   In summary, the deliverables for Server Group Messaging would be:

   - Host Deliverables:

   + a Nova or libvirt extension to interpret the new flavor extraspec and,
     if present, set up the required 'virtio serial device' for Host-to-Guest
     Server Group Messaging on the QEMU/KVM instance created
     for the VM,
   + [ leveraging the Base Host-to-Guest Messaging Layer Agent from previous section ],
   + a Server Group Messaging Compute Agent for implementing the Application Layer
     Server Group Messaging JSON Protocol with the VM, and forwarding the
     messages to/from the Server Group Messaging Server on the Controller,
   + a Server Group Messaging Server on the Controller for routing broadcast
     messages to the proper Computes and VMs, as well as listening for Nova
     VM State Change Notifications and forwarding these to applicable Computes
     and VMs,

   - Guest Deliverables:

   + a Server Group Messaging Message Specification covering

      - Server Group Messaging Application Layer JSON Protocol,
      - [ leveraging Base Host-to-Guest JSON Protocol from previous section ],
      - [ leveraging Details on the use of the underlying 'virtio serial device' from previous section ],

   + a Reference Implementation of the Guest-side support of
     Server Group Messaging containing the peer protocol layers
     and Guest Application hooks within the Guest.

   This proposal has been reviewed with both OPNFV's Doctor and Management
   and Orchestration teams.  The general agreement was that the proposal does not
   conflict with the OPNFV Doctor Architecture and provides, at the very least,
   an alternative messaging and state-change-notification mechanism for hosted
   VMs in various HA use cases.



Conclusion
======================================================================================

   The Reach-thru Guest Monitoring and Services described in this document
   leverage Host-to-Guest messaging to provide a number of extended capabilities
   that improve the Availability of the hosted VMs.  These new capabilities
   enable detection of and recovery from internal VM faults and provide a simple
   out-of-band messaging service to prevent scenarios such as split brain.

   The next steps in progressing this proposal will be to submit blueprints to
   the appropriate OpenStack working groups: Vitrage for VM Heartbeating and
   Health Checking, and Nova for VM Server Group Messaging.