diff options
author | gwaines <greg.waines@windriver.com> | 2017-03-17 05:29:31 -0400 |
---|---|---|
committer | gwaines <greg.waines@windriver.com> | 2017-03-17 06:11:43 -0400 |
commit | 26dbdfa73fd376f8bd20011dc93628a945027304 (patch) | |
tree | 5d740bd51f63a2d927d2830bf34458cbf617abae /docs/development/overview/OPNFV_HA_Guest_APIs-Overview_HLD.rst.bak | |
parent | e83e826789396a4e1a9cd113976bec6860a3ab9f (diff) |
Updating HA GUEST API OVERVIEW DOCUMENT based on Bertrand Souville's comments.
- adding standard Doctor / Vitrage Fault Mgmt Architecture Diagram, and
- refering to Doctor SB API
-------------
Updating HA GUEST API OVERVIEW Proposal after review with OPNFV DOCTOR Team.
Minutes from meeting, that were addressed as part of this update are below.
- VM Heartbeating & Health Checking
* libvirt watchdog and its integration / use with OpenStack
> https://blueprints.launchpad.net/nova/+spec/libvirt-watchdog
> https://wiki.openstack.org/wiki/LibvirtWatchdog#Notifications
> Need to review behaviour of this capability as compared to the
proposed VM Heartbeating/Health-checking
> i believe there is a bit of overlap
but believe that VM Heartbeating / Health-checking provides a
more complete solution
> i'll update document with a NOTE on comparing the proposal with
the libvirt watchdog
* Update the architecture diagram to be consistent with the most current
DOCTOR Architecture Diagrams
> e.g. Vitrage & Congress are not necessrily deployed at same time
> use DOCTOR terminology where applicable, e.g. "inspector modules"
> OPNFV DOCTOR includes patches/components of AODH, NOVA, NEUTRON
... not just Vitrage and Congress, as shown in diagram
- although also had comment that should remove OPNFV DOCTOR outline
from diagram as OPNFV DOCTOR is a reuirement specification and
not an implementation
> indicate (possibly just in text below diagram) that the "Guest
Heartbeat / Health-check Server' on Controller Node is possibly not
required, as the Vitrage data source interface can be remotely
reached by "Guest Heartbeat / Health-check Compute" on the Compute
Node
* In text, provide a little more detail on content of actual messaging
e.g. PDUs and rough message content
* OVERALL
> believe there was a general agreement in the way that VM Heartbeating
& Health-checking was integrated / inter-worked with OPNFV DOCTOR's
Vitrage / Congress and overall OPNFV fault reporting architecture
> key feedback was to understand and highlight the additional value of
the VM Heartbeating & Health-checking functionality over the existing
libvirt watchdog integration into OpenStack.
- Server Group Messaging
* suggestion that Rabbit MQ pub/sub messaging could be an alternative for
routing of messages
> this is an implementation detail though
* general discussions on "HA use cases" for how this messaging could be
leveraged
> e.g. split-brain avoidance, faster peer VM state change notifications
* OVERALL
> agreement that the Server Group Messaging Architecture did NOT conflict
with Doctor Architecture
> need further review with OPNFV MANO Team as to how they would position
this functionality
e.g.
- position it as an alternative for various HA use cases ?
- versus
- mandating that this service group messaging be used for specific
HA user cases
Change-Id: Icd54bbf8889017cfe3f617656ddf483cbb171e63
Signed-off-by: gwaines <greg.waines@windriver.com>
Diffstat (limited to 'docs/development/overview/OPNFV_HA_Guest_APIs-Overview_HLD.rst.bak')
-rw-r--r-- | docs/development/overview/OPNFV_HA_Guest_APIs-Overview_HLD.rst.bak | 279 |
1 files changed, 0 insertions, 279 deletions
diff --git a/docs/development/overview/OPNFV_HA_Guest_APIs-Overview_HLD.rst.bak b/docs/development/overview/OPNFV_HA_Guest_APIs-Overview_HLD.rst.bak deleted file mode 100644 index f7d5284..0000000 --- a/docs/development/overview/OPNFV_HA_Guest_APIs-Overview_HLD.rst.bak +++ /dev/null @@ -1,279 +0,0 @@ -********************************************************************* -PROPOSAL: Reach-thru Guest Monitoring and Services for High Availability -********************************************************************* - -Overview -===================================================================== - -:author: Greg Waines -:organization: Wind River Systems -:organization: OPNFV - High Availability -:status: Draft - PROPOSAL -:date: March 2017 -:revision: 1.5 - -:abstract: This document presents a PROPOSAL for a set of new - optional capabilities where the OpenStack Cloud messages - into the Guest VMs in order to provide improved Availability - of the hosted VMs. The initial set of new capabilities - include: enabling the detection of and recovery from internal - VM faults and providing a simple out-of-band messaging service - to prevent scenarios such as split brain. - - -.. sectnum:: - -.. contents:: Table of Contents - - - -Introduction -===================================================================== - - This document provides an overview and rationale for a PROPOSAL - of a set of new capabilities where the OpenStack Cloud messages - into the Guest VMs in order to provide improved Availability - of the hosted VMs. - - The initial set of new capabilities specifically include: - - - VM Heartbeating and Health Checking - - VM Peer State Notification and Messaging - - All of these capabilities leverage Host-to-Guest Messaging - Interfaces / APIs which are built on a messaging service between the - OpenStack Host and the Guest VM that uses a simple low-bandwidth - datagram messaging capability in the hypervisor and therefore has no - requirements on OpenStack Networking, and is available very early - after spawning the VM. - - For each capability, the document outlines the interaction with - the Guest VM, any key technologies involved, the integration into - the larger OpenStack and OPNFV Architectures (e.g. interactions - with VNFM), specific OPNFV HA Team deliverables, and the use cases - for how availability of the hosted VM is improved. - - The intent is for the OPNFV HA Team to review the proposals of this - document with the related other teams in OPNFV (Doctor and Management - & Orchestration (MANO)) and OpenStack (Nova). - - - - -Messaging Layer -======================================================================== - - The Host-to-Guest messaging APIs used by the services discussed - in this document use a JSON-formatted application messaging layer - on top of a ‘virtio serial device’ between QEMU on the OpenStack Host - and the Guest VM. JSON formatting provides a simple, humanly readable - messaging format which can be easily parsed and formatted using any - high level programming language being used in the Guest VM (e.g. C/C++, - Python, Java, etc.). Use of the ‘virtio serial device’ provides a - simple, direct communication channel between host and guest which is - independent of the Guest’s L2/L3 networking. - - The upper layer JSON messaging format is actually structured as a - hierarchical JSON format containing a Base JSON Message Layer and an - Application JSON Message Layer: - - - the Base Layer provides the ability to multiplex different groups - of message types on top of a single ‘virtio serial device’ - e.g. - - + heartbeating and healthchecks, - + server group messaging, - - and - - - the Application Layer provides the specific message types and - fields of a particular group of message types. - - - -VM Heartbeating and Health Checking -============================================================================ - - Normally OpenStack monitoring of the health of a Guest VM is limited - to a black-box approach of simply monitoring the presence of the - QEMU/KVM PID containing the VM. - - VM Heartbeating and Health Checking provides a heartbeat service to monitor - the health of guest application(s) within a VM running under the OpenStack - Cloud. Loss of heartbeat or a failed health check status will result in a - fault event being reported to OPNFV's DOCTOR infrastructure for alarm - identification, impact analysis and reporting. This would then enable VNF - Managers (VNFMs) listening to OPNFV's DOCTOR External Alarm Reporting through - Telemetry's AODH, to initiate any required fault recovery actions. - - .. image:: OPNFV_HA_Guest_APIs-Overview_HLD-Guest_Heartbeat-FIGURE-1.png - - The VM Heartbeating and Health Checking functionality is enabled on - a VM through a new flavor extraspec indicating that the VM supports - and wants to enable Guest Heartbeating. An extension to Nova Compute uses - this extraspec to setup the required 'virtio serial device' for Host-to-Guest - messaging, on the QEMU/KVM instance created for the VM. - - A daemon within the Guest VM will register with the OpenStack Guest - Heartbeat Service on the compute node to initiate the heartbeating on itself - (i.e. the Guest VM). The OpenStack Compute Node will start heartbeating the - Guest VM, and if the heartbeat fails, the OpenStack Compute Node will report - the VM Fault thru DOCTOR and ultimately VNFM will see this thru NOVA VM - State Chagne Notifications thru AODH. I.e. VNFM wouild see the VM Heartbeat - Failure events in teh same way it sees all other VM Faults, thru DOCTOR - initiated VM state changes. - - Part of the Guest VM's registration process is the specification of the - heartbeat interval in msecs. I.e. the registering Guest VM specifies the - heartbeating interval. - - Guest heartbeat works on a challenge response model. The OpenStack - Guest Heartbeat Service on the compute node will challenge the registered - Guest VM daemon with a message each interval. The registered Guest VM daemon - must respond prior to the next interval with a message indicating good health. - If the OpenStack Host does not receive a valid response, or if the response - specifies that the VM is in ill health, then a fault event for the Guest VM - is reported to the OpenStack Guest Heartbeat Service on the controller node which - will report the event to OPNFV's DOCTOR (i.e. thru the OpenStack Vitrage data - source APIs). - - The registered Guest VM daemon's response to the challenge can be as simple - as just immediately responding with OK. This alone allows for detection of - a failed or hung QEMU/KVM instance, or a failure of the OS within the VM to - schedule the registered Guest VM's daemon or failure to route basic IO within - the Guest VM. - - However the registered Guest VM daemon's response to the challenge can be more - complex, running anything from a quick simple sanity check of the health of - applications running in the Guest VM, to a more thorough audit of the - application state and data. In either case returning the status of the - health check enables the OpenStack host to detect and report the event in order - to initiate recovery from application level errors or failures within the Guest VM. - - In summary, the deliverables of this activity would be: - - - Host Deliverables: (OpenStack and OPNFV blueprints and implementation) - - + an OpenStack Nova or libvirt extension to interpret the new flavor extraspec and - if present setup the required 'virtio serial device' for Host-to-Guest - heartbeat / health-check messaging, on the QEMU/KVM instance created - for the VM, - + an OPNFV Base Host-to-Guest Msging Layer Agent for multiplexing of Application - Layer messaging over the 'virtio serial device' to the VM, - + an OPNFV Heartbeat / Health-Check Compute Agent for local heartbeating of VM - and reporting of failures to the OpenStack Controller, - + an OPNFV Heartbeat / Health-check Server on the OpenStack Controller for - receiving VM failure notifications and reporting these to Vitrage thru - Vitrage's Data Source API, - - - Guest Deliverables: - - + a Heartbeat / Health-Check Message Specification covering - - - Heartbeat / Health-Check Application Layer JSON Protocol, - - Base Host-to-Guest JSON Protocol, - - Details on the use of the underlying 'virtio serial device', - - + a Reference Implementation of the Guest-side support of - Heartbeat / Health-check containing the peer protocol layers - within the Guest. - - - will provide code and compile instructions, - - Guest will compile based on its specific OS. - - This proposal requires review with OPNFV's Doctor and Management & Orchestration - teams, and OpenStack's Nova Team. - - - -VM Peer State Notification and Messaging -=================================================================================== - - Server Group State Notification and Messaging is a service to provide - simple low-bandwidth datagram messaging and notifications for servers that - are part of the same server group. This messaging channel is available - regardless of whether IP networking is functional within the server, and - it requires no knowledge within the server about the other members of the group. - - NOTE: A Server Group here is the OpenStack Nova Server Group concept where VMs - are grouped together for purposes of scheduling. E.g. A specific Server Group - instance can specify whether the VMs within the group should be scheduled to - run on the same compute host or different compute hosts. A 'peer' VM in the - context of this section refers to a VM within the same Nova Server Group. - - This Server Group Messaging service provides three types of messaging: - - - Broadcast: this allows a server to send a datagram (size of up to 3050 bytes) - to all other servers within the server group. - - Notification: this provides servers with information about changes to the - (Nova) state of other servers within the server group. - - Status: this allows a server to query the current (Nova) state of all servers within - the server group (including itself). - - A Server Group Messaging entity on both the controller node and the compute nodes - manage the routing of of VM-to-VM messages through the platform, leveraging Nova - to determine Server Group membership and compute node locations of VMs. The Server - Group Messaging entity on the controller also listens to Nova VM state change notifications - and querys VM state data from Nova, in order to provide the VM query and notification - functionality of this service. - - .. image:: OPNFV_HA_Guest_APIs-Overview_HLD-Peer_Messaging-FIGURE-2.png - - This service is not intended for high bandwidth or low-latency operations. It - is best-effort, not reliable. Applications should do end-to-end acks and - retries if they care about reliability. - - This service provides building block type capabilities for the Guest VMs that - contribute to higher availability of the VMs in the Guest VM Server Group. Notifications - of VM Status changes potentially provide a faster and more accurate notification - of failed peer VMs than traditional peer VM monitoring over Tenant Networks. While - the Broadcast Messaging mechanism provides an out-of-band messaging mechanism to - monitor and control a peer VM under fault conditions; e.g. providing the ability to - avoid potential split brain scenarios between 1:1 VMs when faults in Tenant - Networking occur. - - In summary, the deliverables for Server Group Messaging would be: - - - Host Deliverables: - - + a Nova or libvirt extension to interpret the new flavor extraspec and - if present setup the required 'virtio serial device' for Host-to-Guest - Server Group Messaging, on the QEMU/KVM instance created - for the VM, - + [ leveraging the Base Host-to-Guest Msging Layer Agent from previous section ], - + a Server Group Messaging Compute Agent for implementing the Application Layer - Server Group Messaging JSON Protocol with the VM, and forwarding the - messages to/from the Server Group Messaging Server on the Controller, - + a Server Group Messaging Server on the Controller for routing broadcast - messages to the proper Computes and VMs, as well as listening for Nova - VM State Change Notifications and forwarding these to applicable Computes - and VMs, - - - Guest Deliverables: - - + a Server Group Messaging Message Specification covering - - - Server Group Messaging Application Layer JSON Protocol, - - [ leveraging Base Host-to-Guest JSON Protocol from previous section ], - - [ leveraging Details on the use of the underlying 'virtio serial device' from previous section ], - - + a Reference Implementation of the Guest-side support of - Server Group Messaging containing the peer protocol layers - and Guest Application hooks within the Guest. - - This proposal requires review with OPNFV's Management & Orchestration team and - OpenStack's Nova Team. - - -Conclusion -====================================================================================== - - The PROPOSAL of Reach-thru Guest Monitoring and Services described in this document - leverage Host-to-Guest messaging to provide a number of extended capabilities - that improve the Availability of the hosted VMs. These new capabilities - enable detection of and recovery from internal VM faults and provides a simple - out-of-band messaging service to prevent scenarios such as split brain. - - The integration of these proposed new capabilities into the larger OpenStack and OPNFV - Architectures need to be reviewed with the other related teams in OPNFV (Doctor and - Management & Orchestration (MANO)) and OpenStack (Nova). |