Diffstat (limited to 'docs/requirements/04-gaps.rst')
-rw-r--r-- docs/requirements/04-gaps.rst 389
1 files changed, 0 insertions, 389 deletions
diff --git a/docs/requirements/04-gaps.rst b/docs/requirements/04-gaps.rst
deleted file mode 100644
index b8ff7f2e..00000000
--- a/docs/requirements/04-gaps.rst
+++ /dev/null
@@ -1,389 +0,0 @@
-.. This work is licensed under a Creative Commons Attribution 4.0 International License.
-.. http://creativecommons.org/licenses/by/4.0
-
-Gap analysis in upstream projects
-=================================
-
-This section presents the gaps found in existing VIM platforms. The analysis
-focused on identifying gaps relative to the features and requirements specified
-in Section 3.3.
-
-VIM Northbound Interface
-------------------------
-
-Immediate Notification
-^^^^^^^^^^^^^^^^^^^^^^
-
-* Type: 'deficiency in performance'
-* Description
-
- + To-be
-
- - VIM has to notify the VIM user immediately when a virtual resource becomes
- unavailable (fault).
- - The notification should be delivered within '1 second' after the fault is
- detected by, or reported to, the VIM.
- - Also, the following requirements have to be met:
-
- - Only the owning user can receive notifications of faults related to the
- virtual resource(s) they own.
-
- + As-is
-
- - OpenStack Metering 'Ceilometer' can notify the owner of a virtual resource
- of its unavailability (fault), based on the alarm configuration set by the
- user.
-
- - Ceilometer Alarm API:
- http://docs.openstack.org/developer/ceilometer/webapi/v2.html#alarms
-
- - Alarm notifications are triggered by the alarm evaluator rather than by the
- notification agents that might receive faults.
-
- - Ceilometer Architecture:
- http://docs.openstack.org/developer/ceilometer/architecture.html#id1
-
- - The evaluation interval should be equal to or larger than the configured
- pipeline interval for collection of the underlying metrics.
-
- - https://github.com/openstack/ceilometer/blob/stable/juno/ceilometer/alarm/service.py#L38-42
-
- - The collection interval has to be set large enough; the required value
- depends on the size of the deployment and the number of metrics collected.
- - The interval cannot be less than one second, even in small deployments.
- The default value is 60 seconds.
- - Alternative: OpenStack has a message bus to publish system events.
- The operator can allow the user to connect to it, but there are no
- functions to filter out events that should not be passed to the user
- or that were not requested by the user.
-
- + Gap
-
- - Fault notifications cannot be delivered immediately via Ceilometer.
-
-* Solved by
-
- + Event Alarm Evaluator:
- https://specs.openstack.org/openstack/ceilometer-specs/specs/liberty/event-alarm-evaluator.html
- + New OpenStack alarms and notifications project AODH:
- http://docs.openstack.org/developer/aodh/
-
-Maintenance Notification
-^^^^^^^^^^^^^^^^^^^^^^^^
-
-* Type: 'missing'
-* Description
-
- + To-be
-
- - VIM has to notify the VIM user when a virtual resource becomes unavailable
- due to NFVI maintenance.
- - Also, the following conditions/requirements have to be met:
-
- - VIM should accept a maintenance message from the administrator and mark
- the target physical resource "in maintenance".
- - Only the owner of a virtual resource hosted by the target physical
- resource can receive the notification, which can trigger some process for
- the applications running on the virtual resource (e.g. cut off
- VM).
-
- + As-is
-
- - OpenStack: None
- - AWS (just for study)
-
- - AWS provides API and CLI to view the status of a resource (VM) and to
- create instance status and system status alarms to notify you when an
- instance has a failed status check.
- http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/monitoring-system-instance-status-check.html
- - AWS provides API and CLI to view scheduled events, such as a reboot or
- retirement, for your instances. Notifications of those events are also
- sent via e-mail.
- http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/monitoring-instances-status-check_sched.html
-
- + Gap
-
- - VIM user cannot receive maintenance notifications.
-
-* Solved by
-
- + https://blueprints.launchpad.net/nova/+spec/service-status-notification
-
-VIM Southbound interface
-------------------------
-
-Normalization of data collection models
-^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
-
-* Type: 'missing'
-* Description
-
- + To-be
-
- - A normalized data format needs to be created to cope with the many data
- models from different monitoring solutions.
-
- + As-is
-
- - Data can be collected from many places (e.g. Zabbix, Nagios, Cacti,
- Zenoss). Although each solution establishes its own data models, no common
- data abstraction models exist in OpenStack.
-
- + Gap
-
- - Normalized data format does not exist.
-
-* Solved by
-
- + Specification in Section :ref:`southbound`.
-
-OpenStack
----------
-
-Ceilometer
-^^^^^^^^^^
-
-OpenStack offers a telemetry service, Ceilometer, for collecting measurements of
-the utilization of physical and virtual resources [CEIL]_. Ceilometer can
-collect a number of metrics across multiple OpenStack components and watch for
-variations and trigger alarms based upon the collected data.
-
-Scalability of fault aggregation
-________________________________
-
-* Type: 'scalability issue'
-* Description
-
- + To-be
-
- - Be able to scale to a large deployment, where thousands of monitoring
- events per second need to be analyzed.
-
- + As-is
-
- - Performance issue when scaling to medium-sized deployments.
-
- + Gap
-
- - Ceilometer seems to be unsuitable for monitoring medium and large scale
- NFVI deployments.
-
-* Solved by
-
- + Usage of Zabbix for fault aggregation [ZABB]_. Zabbix can support a much
- higher number of fault events (up to 15 thousand events per second), but
- obviously also has some upper bound:
- http://blog.zabbix.com/scalable-zabbix-lessons-on-hitting-9400-nvps/2615/
-
- + Decentralized/hierarchical deployment with multiple instances, where one
- instance is only responsible for a small NFVI.
-
-Monitoring of hardware and software
-___________________________________
-
-* Type: 'missing (lack of functionality)'
-* Description
-
- + To-be
-
- - OpenStack (as VIM) should monitor the various hardware and software in the
- NFVI so that faults in them can be handled by Ceilometer.
- - OpenStack may have monitoring functionality in itself and can be
- integrated with third party monitoring tools.
- - OpenStack needs to be able to detect the faults listed in the Annex.
-
- + As-is
-
- - For each deployment of OpenStack, the operator is responsible for
- configuring monitoring tools with relevant scripts or plugins in order to
- monitor hardware and software.
- - OpenStack Ceilometer does not monitor hardware and software to capture
- faults.
-
- + Gap
-
- - Ceilometer is not able to detect and handle all faults listed in the Annex.
-
-* Solved by
-
- + Use of dedicated monitoring tools like Zabbix or Monasca.
- See :ref:`nfvi_faults`.
-
-Nova
-^^^^
-
-OpenStack Nova [NOVA]_ is a mature and widely known and used component in
-OpenStack cloud deployments. It is the main part of an
-"infrastructure-as-a-service" system providing a cloud computing fabric
-controller, supporting a wide diversity of virtualization and container
-technologies.
-
-Nova has proven throughout these past years to be highly available and
-fault-tolerant. Besides its own API, it also provides an API compatible with
-the Amazon EC2 APIs.
-
-Correct states when compute host is down
-________________________________________
-
-* Type: 'missing (lack of functionality)'
-* Description
-
- + To-be
-
- - The API shall support changing the VM power state when a host has failed.
- - The API shall support changing the nova-compute state.
- - There could be a single API to change different VM states for all VMs
- belonging to a specific host.
- - Support external systems that monitor the infrastructure and resources
- and that are able to call the API quickly and reliably.
- - Resource states are reliable such that correlation actions can be fast and automated.
- - User shall be able to read states from OpenStack and trust they are correct.
-
- + As-is
-
- - When a VM goes down due to a host HW, host OS or hypervisor failure,
- nothing happens in OpenStack. The VMs of a crashed host/hypervisor are
- reported to be live and OK through the OpenStack API.
- - The nova-compute state might change too slowly, or the state is not
- reliable if the VMs are also expected to be down. As a result, VMs can be
- scheduled to a failed host, and the slowness blocks evacuation.
-
- + Gap
-
- - OpenStack does not change its states fast and reliably enough.
- - The API does not allow an external system to change states, nor to trust
- that the states are reliable (external system has fenced failed host).
- - User cannot read all the states from OpenStack nor trust they are right.
-
-* Solved by
-
- + https://blueprints.launchpad.net/nova/+spec/mark-host-down
- + https://blueprints.launchpad.net/python-novaclient/+spec/support-force-down-service
-
-Evacuate VMs in Maintenance mode
-________________________________
-
-* Type: 'missing'
-* Description
-
- + To-be
-
- - When maintenance mode for a compute host is set, trigger VM evacuation to
- available compute nodes before bringing the host down for maintenance.
-
- + As-is
-
- - When a compute node is set to maintenance mode, OpenStack only schedules
- evacuation of all VMs to available compute nodes if the in-maintenance
- compute node runs the XenAPI or VMware ESX hypervisor. Other hypervisors
- (e.g. KVM) are not supported and, hence, guest VMs will likely stop running
- due to maintenance actions the administrator may perform (e.g. hardware
- upgrades, OS updates).
-
- + Gap
-
- - The Nova libvirt hypervisor driver does not implement automatic evacuation
- of guest VMs when compute nodes are set to maintenance mode (``$ nova
- host-update --maintenance enable <hostname>``).
-
-Monasca
-^^^^^^^
-
-Monasca is an open-source monitoring-as-a-service (MONaaS) solution that
-integrates with OpenStack. Even though it is still in its early days, the
-community aims for the platform to be multi-tenant, highly scalable,
-performant and fault-tolerant. It provides a streaming alarm engine, a
-notification engine, and a northbound REST API users can use to interact with
-Monasca. Hundreds of thousands of metrics per second can be processed
-[MONA]_.
-
-Anomaly detection
-_________________
-
-
-* Type: 'missing (lack of functionality)'
-* Description
-
- + To-be
-
- - Detect the failure and perform a root cause analysis to filter out other
- alarms that may be triggered due to their cascading relation.
-
- + As-is
-
- - A mechanism to detect root causes of failures is not available.
-
- + Gap
-
- - Certain failures can trigger many alarms due to their dependency on the
- underlying root cause of failure. Knowing the root cause can help filter
- out unnecessary and overwhelming alarms.
-
-* Status
-
- + Monasca as of now lacks this feature, although the community is aware and
- working toward supporting it.
-
-Sensor monitoring
-_________________
-
-* Type: 'missing (lack of functionality)'
-* Description
-
- + To-be
-
- - Monasca should support retrieval of sensor monitoring data, for instance
- from IPMI.
-
- + As-is
-
- - Monasca does not monitor sensor data.
-
- + Gap
-
- - Sensor monitoring is very important. It informs operators about the state
- of the physical infrastructure (e.g. temperature, fans).
-
-* Addressed by
-
- + Monasca can be configured to use third-party monitoring solutions (e.g.
- Nagios, Cacti) for retrieving additional data.
-
-Hardware monitoring tools
--------------------------
-
-Zabbix
-^^^^^^
-
-Zabbix is an open-source solution for monitoring availability and performance of
-infrastructure components (i.e. servers and network devices), as well as
-applications [ZABB]_. It can be customized for use with OpenStack. It is a
-mature tool and has been proven to be able to scale to large systems with
-hundreds of thousands of devices.
-
-Delay in execution of actions
-_____________________________
-
-
-* Type: 'deficiency in performance'
-* Description
-
- + To-be
-
- - After detecting a fault, the monitoring tool should immediately execute
- the appropriate action, e.g. inform the manager through the NB I/F.
-
- + As-is
-
- - A delay of around 10 seconds was measured in two independent testbed
- deployments.
-
- + Gap
-
- - The delay is caused by periodic evaluation and notification. The period
- defaults to 30 s and can be reduced to 5 s, but not below.
- https://github.com/zabbix/zabbix/blob/trunk/conf/zabbix_server.conf#L329
-
-
-..
- vim: set tabstop=4 expandtab textwidth=80: