diff --git a/docs/testing/user/testspecification/highavailability/index.rst b/docs/testing/user/testspecification/highavailability/index.rst
index 1dd99d41..e489894f 100644
--- a/docs/testing/user/testspecification/highavailability/index.rst
+++ b/docs/testing/user/testspecification/highavailability/index.rst
@@ -31,7 +31,7 @@ This test area references the following specifications:
- ETSI GS NFV-REL 001
- - http://www.etsi.org/deliver/etsi_gs/NFV-REL/001_099/001/01.01.01_60/gs_nfv-rel001v010101p.pdf
+ - https://www.etsi.org/deliver/etsi_gs/NFV-REL/001_099/001/01.01.01_60/gs_nfv-rel001v010101p.pdf
- OpenStack High Availability Guide
@@ -84,7 +84,9 @@ Test Case 1 - Controller node OpenStack service down - nova-api
Short name
----------
-dovetail.ha.tc001.nova-api_service_down
+yardstick.ha.nova_api
+
+Yardstick test case: opnfv_yardstick_tc019.yaml
Use case specification
----------------------
@@ -102,6 +104,7 @@ Test preconditions
There is more than one controller node, which is providing the "nova-api"
service for API end-point.
+
Denoted a controller node as Node1 in the following configuration.
@@ -169,8 +172,10 @@ Post conditions
---------------
Restart the process of "nova-api" if they are not running.
-Delete image with "openstack image delete test-cirros"
-Delete flavor with "openstack flavor delete m1.test"
+
+Delete image with "openstack image delete test-cirros".
+
+Delete flavor with "openstack flavor delete m1.test".
---------------------------------------------------------------------
@@ -180,7 +185,9 @@ Test Case 2 - Controller node OpenStack service down - neutron-server
Short name
----------
-dovetail.ha.tc002.neutron-server_service_down
+yardstick.ha.neutron_server
+
+Yardstick test case: opnfv_yardstick_tc045.yaml
Use case specification
----------------------
@@ -196,6 +203,7 @@ Test preconditions
There is more than one controller node, which is providing the "neutron-server"
service for API end-point.
+
Denoted a controller node as Node1 in the following configuration.
Basic test flow execution description and pass/fail criteria
@@ -264,7 +272,9 @@ Test Case 3 - Controller node OpenStack service down - keystone
Short name
----------
-dovetail.ha.tc003.keystone_service_down
+yardstick.ha.keystone
+
+Yardstick test case: opnfv_yardstick_tc046.yaml
Use case specification
----------------------
@@ -280,6 +290,7 @@ Test preconditions
There is more than one controller node, which is providing the "keystone"
service for API end-point.
+
Denoted a controller node as Node1 in the following configuration.
Basic test flow execution description and pass/fail criteria
@@ -342,7 +353,9 @@ Test Case 4 - Controller node OpenStack service down - glance-api
Short name
----------
-dovetail.ha.tc004.glance-api_service_down
+yardstick.ha.glance_api
+
+Yardstick test case: opnfv_yardstick_tc047.yaml
Use case specification
----------------------
@@ -358,6 +371,7 @@ Test preconditions
There is more than one controller node, which is providing the "glance-api"
service for API end-point.
+
Denoted a controller node as Node1 in the following configuration.
@@ -430,7 +444,9 @@ Test Case 5 - Controller node OpenStack service down - cinder-api
Short name
----------
-dovetail.ha.tc005.cinder-api_service_down
+yardstick.ha.cinder_api
+
+Yardstick test case: opnfv_yardstick_tc048.yaml
Use case specification
----------------------
@@ -446,6 +462,7 @@ Test preconditions
There is more than one controller node, which is providing the "cinder-api"
service for API end-point.
+
Denoted a controller node as Node1 in the following configuration.
Basic test flow execution description and pass/fail criteria
@@ -509,7 +526,9 @@ Test Case 6 - Controller Node CPU Overload High Availability
Short name
----------
-dovetail.ha.tc006.cpu_overload
+yardstick.ha.cpu_load
+
+Yardstick test case: opnfv_yardstick_tc051.yaml
Use case specification
----------------------
@@ -526,6 +545,7 @@ Test preconditions
There is more than one controller node, which is providing the "cinder-api",
"neutron-server", "glance-api" and "keystone" services for API end-point.
+
Denoted a controller node as Node1 in the following configuration.
Basic test flow execution description and pass/fail criteria
@@ -594,7 +614,9 @@ Test Case 7 - Controller Node Disk I/O Overload High Availability
Short name
----------
-dovetail.ha.tc007.disk_I/O_overload
+yardstick.ha.disk_load
+
+Yardstick test case: opnfv_yardstick_tc052.yaml
Use case specification
----------------------
@@ -668,16 +690,18 @@ Test Case 8 - Controller Load Balance as a Service High Availability
Short name
----------
-dovetail.ha.tc008.load_balance_service_down
+yardstick.ha.haproxy
+
+Yardstick test case: opnfv_yardstick_tc053.yaml
Use case specification
----------------------
-This test verifies the high availability of "load balancer" service. When
-the "load balancer" service of a specified controller node is killed, whether
-"load balancer" service on other controller nodes will work, and whether the
-controller node will restart the "load balancer" service are checked. This
-test case kills the processes of "load balancer" service on the selected
+This test verifies the high availability of the "haproxy" service. When
+the "haproxy" service on a specified controller node is killed, the test
+checks whether the "haproxy" service on other controller nodes still works
+and whether the controller node restarts the killed "haproxy" service. This
+test case kills the processes of the "haproxy" service on the selected
controller node, then checks whether the request of the related OpenStack
command is processed with no failure and whether the killed processes are
recovered.
@@ -685,8 +709,10 @@ recovered.
Test preconditions
------------------
-There is more than one controller node, which is providing the "load balancer"
-service for rest-api. Denoted as Node1 in the following configuration.
+There is more than one controller node providing the "haproxy" service for
+the REST API.
+
+A controller node is denoted as Node1 in the following configuration.
Basic test flow execution description and pass/fail criteria
------------------------------------------------------------
@@ -694,33 +720,32 @@ Basic test flow execution description and pass/fail criteria
Methodology for monitoring high availability
''''''''''''''''''''''''''''''''''''''''''''
-The high availability of "load balancer" service is evaluated by monitoring
+The high availability of the "haproxy" service is evaluated by monitoring
service outage time and process outage time
Service outage time is tested by continuously executing "openstack image list"
command in loop and checking if the response of the command request is returned
with no failure.
-When the response fails, the "load balancer" service is considered in outage.
+When the response fails, the "haproxy" service is considered in outage.
The time between the first response failure and the last response failure is
considered as service outage time.
-Process outage time is tested by checking the status of processes of "load
-balancer" service on the selected controller node. The time of those processes
+Process outage time is tested by checking the status of the processes of the
+"haproxy" service on the selected controller node. The time from those processes
being killed to the time of those processes being recovered is the process
outage time.
-Process recovery is verified by checking the existence of processes of "load
-balancer" service.
+Process recovery is verified by checking the existence of the processes of
+the "haproxy" service.
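+
+As a rough illustration of the service-outage measurement described above,
+the following minimal Python sketch loops the "openstack image list" command
+and reports the window between the first and the last failed response. It is
+not the actual Yardstick monitor implementation; the monitoring duration and
+polling interval are assumptions.
+
+.. code-block:: python
+
+   # Minimal sketch of the service-outage monitor; not Yardstick code.
+   import subprocess
+   import time
+
+   def service_outage_time(duration=60.0, interval=0.1):
+       first_fail = last_fail = None
+       deadline = time.time() + duration
+       while time.time() < deadline:
+           result = subprocess.run(["openstack", "image", "list"],
+                                   stdout=subprocess.DEVNULL,
+                                   stderr=subprocess.DEVNULL)
+           if result.returncode != 0:  # failed response: service in outage
+               now = time.time()
+               first_fail = first_fail if first_fail is not None else now
+               last_fail = now
+           time.sleep(interval)
+       # time between the first and the last response failure
+       return 0.0 if first_fail is None else last_fail - first_fail
+
+   print("service outage time: %.2fs" % service_outage_time())
+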
Test execution
''''''''''''''
* Test action 1: Connect to Node1 through SSH, and check that processes of
- "load balancer" service are running on Node1
-* Test action 2: Start two monitors: one for processes of "load balancer"
+ "haproxy" service are running on Node1
+* Test action 2: Start two monitors: one for processes of "haproxy"
service and the other for "openstack image list" command. Each monitor will
run as an independent process
* Test action 3: Connect to Node1 through SSH, and then kill the processes of
- "load balancer" service
+ "haproxy" service
* Test action 4: Continuously measure service outage time from the monitor until
the service outage time is more than 5s
* Test action 5: Continuously measure process outage time from the monitor until
@@ -737,7 +762,322 @@ A negative result will be generated if the above is not met in completion.
Post conditions
---------------
-Restart the processes of "load balancer" if they are not running.
+Restart the processes of "haproxy" if they are not running.
+
+----------------------------------------------------------------
+Test Case 9 - Controller node OpenStack service down - Database
+----------------------------------------------------------------
+
+Short name
+----------
+
+yardstick.ha.database
+
+Yardstick test case: opnfv_yardstick_tc090.yaml
+
+Use case specification
+----------------------
+
+This test case verifies that the high availability of the database instances
+used by OpenStack (MySQL) on the controller node is working properly.
+Specifically, this test case kills the processes of the database service on a
+selected controller node, then checks whether the request of the related
+OpenStack command is processed with no failure and whether the killed
+processes are recovered.
+
+Test preconditions
+------------------
+
+In this test case, an attacker called "kill-process" is needed.
+This attacker includes three parameters: fault_type, process_name and host.
+
+The purpose of this attacker is to kill any process with the given process
+name running on the host node. If multiple processes use the same name on the
+host node, all of them are killed by this attacker.
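+
+A hypothetical sketch of such a "kill-process" attacker is shown below; the
+parameter names follow the description above, while the SSH invocation and
+the use of "pkill" are assumptions rather than the actual Yardstick attacker
+code.
+
+.. code-block:: python
+
+   # Hypothetical "kill-process" attacker sketch; not Yardstick code.
+   import subprocess
+
+   def kill_process_attacker(fault_type, process_name, host):
+       assert fault_type == "kill-process"
+       # "pkill -f" matches every process whose command line contains
+       # process_name, so all same-named processes on the host are killed.
+       subprocess.run(["ssh", host, "sudo", "pkill", "-9", "-f",
+                       process_name])
+
+   kill_process_attacker("kill-process", "mysql", "node1")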
+
+Basic test flow execution description and pass/fail criteria
+------------------------------------------------------------
+
+Methodology for verifying service continuity and recovery
+'''''''''''''''''''''''''''''''''''''''''''''''''''''''''
+
+In order to verify this service, two different monitors are used.
+
+The first monitor executes an OpenStack command and acts as a watcher for the
+database connections of the different OpenStack components.
+
+The second monitor is a process monitor; its main purpose is to watch whether
+the database processes on the host node are killed properly.
+
+Therefore, in this test case, there are two metrics:
+
+* service_outage_time, which indicates the maximum outage time (seconds)
+ of the specified OpenStack command request
+* process_recover_time, which indicates the maximum time (seconds) from the
+  process being killed to the process being recovered
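+
+The two metrics could be derived from the monitors' timestamps roughly as in
+the following sketch; it is illustrative only, not the Yardstick aggregation
+code, and the SLA thresholds are taken from the pass/fail criteria below.
+
+.. code-block:: python
+
+   # Illustrative derivation of the two metrics; not Yardstick code.
+   def service_outage_time(failure_timestamps):
+       # time between the first and the last failed command response
+       if not failure_timestamps:
+           return 0.0
+       return max(failure_timestamps) - min(failure_timestamps)
+
+   def process_recover_time(killed_at, recovered_at):
+       # time from the processes being killed to being recovered
+       return recovered_at - killed_at
+
+   def sla_passed(outage, recover):
+       # thresholds from the pass/fail criteria of this test case
+       return outage < 5.0 and recover < 30.0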
+
+Test execution
+''''''''''''''
+* Test action 1: Connect to Node1 through SSH, and check that "database"
+ processes are running on Node1
+* Test action 2: Start two monitors: one for the "database" processes on the
+  host node and the other for the database connections of the OpenStack
+  components, verifying the results of "openstack image list", "openstack
+  router list", "openstack stack list" and "openstack volume list".
+  Each monitor will run as an independent process
+* Test action 3: Connect to Node1 through SSH, and then kill the "mysql"
+ process(es)
+* Test action 4: Stop monitors after a period of time specified by "waiting_time".
+ The monitor info will be aggregated.
+* Test action 5: Verify the SLA and set the verdict of the test case to pass or fail.
+
+
+Pass / fail criteria
+''''''''''''''''''''
+
+Check whether the SLA is passed:
+
+- The process outage time is less than 30s.
+- The service outage time is less than 5s.
+
+The database operations are carried out in the above order and no errors occur.
+
+A negative result will be generated if the above is not met in completion.
+
+Post conditions
+---------------
+
+The database service is up and running again.
+If the database service did not recover successfully by itself,
+the test explicitly restarts the database service.
+
+------------------------------------------------------------------------
+Test Case 10 - Controller Messaging Queue as a Service High Availability
+------------------------------------------------------------------------
+
+Short name
+----------
+
+yardstick.ha.rabbitmq
+
+Yardstick test case: opnfv_yardstick_tc056.yaml
+
+Use case specification
+----------------------
+
+This test case will verify the high availability of the messaging queue
+service (RabbitMQ) that supports OpenStack on the controller node. This
+test case expects that the message bus service implementation is RabbitMQ.
+If the SUT uses a different message bus implementation, the Dovetail
+configuration (pod.yaml) can be changed accordingly. When the active
+messaging queue service of a specified controller node is killed, the
+test case will check whether the standby messaging queue services on
+other controller nodes are switched to active, and whether the cluster
+manager on the attacked controller node restarts the stopped messaging
+queue service.
+
+Test preconditions
+------------------
+
+There is more than one controller node providing the "messaging queue"
+service. A controller node is denoted as Node1 in the following
+configuration.
+
+Basic test flow execution description and pass/fail criteria
+------------------------------------------------------------
+
+Methodology for verifying service continuity and recovery
+'''''''''''''''''''''''''''''''''''''''''''''''''''''''''
+
+The high availability of the "messaging queue" service is evaluated by
+monitoring service outage time and process outage time.
+
+Service outage time is tested by continuously executing "openstack image list",
+"openstack network list", "openstack volume list" and "openstack stack list"
+commands in loop and checking if the responses of the command requests are
+returned with no failure.
+When the response fails, the "messaging queue" service is considered in outage.
+The time between the first response failure and the last response failure is
+considered as service outage time.
+
+Process outage time is tested by checking the status of the processes of the
+"messaging queue" service on the selected controller node. The time from
+those processes being killed to the time of those processes being recovered
+is the process outage time.
+Process recovery is verified by checking the existence of the processes of
+the "messaging queue" service.
+
+Test execution
+''''''''''''''
+
+* Test action 1: Start five monitors: one for the processes of the "messaging
+  queue" service and the others for the "openstack image list", "openstack
+  network list", "openstack stack list" and "openstack volume list" commands.
+  Each monitor will run as an independent process (see the sketch after this
+  list)
+* Test action 2: Connect to Node1 through SSH, and then kill all the
+  processes of the "messaging queue" service
+* Test action 3: Continuously measure service outage time from the monitors
+  until the service outage time is more than 5s
+* Test action 4: Continuously measure process outage time from the monitor
+  until the process outage time is more than 30s
+
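+The following sketch shows how the four command monitors could each run as
+an independent process, as required by test action 1 (the fifth,
+process-status monitor is omitted). It is a minimal illustration, not the
+Yardstick implementation, and the monitoring duration is an assumption.
+
+.. code-block:: python
+
+   # Illustrative sketch: one independent process per command monitor.
+   import multiprocessing
+   import subprocess
+   import time
+
+   COMMANDS = [
+       ["openstack", "image", "list"],
+       ["openstack", "network", "list"],
+       ["openstack", "stack", "list"],
+       ["openstack", "volume", "list"],
+   ]
+
+   def command_monitor(cmd, duration=60.0):
+       first = last = None
+       deadline = time.time() + duration
+       while time.time() < deadline:
+           failed = subprocess.run(cmd, stdout=subprocess.DEVNULL,
+                                   stderr=subprocess.DEVNULL).returncode != 0
+           if failed:
+               now = time.time()
+               first = first if first is not None else now
+               last = now
+       outage = 0.0 if first is None else last - first
+       print("%s outage: %.2fs" % (" ".join(cmd), outage))
+
+   if __name__ == "__main__":
+       procs = [multiprocessing.Process(target=command_monitor, args=(c,))
+                for c in COMMANDS]
+       for p in procs:
+           p.start()
+       for p in procs:
+           p.join()
+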
+Pass / fail criteria
+''''''''''''''''''''
+
+Test passes if the process outage time is no more than 30s and
+the service outage time is no more than 5s.
+
+A negative result will be generated if the above is not met in completion.
+
+Post conditions
+---------------
+
+Restart the processes of the "messaging queue" service if they are not
+running.
+
+---------------------------------------------------------------------------
+Test Case 11 - Controller node OpenStack service down - Controller Restart
+---------------------------------------------------------------------------
+
+Short name
+----------
+
+yardstick.ha.controller_restart
+
+Yardstick test case: opnfv_yardstick_tc025.yaml
+
+Use case specification
+----------------------
+
+This test case verifies that the high availability of the controller node is
+working properly.
+Specifically, this test case shuts down a specified controller node via IPMI,
+then checks with some monitor tools whether all services provided by the
+controller node are still working.
+
+Test preconditions
+------------------
+
+In this test case, an attacker called "host-shutdown" is needed.
+This attacker includes two parameters: fault_type and host.
+
+The purpose of this attacker is to shut down a controller node and check
+whether the services handled by this controller are still working normally.
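+
+A hypothetical sketch of such a "host-shutdown" attacker is given below; it
+powers the node off out-of-band through its BMC with "ipmitool". The BMC
+address and credentials are placeholders, and this is not the actual
+Yardstick attacker code.
+
+.. code-block:: python
+
+   # Hypothetical "host-shutdown" attacker sketch; not Yardstick code.
+   import subprocess
+
+   # placeholder mapping from node name to its BMC address and credentials
+   BMC = {"node1": ("192.0.2.10", "admin", "password")}
+
+   def host_shutdown_attacker(fault_type, host):
+       assert fault_type == "host-shutdown"
+       addr, user, password = BMC[host]
+       # power the controller node off out-of-band through its BMC
+       subprocess.run(["ipmitool", "-I", "lanplus", "-H", addr,
+                       "-U", user, "-P", password,
+                       "chassis", "power", "off"], check=True)
+
+   host_shutdown_attacker("host-shutdown", "node1")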
+
+Basic test flow execution description and pass/fail criteria
+------------------------------------------------------------
+
+Methodology for verifying service continuity and recovery
+'''''''''''''''''''''''''''''''''''''''''''''''''''''''''
+
+In order to verify this service, one monitor is used.
+
+This monitor executes an OpenStack command for the respective OpenStack
+component, in order to verify that the respective service is still running
+normally.
+
+In this test case, there is one metric: service_outage_time, which indicates
+the maximum outage time (seconds) of the specified OpenStack command request.
+
+Test execution
+''''''''''''''
+* Test action 1: Connect to Node1 through SSH, and check that controller services
+ are running normally
+* Test action 2: Start monitors: each monitor will run as an independent
+  process, monitoring the image list, router list, stack list and volume
+  list respectively. The monitor info will be collected.
+* Test action 3: Node1 is shut down remotely via IPMI.
+* Test action 4: Stop monitors after a period of time specified by "waiting_time".
+ The monitor info will be aggregated.
+* Test action 5: Verify the SLA and set the verdict of the test case to pass or fail.
+
+
+Pass / fail criteria
+''''''''''''''''''''
+
+Check whether the SLA is passed:
+
+- The process outage time is less than 30s.
+- The service outage time is less than 5s.
+
+The controller operations are carried out in the above order and no errors
+occur.
+
+A negative result will be generated if the above is not met in completion.
+
+Post conditions
+---------------
+
+The controller node has been restarted.
+
+----------------------------------------------------------------------------
+Test Case 12 - OpenStack Controller Virtual Router Service High Availability
+----------------------------------------------------------------------------
+
+Short name
+----------
+
+yardstick.ha.neutron_l3_agent
+
+Yardstick test case: opnfv_yardstick_tc058.yaml
+
+Use case specification
+----------------------
+
+This test case will verify the high availability of virtual routers (L3
+agent) on the controller node. When a virtual router service on a specified
+controller node is shut down, this test case will check whether the network
+of the virtual machines is affected, and whether the attacked virtual router
+service is recovered.
+
+Test preconditions
+------------------
+
+There is more than one controller node providing the Neutron API extension
+called "neutron-l3-agent", the virtual router service API.
+
+A controller node is denoted as Node1 in the following configuration.
+
+Basic test flow execution description and pass/fail criteria
+------------------------------------------------------------
+
+Methodology for verifying service continuity and recovery
+'''''''''''''''''''''''''''''''''''''''''''''''''''''''''
+
+The high availability of the "neutron-l3-agent" virtual router service is
+evaluated by monitoring service outage time and process outage time.
+
+Service outage is tested by pinging the virtual machines; the ping verifies
+that the network routing of the virtual machines works.
+When the ping fails, the virtual router service is considered in outage.
+The time between the first response failure and the last response failure is
+considered as service outage time.
+
+Process outage time is tested by checking the status of the processes of the
+"neutron-l3-agent" service on the selected controller node. The time from
+those processes being killed to the time of those processes being recovered
+is the process outage time.
+
+Process recovery is verified by checking the existence of the processes of
+the "neutron-l3-agent" service.
+
+Test execution
+''''''''''''''
+* Test action 1: Two VMs are booted; the two VMs are in two different
+  networks, and the networks are connected by a virtual router (see the
+  sketch after this list)
+* Test action 2: Start monitors: each monitor will run as an independent
+  process. The monitor info will be collected.
+* Test action 3: Execute the attacker: connect to the host through SSH, and
+  then execute the kill-process script with the parameter value specified
+  by "process_name"
+* Test action 4: Stop monitors after a period of time specified by
+  "waiting_time". The monitor info will be aggregated.
+* Test action 5: Verify the SLA and set the verdict of the test case to pass or fail.
+
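+The two-network topology of test action 1 could be created with the
+OpenStack CLI roughly as below; the network, router, server, image and
+flavor names and the subnet ranges are placeholders, not the values used by
+the Yardstick test case.
+
+.. code-block:: python
+
+   # Hypothetical topology setup for test action 1; names are placeholders.
+   import subprocess
+
+   def os_cmd(*args):
+       subprocess.run(["openstack"] + list(args), check=True)
+
+   for i in (1, 2):
+       os_cmd("network", "create", "ha-net%d" % i)
+       os_cmd("subnet", "create", "ha-subnet%d" % i,
+              "--network", "ha-net%d" % i,
+              "--subnet-range", "10.0.%d.0/24" % i)
+   os_cmd("router", "create", "ha-router")
+   for i in (1, 2):
+       os_cmd("router", "add", "subnet", "ha-router", "ha-subnet%d" % i)
+   for i in (1, 2):
+       # boot one VM on each network
+       os_cmd("server", "create", "ha-vm%d" % i, "--image", "cirros",
+              "--flavor", "m1.tiny", "--network", "ha-net%d" % i)
+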
+Pass / fail criteria
+''''''''''''''''''''
+
+Check whether the SLA is passed:
+
+- The process outage time is less than 30s.
+- The service outage time is less than 5s.
+
+A negative result will be generated if the above is not met in completion.
+
+Post conditions
+---------------
+
+Delete image with "openstack image delete neutron-l3-agent_ha_image".
+
+Delete flavor with "openstack flavor delete neutron-l3-agent_ha_flavor".