.. This work is licensed under a Creative Commons Attribution 4.0 International
.. License.
.. http://creativecommons.org/licenses/by/4.0
.. (c) OPNFV, Yin Kanglin and others.
.. 14_ykl@tongji.edu.cn

*************************************
Yardstick Test Case Description TC057
*************************************
+-----------------------------------------------------------------------------+
|OpenStack Controller Cluster Management Service High Availability            |
+==============+==============================================================+
|test case id  |                                                              |
+--------------+--------------------------------------------------------------+
|test purpose  | This test case verifies the quorum configuration of the      |
|              | cluster manager (pacemaker) on controller nodes. When a      |
|              | controller node that holds all active application resources  |
|              | fails to communicate with the other cluster nodes (via       |
|              | corosync), the test case checks whether the standby          |
|              | application resources take the place of those active         |
|              | application resources, which the cluster manager should      |
|              | regard as down.                                              |
+--------------+--------------------------------------------------------------+
|test method   | This test case kills the processes of the cluster messaging  |
|              | service (corosync) on a selected controller node (the node   |
|              | that holds the active application resources), then checks    |
|              | whether the active application resources are switched to     |
|              | other controller nodes and whether the OpenStack commands    |
|              | still succeed.                                               |
+--------------+--------------------------------------------------------------+
|attackers     | In this test case, an attacker called "kill-process" is      |
|              | needed. This attacker takes three parameters:                |
|              |                                                              |
|              | 1) fault_type: used to find the attacker's scripts. It       |
|              |    must always be set to "kill-process" in this test case.   |
|              | 2) process_name: the process name of the cluster messaging   |
|              |    service. If multiple processes on the host use the same   |
|              |    name, this attacker kills all of them.                    |
|              | 3) host: the name of the controller node being attacked.     |
|              |                                                              |
|              | In this case, the process name should be set to "corosync",  |
|              | for example:                                                 |
|              |                                                              |
|              |   -fault_type: "kill-process"                                |
|              |   -process_name: "corosync"                                  |
|              |   -host: node1                                               |
+--------------+--------------------------------------------------------------+
|monitors      | In this test case, one kind of monitor is needed:            |
|              |                                                              |
|              | 1. the "openstack-cmd" monitor repeatedly requests a         |
|              |    specific OpenStack command. It needs two parameters:      |
|              |                                                              |
|              |    1) monitor_type: used to find the monitor class and       |
|              |       related scripts. It must always be set to              |
|              |       "openstack-cmd" for this monitor.                      |
|              |    2) command_name: the name of the command to request.      |
|              |                                                              |
|              | In this case, the command_name of each monitor should        |
|              | belong to a service that is managed by the cluster manager.  |
|              | (Since rabbitmq and haproxy are managed by pacemaker, most   |
|              | OpenStack services can be used to check high availability    |
|              | in this case.)                                               |
|              |                                                              |
|              | (e.g.)                                                       |
|              |                                                              |
|              | monitor1:                                                    |
|              |   -monitor_type: "openstack-cmd"                             |
|              |   -command_name: "nova image-list"                           |
|              | monitor2:                                                    |
|              |   -monitor_type: "openstack-cmd"                             |
|              |   -command_name: "neutron router-list"                       |
|              | monitor3:                                                    |
|              |   -monitor_type: "openstack-cmd"                             |
|              |   -command_name: "heat stack-list"                           |
|              | monitor4:                                                    |
|              |   -monitor_type: "openstack-cmd"                             |
|              |   -command_name: "cinder list"                               |
+--------------+--------------------------------------------------------------+
|checkers      | In this test case, a checker is needed. The checker checks   |
|              | the status of application resources in pacemaker and has     |
|              | five parameters:                                             |
|              |                                                              |
|              | 1) checker_type: used to find the result checker class and   |
|              |    related scripts. In this case the checker type is         |
|              |    "pacemaker-check-resource".                               |
|              | 2) resource_name: the name of the application resource.      |
|              | 3) resource_status: the expected status of the resource.     |
|              | 4) expectedValue: the expected value for the output of the   |
|              |    checker script. In this case the expected value is the    |
|              |    identifier in the cluster manager.                        |
|              | 5) condition: whether the expected value must be contained   |
|              |    in the output of the checker script or match it exactly.  |
|              |                                                              |
|              | (Note: pcs must be installed on the controller node in       |
|              | order to run this checker.)                                  |
|              |                                                              |
|              | (e.g.)                                                       |
|              |                                                              |
|              | checker1:                                                    |
|              |   -checker_type: "pacemaker-check-resource"                  |
|              |   -resource_name: "p_rabbitmq-server"                        |
|              |   -resource_status: "Stopped"                                |
|              |   -expectedValue: "node-1"                                   |
|              |   -condition: "in"                                           |
|              | checker2:                                                    |
|              |   -checker_type: "pacemaker-check-resource"                  |
|              |   -resource_name: "p_rabbitmq-server"                        |
|              |   -resource_status: "Master"                                 |
|              |   -expectedValue: "node-2"                                   |
|              |   -condition: "in"                                           |
+--------------+--------------------------------------------------------------+
|metrics       | In this test case, there is one metric:                      |
|              |                                                              |
|              | 1) service_outage_time: the maximum outage time (seconds)    |
|              |    of the specified OpenStack command request.               |
+--------------+--------------------------------------------------------------+
|test tool     | None. Self-developed.                                        |
+--------------+--------------------------------------------------------------+
|references    | ETSI NFV REL001                                              |
+--------------+--------------------------------------------------------------+
|configuration | This test case needs two configuration files:                |
|              |                                                              |
|              | 1) test case file: opnfv_yardstick_tc057.yaml                |
|              |                                                              |
|              |    -Attackers: see the "attackers" description above         |
|              |    -Monitors: see the "monitors" description above           |
|              |    -Checkers: see the "checkers" description above           |
|              |    -Steps: the test case execution steps, see the            |
|              |    "test sequence" description below                         |
|              |                                                              |
|              | 2) POD file: pod.yaml                                        |
|              |                                                              |
|              |    The POD configuration must be recorded in pod.yaml        |
|              |    first. The "host" item in this test case uses the node    |
|              |    name in pod.yaml.                                         |
+--------------+--------------------------------------------------------------+
|test sequence | description and expected result                              |
|              |                                                              |
+--------------+--------------------------------------------------------------+
|step 1        | start monitors:                                              |
|              | each monitor runs in an independent process                  |
|              |                                                              |
|              | Result: The monitor info will be collected.                  |
|              |                                                              |
+--------------+--------------------------------------------------------------+
|step 2        | run the attacker: connect to the host through SSH, and then  |
|              | execute the kill process script with the parameter value     |
|              | specified by "process_name"                                  |
|              |                                                              |
|              | Result: The process will be killed.                          |
|              |                                                              |
+--------------+--------------------------------------------------------------+
|step 3        | run the checker: check whether the status of the             |
|              | application resources on the different nodes is updated      |
|              |                                                              |
+--------------+--------------------------------------------------------------+
|step 4        | stop monitors after a period of time specified by            |
|              | "waiting_time"                                               |
|              |                                                              |
|              | Result: The monitor info will be aggregated.                 |
|              |                                                              |
+--------------+--------------------------------------------------------------+
|step 5        | verify the SLA                                               |
|              |                                                              |
|              | Result: The test case passes or fails.                       |
|              |                                                              |
+--------------+--------------------------------------------------------------+
|post-action   | This is the action taken when the test case exits. It        |
|              | checks the status of the cluster messaging process           |
|              | (corosync) on the host and restarts the process if it is     |
|              | not running, so that the next test cases can run.            |
|              |                                                              |
|              | Notice: This post-action uses the 'lsb_release' command to   |
|              | check the host Linux distribution and determine the          |
|              | OpenStack service name to restart the process. If            |
|              | 'lsb_release' is missing on the host, restarting the         |
|              | process may fail.                                            |
|              |                                                              |
+--------------+--------------------------------------------------------------+
|test verdict  | Fails only if the SLA is not met, or if there is a test      |
|              | case execution problem.                                      |
|              |                                                              |
+--------------+--------------------------------------------------------------+
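
The service_outage_time metric and the SLA verification in step 5 can be
illustrated with a minimal sketch. It assumes, purely for illustration, that
a monitor records one (timestamp, succeeded) sample per request. The names
``max_outage_seconds`` and ``sla_passed`` and the ``max_outage_time`` threshold
are hypothetical and are not taken from the Yardstick code base.

```python
from typing import Iterable, Optional, Tuple

Sample = Tuple[float, bool]  # (timestamp in seconds, request succeeded)


def max_outage_seconds(samples: Iterable[Sample]) -> float:
    """Return the longest outage, measured from the first failed request
    of a run of failures to the next successful request."""
    longest = 0.0
    outage_start: Optional[float] = None
    last_ts: Optional[float] = None
    for ts, ok in samples:
        if not ok and outage_start is None:
            outage_start = ts  # a new outage begins
        elif ok and outage_start is not None:
            longest = max(longest, ts - outage_start)  # the outage ends
            outage_start = None
        last_ts = ts
    # An outage still open when monitoring stops lasts until the last sample.
    if outage_start is not None and last_ts is not None:
        longest = max(longest, last_ts - outage_start)
    return longest


def sla_passed(outage: float, max_outage_time: float) -> bool:
    """Assumed SLA rule: the outage must not exceed the allowed maximum."""
    return outage <= max_outage_time


if __name__ == "__main__":
    # Hypothetical samples from one "openstack-cmd" monitor.
    samples = [(0.0, True), (1.0, False), (2.0, False), (3.5, True)]
    outage = max_outage_seconds(samples)
    print(outage, sla_passed(outage, max_outage_time=5.0))
```

Measuring from the first failure to the next success is only one possible
definition of outage time; a real monitor may define it differently.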
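
The ``condition`` parameter of the checker can be sketched in a similar way.
Only the value "in" appears in this test case. The name "eq" for an exact
match, the function ``output_matches`` and the sample output strings are
assumptions made for this example and do not reproduce real pcs output.

```python
def output_matches(output: str, expected_value: str, condition: str) -> bool:
    """Compare checker script output with the expected value.

    "in": the expected value must occur somewhere in the output.
    "eq": the output must equal the expected value (assumed name).
    """
    text = output.strip()  # ignore the trailing newline of script output
    if condition == "in":
        return expected_value in text
    if condition == "eq":
        return text == expected_value
    raise ValueError("unsupported condition: %r" % condition)


if __name__ == "__main__":
    # Hypothetical output of a checker script for a stopped resource.
    print(output_matches("Stopped: [ node-1 ]\n", "node-1", "in"))
```

With "in", checker1 passes as soon as "node-1" shows up among the nodes
reported for the expected status, so the script output may list other nodes
as well.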