Detailed architecture and message flows
=======================================

Within the Promise project we consider two architectural options: a
*shim-layer* based architecture and an architecture targeting full OpenStack
*integration*.

Shim-layer architecture
-----------------------

The *shim-layer architecture* uses a layer on top of OpenStack to provide the
capacity management, resource reservation, and resource allocation features.


Detailed Message Flows
^^^^^^^^^^^^^^^^^^^^^^

Note that only selected message parameters are shown. Refer to
:ref:`northbound_API` and Annex :ref:`yang_schema` for the full set of message
parameters.

Resource Capacity Management
""""""""""""""""""""""""""""

.. figure:: images/figure5_new.png
    :name: figure5
    :width: 90%

    Capacity Management Scenario

:numref:`figure5` shows a detailed message flow between the consumers and the
capacity management functional blocks inside the shim-layer. It has the
following steps:

    * Step 1a: The Consumer sends a *query-capacity* request to Promise
      using a filter such as a time window or resource type. The capacity is
      looked up in the shim-layer capacity map.

    * Step 1b: The shim-layer will respond with information about the
      total, available, reserved, and used (allocated) capacities matching the
      filter.

    * Step 2a: The Consumer can send *increase/decrease-capacity* requests
      to update the capacity available to the reservation system. It can be
      100% of available capacity in the given provider/source or only a subset,
      i.e., it can allow for leaving some "buffer" in the actual NFVI to be
      used outside the Promise shim-layer or for a different reservation
      service instance. It can also be used to inform the reservation system
      that from a certain time in the future, additional resources can be
      reserved (e.g. due to a planned upgrade of the capacity), or the
      available capacity will be reduced (e.g. due to a planned downtime of
      some of the resources).

    * Step 2b: The shim-layer will respond with an ACK/NACK message.

    * Step 3a: Consumers can subscribe for capacity-change events using a
      filter.

    * Step 3b: The shim-layer answers each successful subscription with a
      subscription_id.

    * Step 4: The shim-layer monitors the capacity information for the
      various types of resources by periodically querying the various
      Controllers (e.g. Nova, Neutron, Cinder) or by creating event alarms in
      the VIM (e.g. with Ceilometer for OpenStack) and updates capacity
      information in its capacity map.

    * Step 5: The shim-layer notifies the Consumer of capacity changes.
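
The capacity map behind steps 1 to 5 can be pictured with a minimal Python
sketch. Everything in it is an illustrative assumption rather than the Promise
implementation: the ``CapacityMap`` class, its method names, the single
resource type, the omission of filters, and the rule that a decrease is
answered with a NACK when it would drop the total below the capacity that is
already reserved or used.

.. code-block:: python

    from dataclasses import dataclass, field
    from typing import Callable, Dict


    @dataclass
    class CapacityMap:
        """Hypothetical in-memory capacity map for a single resource type."""

        total: int = 0
        reserved: int = 0
        used: int = 0
        subscribers: Dict[int, Callable[[dict], None]] = field(default_factory=dict)
        _next_id: int = 1

        def query_capacity(self) -> dict:
            # Steps 1a/1b: report total, available, reserved and used capacity.
            return {
                "total": self.total,
                "available": self.total - self.reserved - self.used,
                "reserved": self.reserved,
                "used": self.used,
            }

        def change_capacity(self, delta: int) -> bool:
            # Steps 2a/2b: a positive delta increases and a negative delta
            # decreases the capacity. True stands for ACK, False for NACK.
            # Assumption: refuse to go below what is already committed.
            if self.total + delta < self.reserved + self.used:
                return False
            self.total += delta
            self._notify()
            return True

        def subscribe(self, callback: Callable[[dict], None]) -> int:
            # Steps 3a/3b: register a callback and return a subscription_id.
            subscription_id = self._next_id
            self._next_id += 1
            self.subscribers[subscription_id] = callback
            return subscription_id

        def _notify(self) -> None:
            # Step 5: push the new capacity snapshot to every subscriber.
            snapshot = self.query_capacity()
            for callback in self.subscribers.values():
                callback(snapshot)

A Consumer in this sketch would call ``subscribe`` with a callback, and every
accepted ``change_capacity`` call would then deliver one snapshot to that
callback. The periodic synchronization with the VIM Controllers (step 4) is
left out.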

Resource Reservation
""""""""""""""""""""

.. figure:: images/figure6_new.png
    :name: figure6
    :width: 90%

    Resource Reservation for Future Use Scenario

:numref:`figure6` shows a detailed message flow between the Consumer and the
resource reservation functional blocks inside the shim-layer. It has the
following steps:

    * Step 1a: The Consumer creates a resource reservation request for
      future use by setting a start and end time for the reservation as well as
      more detailed information about the resources to be reserved. The Promise
      shim-layer will check the free capacity in the given time window and in
      case sufficient capacity exists to meet the reservation request, will
      mark those resources "reserved" in its reservation map.

    * Step 1b: If the reservation was successful, a reservation_id and
      status of the reservation will be returned to the Consumer. In case the
      reservation cannot be met, the shim-layer may return information about
      the maximum capacity that could be reserved during the requested time
      window and/or a potential time window where the requested (amount of)
      resources would be available.

    * Step 2a: Reservations can be updated using an *update-reservation*
      request, providing the reservation_id and the new reservation_data. The
      Promise Reservation Manager will check whether the reservation can be
      updated as requested.

    * Step 2b: If the reservation was updated successfully, a
      reservation_id and status of the reservation will be returned to the
      Consumer. Otherwise, an appropriate error message will be returned.

    * Step 3a: A *cancel-reservation* request can be used to withdraw an
      existing reservation. Promise will update the reservation map by removing
      the reservation as well as the capacity map by adding the freed capacity.

    * Step 3b: The response message confirms the cancelation.

    * Step 4a: Consumers can also issue *query-reservation* requests to
      receive a list of reservations. An input filter can be used to narrow down
      the query, e.g., only provide reservations in a given time window.
      Promise will query its reservation map to identify reservations matching
      the input filter.

    * Step 4b: The response message contains information about all
      reservations matching the input filter. It also provides information
      about the utilization in the requested time window.

    * Step 5a: Consumers can subscribe for reservation-change events using
      a filter.

    * Step 5b: The shim-layer answers each successful subscription with a
      subscription_id.

    * Step 6a: Promise synchronizes the available and used capacity with
      the underlying VIM.

    * Step 6b: In certain cases, e.g., due to a failure in the underlying
      hardware, some reservations cannot be kept up anymore and have to be
      updated or canceled. The shim-layer will identify affected reservations
      among its reservation records.

    * Step 7: Subscribed Consumers will be informed about the updated
      reservations. The notification contains the updated reservation_data and
      new status of the reservation. It is then up to the Consumer to take
      appropriate actions in order to ensure high priority reservations are
      favored over lower priority reservations.
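
The feasibility check of steps 1a and 1b, together with the cancellation of
step 3a, can be sketched as follows. This is a minimal illustration under
stated assumptions, not the Promise data model: the ``Reservation`` and
``ReservationMap`` names, the integer timestamps, the single resource type,
and the ``"reserved"`` / ``"rejected"`` status strings are all hypothetical.

.. code-block:: python

    from dataclasses import dataclass
    from typing import Dict


    @dataclass
    class Reservation:
        start: int   # start time (e.g. epoch seconds), inclusive
        end: int     # end time, exclusive
        amount: int  # reserved units of one resource type


    class ReservationMap:
        """Hypothetical reservation map for a single resource type."""

        def __init__(self, capacity: int) -> None:
            self.capacity = capacity
            self.reservations: Dict[int, Reservation] = {}
            self._next_id = 1

        def peak_reserved(self, start: int, end: int) -> int:
            """Largest total amount reserved at any instant in [start, end)."""
            overlapping = [r for r in self.reservations.values()
                           if r.start < end and start < r.end]
            # The reserved total can only rise at the window start or where
            # an overlapping reservation begins inside the window.
            points = {start} | {r.start for r in overlapping if r.start > start}
            return max(
                sum(r.amount for r in overlapping if r.start <= t < r.end)
                for t in points
            )

        def create_reservation(self, start: int, end: int, amount: int) -> dict:
            # Step 1a: check the free capacity in the requested time window.
            free = self.capacity - self.peak_reserved(start, end)
            if amount > free:
                # Step 1b (failure): report the maximum reservable capacity.
                return {"status": "rejected", "max_capacity": free}
            reservation_id = self._next_id
            self._next_id += 1
            self.reservations[reservation_id] = Reservation(start, end, amount)
            # Step 1b (success): return the reservation_id and status.
            return {"status": "reserved", "reservation_id": reservation_id}

        def cancel_reservation(self, reservation_id: int) -> bool:
            # Step 3a: removing the record frees its capacity implicitly,
            # because free capacity is derived from the remaining records.
            return self.reservations.pop(reservation_id, None) is not None

Deriving the free capacity from the stored reservations keeps the sketch
short. A real shim-layer would also have to account for capacity that is
allocated without a reservation and for capacity changes over time.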

Resource Allocation
"""""""""""""""""""

.. figure:: images/figure7_new.png
    :name: figure7
    :width: 90%

    Resource Allocation

:numref:`figure7` shows a detailed message flow between the Consumer, the
functional blocks inside the shim-layer, and the VIM. It has the following
steps:

    * Step 1a: The Consumer sends a *create-instance* request providing
      information about the resources to be allocated, i.e., provider_id
      (optional in case of only one provider), name of the instance, the
      requested flavor and image, etc. If the allocation is against an
      existing reservation, the reservation_id has to be provided.

    * Step 1b: If a reservation_id was provided, Promise checks if a
      reservation with that ID exists, the reservation start time has arrived
      (i.e. the reservation is active), and the required capacity for the
      requested flavor is within the available capacity of the reservation. If
      those conditions are met, Promise creates a record for the allocation
      (VMState="INITIALIZED") and updates its databases. If no reservation_id
      was provided in the allocation request, Promise checks whether the
      required capacity to meet the request can be provided from the available,
      non-reserved capacity. If yes, Promise creates a record for the
      allocation and updates its databases. In any other case, Promise rejects
      the *create-instance* request.

    * Step 2: If the *create-instance* request was rejected, Promise
      responds with "status=rejected" and the reason for the rejection. This
      helps the Consumer to take appropriate action, e.g., send an updated
      *create-instance* request. The allocation workflow terminates at this
      step and the steps below are not executed.

    * Step 3a: If the *create-instance* request was accepted and a related
      allocation record has been created, the shim-layer issues a
      *createServer* request to the VIM Controller providing all information to
      create the server instance.

    * Step 3b: The VIM Controller sends an immediate reply with an
      instance_id and starts the VIM-internal allocation process.

    * Step 4: The Consumer gets an immediate response message with
      allocation status "in progress" and the assigned instance_id.

    * Step 5a+b: The Consumer subscribes to receive notifications about
      allocation events related to the requested instance. Promise responds
      with an acknowledgment including a subscribe_id.

    * Step 6: In parallel to the previous step, the Promise shim-layer
      creates an alarm in Aodh to receive notifications about all changes to
      the VMState for instance_id.

    * Step 7a: The VIM Controller reports all instance-related events to
      Ceilometer. After the allocation has completed or failed, it sends an
      event to Ceilometer. This triggers the OpenStack alarming service Aodh
      to notify the shim-layer of the new VMState (e.g. ACTIVE or ERROR). The
      shim-layer then updates its internal allocation records.

    * Step 7b: Promise sends a notification message to the subscribed
      Consumer with information on the allocated resources including their new
      VMState.

    * Step 8a+b: Allocated instances can be terminated by the Consumer by
      sending a *destroy-instance* request to the shim-layer. Promise responds
      with an acknowledgment and the new status "DELETING" for the instance.

    * Step 9a: Promise sends a *deleteServer* request for the instance_id
      to the VIM Controller.

    * Step 10a: After the instance has been deleted, an event alarm is
      sent to the shim-layer, which updates its internal allocation records
      and capacity utilization.

    * Step 10b: The shim-layer also notifies the subscribed Consumer about
      the successfully destroyed instance.
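
The life cycle of an allocation record across steps 1b, 7a, 8 and 10a can be
summarized as a small state machine. The sketch below is an assumption made
for illustration: the text only names the states ``INITIALIZED``, ``ACTIVE``,
``ERROR`` and ``DELETING``, so the ``DELETED`` state, the transition table and
the ``AllocationRecord`` class are hypothetical.

.. code-block:: python

    # Assumed transition table; only some of these states appear in the text.
    ALLOWED_TRANSITIONS = {
        "INITIALIZED": {"ACTIVE", "ERROR"},   # step 7a: allocation result
        "ACTIVE": {"DELETING", "ERROR"},      # step 8: destroy-instance
        "ERROR": {"DELETING"},
        "DELETING": {"DELETED"},              # step 10a: deletion confirmed
        "DELETED": set(),
    }


    class AllocationRecord:
        """Hypothetical allocation record kept by the shim-layer."""

        def __init__(self, instance_id: str) -> None:
            self.instance_id = instance_id
            self.vm_state = "INITIALIZED"  # step 1b: record created
            self.history = [self.vm_state]

        def on_event(self, new_state: str) -> bool:
            """Apply a notified VMState; return False if it is not allowed."""
            if new_state not in ALLOWED_TRANSITIONS[self.vm_state]:
                return False
            self.vm_state = new_state
            self.history.append(new_state)
            return True

Rejecting unexpected transitions is one way to guard the internal records
against late or duplicate alarm notifications; whether Promise does so is not
stated in this document.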


Internal operations
^^^^^^^^^^^^^^^^^^^

.. note:: This section is to be updated

The following explains the internal logic and operations of the shim-layer in
more detail, e.g. the "check request" operation (step 1b in :numref:`figure7`
of the allocation workflow).
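
As a starting point, the "check request" decision of step 1b could look like
the following sketch. The function name, its parameters, the
``ActiveReservation`` type and the reason strings are assumptions introduced
here. The additional check against the reservation end time is also an
assumption, since the text only requires that the start time has arrived.

.. code-block:: python

    from dataclasses import dataclass
    from typing import Dict, Optional, Tuple


    @dataclass
    class ActiveReservation:
        start: int          # reservation start time, inclusive
        end: int            # reservation end time, exclusive
        capacity: int       # capacity covered by the reservation
        allocated: int = 0  # capacity already allocated against it


    def check_request(required: int,
                      now: int,
                      unreserved_free: int,
                      reservations: Dict[str, ActiveReservation],
                      reservation_id: Optional[str] = None) -> Tuple[bool, str]:
        """Return (accepted, reason) for a create-instance request."""
        if reservation_id is None:
            # No reservation: use the available, non-reserved capacity.
            if required <= unreserved_free:
                return True, "ok"
            return False, "insufficient non-reserved capacity"

        reservation = reservations.get(reservation_id)
        if reservation is None:
            return False, "unknown reservation_id"
        if not (reservation.start <= now < reservation.end):
            return False, "reservation not active"
        if required > reservation.capacity - reservation.allocated:
            return False, "insufficient capacity in reservation"
        return True, "ok"

The function only decides; creating the allocation record and updating the
databases would follow an accepted request, and the returned reason could be
passed on in the "status=rejected" response of step 2.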



Integrated architecture
-----------------------

The *integrated architecture* aims at full integration with OpenStack.
This means that it is planned to use the already existing OpenStack APIs
extended with the reservation capabilities.

The advantage of this approach is that we don't need to re-model the
complex resource structure we have for the virtual machines and the
corresponding infrastructure.

The atomic item is the virtual machine with the minimum set of resources
it requires to start up. It is important to state that resource reservation
is handled at the VM instance level, as opposed to standalone resources like
CPU, memory and so forth. Because placement determines whether the reserved
resources can actually be used, it imposes the constraint that resources are
handled in groups.

The placement constraint also makes it impossible to use a quota management
system to solve the base use case described earlier in this document.

OpenStack had a project called Blazar, which was created to provide resource
reservation functionality in cloud environments. It uses the Shelve API of
Nova, which provides a sub-optimal solution: because this feature blocks the
reserved resources, the solution cannot be considered final. Further work is
needed to reach a more optimal stage, in which the Nova scheduler is used to
schedule resources for future use when making reservations.

Phases of the work
^^^^^^^^^^^^^^^^^^

The work has two main stages on the way to the final solution. The following
main work items are on the roadmap for this approach:

#. Sub-optimal solution by using the shelve API of Nova through the Blazar project:

   * Fix the code base of the Blazar project:

     Due to integration difficulties the Blazar project was suspended. Since the last
     activities in that repository the OpenStack code base and environment have changed
     significantly, which means that the project's code base needs to be updated to the
     latest standards and has to be able to interact with the latest version of the
     other OpenStack services.

   * Update the Blazar API:

     The REST API needs to be extended to contain the attributes for the reservation
     defined in this document. This activity shall include testing towards the new API.

#. Use Nova scheduler to avoid blocking the reserved resources:

   * Analyze the Nova scheduler:

     The status of the Nova scheduler and the possible interface between it and the
     resource reservation system need to be identified. It is crucial to achieve a much
     more optimal solution than the current version of Blazar can provide. The goal is
     to be able to use the reserved resources before the reservation starts. To achieve
     this, the scheduler needs to schedule for the future, taking into account the
     reservation intervals that are specified in the request.

   * Define a new design based on the analysis and start the work on it:

     The design for the more optimal solution can be defined only after analyzing the
     structure and capabilities of the Nova scheduler.

   * This phase can be started in parallel with the previous one.

Detailed Message Flows
^^^^^^^^^^^^^^^^^^^^^^

.. note:: to be done

Resource Reservation
""""""""""""""""""""

.. note:: to be specified