summaryrefslogtreecommitdiffstats
path: root/docs/release/configguide/visibility_config_guide.rst
blob: 77db2f78911590fc2b448ff4452fbc857e08d8fc (plain)
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
.. This work is licensed under a Creative Commons Attribution 4.0 International License.
.. http://creativecommons.org/licenses/by/4.0
.. SPDX-License-Identifier CC-BY-4.0
.. (c) Authors of Clover

.. _visibility_config_guide:

==============================================
Clover Visibility Services Configuration Guide
==============================================

This document provides a guide to use Clover visibility services, which are initially delivered in
the Clover Gambia release. A key assumption of this guide is that Istio 1.0.x has been deployed
to Kubernetes (k8s), as it is a foundational element for Clover visibility services.

Overview
=========

Clover visibility services are an integrated set of microservices that allow HTTP/gRPC traffic to
be observed and analyzed in an Istio service mesh within k8s managed clusters. It leverages
observability open source projects from the CNCF community such as Jaeger for distributed tracing
and Prometheus for monitoring. These tools are packaged with Istio and service mesh sidecars have
extensive hooks built in to interface with them. They gather low-level, per HTTP request driven
data. Clover visibility services focus on enriching the data, gathering it from various sources
and analyzing it at the system or aggregate level.

The visibility services are comprised of the following microservices all deployed within the
**clover-system** namespace in a k8s cluster:

 * **clover-controller** -  exposes REST interface external to the k8s cluster and
   used to relay messages to other Clover services via gRPC from external agents including
   **cloverctl** CLI, web browsers and other APIs, scripts or CI jobs. It incorporates a web
   application with dashboard views to consume analyzed visibility data and control other
   Clover services.

 * **clover-collector** - gathers data from tracing (Jaeger) and monitoring (Prometheus)
   infrastructure that is integrated with Istio using a pull model.

 * **clover-spark** - is a Clover specific Apache Spark service. It leverages Spark 2.3.x native
   k8s support and includes visibility services artifacts to execute Spark jobs.

 * **clover-spark-submit** - simple service to continually perform Spark job submits interacting
   with the k8s API to spawn driver and executor pods.

 * **cassandra** - a sink for visibility data from **clover-collector** with specific schemas
   for monitoring and tracing.

 * **redis** - holds configuration data and analyzed data for visibility services. Used by
   **clover-controller** web application and REST API to maintain state and exchange data.

The table below shows key details of the visibility service manifests outlined above:

+---------------------+----------------------+---------------------------+-----------------------+
| Service             | Kubernetes           | Docker Image              | Ports                 |
|                     | Deployment App Name  |                           |                       |
+=====================+======================+===========================+=======================+
| Controller          | clover-controller    | opnfv/clover-controller   | HTTP: 80 (external)   |
|                     |                      |                           | gRPC: 50052, 50054    |
|                     |                      |                           |                       |
+---------------------+----------------------+---------------------------+-----------------------+
| Collector           | clover-collector     | opnfv/clover-collector    | Jaeger: 16686         |
|                     |                      |                           | Prometheus: 9090      |
|                     |                      |                           | gRPC: 50054           |
|                     |                      |                           | Datastore: 6379, 9042 |
+---------------------+----------------------+---------------------------+-----------------------+
| Spark               | clover-spark         | opnfv/clover-spark        | Datastore: 6379, 9042 |
|                     | clover-spark-submit  | opnfv/clover-spark-submit |                       |
|                     |                      |                           |                       |
|                     |                      |                           |                       |
|                     |                      |                           |                       |
+---------------------+----------------------+---------------------------+-----------------------+
| Data Stores         | cassandra            | cassandra:3               | 9042                  |
|                     | redis                | k8s.gcr.io/redis:v1       | 6379                  |
|                     |                      | kubernetes/redis:v1       |                       |
+---------------------+----------------------+---------------------------+-----------------------+

The **redis** and **cassandra** data stores use community container images while the other
services use Clover-specific Dockerhub OPNFV images.

Additionally, visibility services are operated with the **cloverctl** CLI. Further information on
setting up **clover-controller** and **cloverctl** can be found at
:ref:`controller_services_config_guide`.


.. image:: imgs/visibility_overview.png
    :align: center
    :scale: 100%

The diagram above shows the flow of data through the visibility services where all blue arrows
denote the path of data ingestion originating from the observability tools. The
**clover-collector** reads data from these underlying tools using their REST query interfaces
and inserts into schemas within the **cassandra** data store.

Apache Spark jobs are used to analyze data within **cassandra**. Spark is deployed using native
Kubernetes support added since Spark version 2.3. The **clover-spark-submit**
container continually submits jobs to the Kubernetes API. The API spawns a Spark driver pod which
in turn spawns executor pods to run Clover-specific jobs packaged in the **clover-spark**
service.

Analyzed data from **clover-spark** jobs is written to **redis**, an in-memory data store. The
**clover-controller** provides a REST API for the analyzed visibility data to be read by other
services (**cloverctl**, CI jobs, etc.) or viewed using a Clover provided visibility web
dashboard.

Deploying the visibility engine
===============================

.. _visibility_prerequisites:

Prerequisites
-------------

The following assumptions must be met before continuing on to deployment:

 * Installation of Docker has already been performed. It's preferable to install Docker CE.
 * Installation of k8s in a single-node or multi-node cluster with at least
   twelve cores and 16GB of memory. Google Kubernetes Engine (GKE) clusters are supported.
 * Installation of Istio in the k8s cluster. See :ref:`sdc_deploy_container`.
 * Clover CLI (**cloverctl**) has been downloaded and setup. Instructions to deploy can be found
   at :ref:`controller_services_controller`.


Deploy with Clover CLI
----------------------

To deploy the visibility services into your k8s cluster use the **cloverctl** CLI command
shown below::

    $ cloverctl create system visibility

Container images with the Gambia release tag will pulled if the tag is unspecified. The release
tag is **opnfv-7.0.0** for the Gambia release. To deploy the latest containers from master, use
the command shown below::

    $ cloverctl create system visibility -t latest

    Using config file: /home/earrage/.cloverctl.yaml
    Creating visibility services
    Created clover-system namespace
    Created statefulset "cassandra".
    Created service "cassandra"
    Created pod "redis".
    Created service "redis"
    Created deployment "clover-collector".
    Image: opnfv/clover-collector:latest
    Created service "clover-collector"
    Created deployment "clover-controller".
    Image: opnfv/clover-controller:latest
    Created service "clover-controller-internal"
    Created serviceaccount "clover-spark".
    Created clusterrolebinding "clover-spark-default".
    Created clusterrolebinding "clover-spark".
    Created deployment "clover-spark-submit".
    Image: opnfv/clover-spark-submit:latest

Verifying the deployment
------------------------

To verify the visibility services deployment, ensure the following pods have been deployed
with the command below::

    $ kubectl get pod --all-namespaces

    NAMESPACE      NAME                                    READY     STATUS
    clover-system   clover-collector-7dcc5d849f-6jc6m       1/1       Running
    clover-system   clover-controller-74d8596bb5-qrr6b      1/1       Running
    clover-system   cassandra-0                             1/1       Running
    clover-system   redis                                   2/2       Running
    clover-system   clover-spark-submit-6c4d5bcdf8-kc6l9    1/1       Running

Additionally, spark driver and executor pods will continuously be deployed as displayed below::

    clover-system   clover-spark-0fa43841362b3f27b35eaf6112965081-driver
    clover-system   clover-spark-fast-d5135cdbdd8330f6b46431d9a7eb3c20-driver
    clover-system   clover-spark-0fa43841362b3f27b35eaf6112965081-exec-3
    clover-system   clover-spark-0fa43841362b3f27b35eaf6112965081-exec-4

Initializing visibility services
================================

In order to setup visibility services, initialization and start commands must be
invoked from the **cloverctl** CLI. There are sample yaml files in yaml directory
from the **cloverctl** binary path. Navigate to this directory to execute the next
sequence of commands.

Initialize the visibility schemas in cassandra with the following command::

    $ cloverctl init visibility

    Using config file: /home/earrage/.cloverctl.yaml
    clover-controller address: http://10.145.71.21:32044
    Added visibility schemas in cassandra

The initial configuration to the visibility services are the Jaeger tracing and Prometheus
connection parameters and sample interval to **clover-collector**. To start visibility
use the sample yaml provided and execute the command::

    cloverctl start visibility -f start_visibility.yaml

    Started collector on pid: 44

The ``start_visibility.yaml`` has defaults for the tracing and monitoring modules packaged with
Istio 1.0.0.

Configure and control visibility
================================

The core requirement for Clover visibility services to function, is for your services to be
added to the Istio service mesh. Istio deployment and usage instructions are in the
:ref:`sdc_config_guide` and the Service Delivery Controller (SDC) sample can be used to
evaluate the Clover visibility services initially. A user may inject their own web-based services
into the service mesh and track separately.

Connecting to visibility dashboard UI
-------------------------------------

The **clover-controller** service comes packaged with a web-based UI with a visibility view.
To access the dashboard, navigate to the **clover-controller** address for either a ``NodePort``
or ``LoadBalancer`` service

    * http://<node or CNI IP address>:<``NodePort`` port>/
    * http://<``LoadBalancer`` IP address>/

See :ref:`exposing_clover_controller` to expose **clover-controller** externally with a k8s
service.

Set runtime parameters using Clover CLI
---------------------------------------

The services visibility will track are based on the deployment/pod names specified in the k8s
resources. Using some sample services from the SDC guide, the **proxy-access-control**,
**clover-server1**, **clover-server2** and **clover-server3** services are specified in the
``set_visibility.yaml`` sample yaml referenced below.

To modify the configuration of the services visibility will track, use the **cloverctl CLI**,
executing the following command::

    cloverctl set visibility -f set_visibility.yaml

Use the ``services:`` section of the yaml to configure service names to track.

.. code-block:: bash

    # set_visibility.yaml
    services:
        - name: proxy_access_control
        - name: clover_server1
        - name: clover_server2
        - name: clover_server3
    metric_prefixes:
        - prefix: envoy_cluster_outbound_9180__
        - prefix: envoy_cluster_inbound_9180__
    metric_suffixes:
        - suffix: _default_svc_cluster_local_upstream_rq_2xx
        - suffix: _default_svc_cluster_local_upstream_cx_active
    custom_metrics:
        - metric: envoy_tracing_zipkin_spans_sent

Set runtime parameters using dashboard UI
-----------------------------------------

The services being tracked by visibility can also be configured by selecting from the
boxes under **Discovered Services** within the dashboard, as shown in the graphic below.
Services can be multi-selected by using by holding the ``Ctrl`` or ``command`` (Mac OS)
keyboard button down while selecting or unselecting. The SDC services that were configured from
the **cloverctl** CLI above are currently active, denoted as the boxes with blue backgrounds.

.. image:: imgs/visibility_discovered_active.png
    :align: center
    :scale: 100%

In order for any services to be discovered from Jaeger tracing and displayed within the dashboard,
some traffic must target the services of interest. Using curl/wget to send HTTP requests
to your services will cause services to be discovered. Using Clover JMeter validation services,
as detailed :ref:`jmeter_config_guide` against SDC sample services will also generate a service
listing. The **cloverctl** CLI commands below will generate traces through the SDC service chain
with the JMeter master injected into the service mesh::

    $ cloverctl create testplan –f yaml/jmeter_testplan.yaml # yaml located with cloverctl binary
    $ cloverctl start testplan

Clearing visibility data
-------------------------

To clear visibility data in cassandra and redis, which truncates **cassandra** tables and
deletes or zeros out **redis** keys, use the following command::

    $ cloverctl clear visibility

This can be useful when analyzing or observing an issue during a particular time horizon.
The same function can be performed from the dashboard UI using the ``Clear`` button under
``Visibility Controls``, as illustrated in the graphic from the previous section.

Viewing visibility data
========================

The visibility dashboard can be used to view visibility data in real-time. The page will
automatically refresh every 5 seconds. To disable continuous page refresh and freeze on a
snapshot of the data, use the slider at the top of the page that defaults to ``On``. Toggling
it will result in it displaying ``Off``.

The visibility dashboard displays various metrics and graphs of analyzed data described in
subsequent sections.

System metrics
--------------

System metrics provide aggregate counts of cassandra tables including total traces, spans
and metrics, as depicted on the left side of the graphic below.

.. image:: imgs/visibility_system_counts_response_times.png
    :align: center
    :scale: 100%

The metrics counter will continually increase, as it is based on time series data from
Prometheus. The trace count will correspond to the number of HTTP requests sent to services
within the Istio service mesh. The span count ties to trace count, as it is a child object
under Jaeger tracing data hierarchy and is based on the service graph (number of interactions
between microservices for a given request). It will increase more rapidly when service graph
depths are larger.

Per service response times
--------------------------

Per service response times are displayed on the right side of the graphic above and are
calculated from tracing data when visibility is started. The minimum, maximum and average
response times are output over the entire analysis period.

Group by span field counts
--------------------------

This category groups schema fields in various combinations to gain insight into the composition
of HTTP data and can be used by CI scripts to perform various validations. Metrics include:

  * Per service
  * Distinct URL
  * Distinct URL / HTTP status code
  * Distinct user-agent (HTTP header)
  * Per service / distinct URL

The dashboard displays bar/pie charts with counts and percentages, as depicted below. Each distinct
key is displayed when hovering your mouse over a chart value.

.. image:: imgs/visibility_distinct_counts.png
    :align: center
    :scale: 100%

Distinct HTTP details
---------------------

A listing of distinct HTTP user-agents, request URLs and status codes is shown below divided
with tabs.

.. image:: imgs/visibility_distinct_http.png
    :align: center
    :scale: 100%


Monitoring Metrics
------------------

The Istio sidecars (Envoy) provide a lengthy set of metrics exposed through Prometheus. These
metrics can be analyzed with the visibility service by setting up metrics, as outlined in section
`Set runtime parameters using Clover CLI`_. Use ``metric_prefixes`` and ``metric_suffixes``
sections of the set visibility yaml for many Envoy metrics that have a key with the service
straddled by a prefix/suffix. A row in the table and a graph will be displayed for each
combination of service, prefix and suffix.

The metrics are displayed in tabular and scatter plots over time formats from the dashboard, as
shown in the graphic below:

.. image:: imgs/visibility_monitoring_metrics.png
    :align: center
    :scale: 100%

Uninstall from Kubernetes envionment
====================================

Delete with Clover CLI
----------------------

When you're finished working with Clover visibility services, you can uninstall them with the
following command::

     $ cloverctl delete system visibility

The command above will remove the SDC sample services, Istio components and Jaeger/Prometheus
tools from your Kubernetes environment.

Uninstall from Docker environment
=================================

The OPNFV docker images can be removed with the following commands:

.. code-block:: bash

    $ docker rmi opnfv/clover-collector
    $ docker rmi opnfv/clover-spark
    $ docker rmi opnfv/clover-spark-submit
    $ docker rmi opnfv/clover-controller
    $ docker rmi k8s.gcr.io/redis
    $ docker rmi kubernetes/redis
    $ docker rmi cassandra:3