diff --git a/docs/release/configguide/visibility_config_guide.rst b/docs/release/configguide/visibility_config_guide.rst
new file mode 100644
index 0000000..77db2f7
--- /dev/null
+++ b/docs/release/configguide/visibility_config_guide.rst
@@ -0,0 +1,403 @@
+.. This work is licensed under a Creative Commons Attribution 4.0 International License.
+.. http://creativecommons.org/licenses/by/4.0
+.. SPDX-License-Identifier: CC-BY-4.0
+.. (c) Authors of Clover
+
+.. _visibility_config_guide:
+
+==============================================
+Clover Visibility Services Configuration Guide
+==============================================
+
+This document provides a guide to using the Clover visibility services, which were initially
+delivered in the Clover Gambia release. A key assumption of this guide is that Istio 1.0.x has
+been deployed to Kubernetes (k8s), as it is a foundational element for Clover visibility services.
+
+Overview
+=========
+
+Clover visibility services are an integrated set of microservices that allow HTTP/gRPC traffic to
+be observed and analyzed in an Istio service mesh within k8s managed clusters. They leverage
+observability open source projects from the CNCF community, such as Jaeger for distributed tracing
+and Prometheus for monitoring. These tools are packaged with Istio, and the service mesh sidecars
+have extensive hooks built in to interface with them. They gather low-level data driven by
+individual HTTP requests. Clover visibility services focus on enriching that data, gathering it
+from various sources and analyzing it at the system or aggregate level.
+
+The visibility services are composed of the following microservices, all deployed within the
+**clover-system** namespace in a k8s cluster:
+
+  * **clover-controller** - exposes a REST interface external to the k8s cluster and is used
+    to relay messages to other Clover services via gRPC from external agents, including the
+    **cloverctl** CLI, web browsers and other APIs, scripts or CI jobs. It incorporates a web
+    application with dashboard views to consume analyzed visibility data and control other
+    Clover services.
+
+  * **clover-collector** - gathers data from the tracing (Jaeger) and monitoring (Prometheus)
+    infrastructure that is integrated with Istio, using a pull model.
+
+  * **clover-spark** - a Clover-specific Apache Spark service. It leverages Spark 2.3.x native
+    k8s support and includes visibility services artifacts to execute Spark jobs.
+
+  * **clover-spark-submit** - a simple service that continually submits Spark jobs, interacting
+    with the k8s API to spawn driver and executor pods.
+
+ * **cassandra** - a sink for visibility data from **clover-collector** with specific schemas
+ for monitoring and tracing.
+
+  * **redis** - holds configuration data and analyzed data for the visibility services. Used by
+    the **clover-controller** web application and REST API to maintain state and exchange data.
+
+The table below shows key details of the visibility service manifests outlined above:
+
++---------------------+----------------------+---------------------------+-----------------------+
+| Service | Kubernetes | Docker Image | Ports |
+| | Deployment App Name | | |
++=====================+======================+===========================+=======================+
+| Controller | clover-controller | opnfv/clover-controller | HTTP: 80 (external) |
+| | | | gRPC: 50052, 50054 |
+| | | | |
++---------------------+----------------------+---------------------------+-----------------------+
+| Collector | clover-collector | opnfv/clover-collector | Jaeger: 16686 |
+| | | | Prometheus: 9090 |
+| | | | gRPC: 50054 |
+| | | | Datastore: 6379, 9042 |
++---------------------+----------------------+---------------------------+-----------------------+
+| Spark | clover-spark | opnfv/clover-spark | Datastore: 6379, 9042 |
+| | clover-spark-submit | opnfv/clover-spark-submit | |
+| | | | |
+| | | | |
+| | | | |
++---------------------+----------------------+---------------------------+-----------------------+
+| Data Stores | cassandra | cassandra:3 | 9042 |
+| | redis | k8s.gcr.io/redis:v1 | 6379 |
+| | | kubernetes/redis:v1 | |
++---------------------+----------------------+---------------------------+-----------------------+
+
+The **redis** and **cassandra** data stores use community container images, while the other
+services use Clover-specific OPNFV images from Docker Hub.
+
+Additionally, visibility services are operated with the **cloverctl** CLI. Further information on
+setting up **clover-controller** and **cloverctl** can be found at
+:ref:`controller_services_config_guide`.
+
+
+.. image:: imgs/visibility_overview.png
+ :align: center
+ :scale: 100%
+
+The diagram above shows the flow of data through the visibility services where all blue arrows
+denote the path of data ingestion originating from the observability tools. The
+**clover-collector** reads data from these underlying tools using their REST query interfaces
+and inserts it into schemas within the **cassandra** data store.
+
+Apache Spark jobs are used to analyze data within **cassandra**. Spark is deployed using the
+native Kubernetes support added in Spark version 2.3. The **clover-spark-submit**
+container continually submits jobs to the Kubernetes API. The API spawns a Spark driver pod which
+in turn spawns executor pods to run Clover-specific jobs packaged in the **clover-spark**
+service.
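+
+This driver/executor pod lifecycle can be observed directly, assuming the default
+**clover-system** namespace::
+
+    # Watch Spark driver and executor pods come and go as jobs are submitted
+    $ kubectl get pods -n clover-system -w | grep clover-spark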
+
+Analyzed data from **clover-spark** jobs is written to **redis**, an in-memory data store. The
+**clover-controller** provides a REST API for the analyzed visibility data to be read by other
+services (**cloverctl**, CI jobs, etc.) or viewed using a Clover-provided visibility web
+dashboard.
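+
+For a low-level look at the analyzed results, you can attach to the **redis** pod directly.
+This is a minimal sketch; the container name and the keys it will show are assumptions that
+depend on your deployment and on what has been analyzed::
+
+    # Open a redis-cli session inside the redis pod (container name is an assumption)
+    $ kubectl exec -it redis -n clover-system -c redis -- redis-cli
+    127.0.0.1:6379> KEYS *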
+
+Deploying the visibility engine
+===============================
+
+.. _visibility_prerequisites:
+
+Prerequisites
+-------------
+
+The following assumptions must be met before continuing on to deployment:
+
+ * Installation of Docker has already been performed. It's preferable to install Docker CE.
+ * Installation of k8s in a single-node or multi-node cluster with at least
+ twelve cores and 16GB of memory. Google Kubernetes Engine (GKE) clusters are supported.
+ * Installation of Istio in the k8s cluster. See :ref:`sdc_deploy_container`.
+ * Clover CLI (**cloverctl**) has been downloaded and setup. Instructions to deploy can be found
+ at :ref:`controller_services_controller`.
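+
+A quick way to spot-check these prerequisites before deploying is shown below; the Istio
+namespace assumes a default install::
+
+    $ docker version --format '{{.Server.Version}}'
+    $ kubectl get nodes                    # confirm the cluster is up and adequately sized
+    $ kubectl get pods -n istio-system     # confirm the Istio control plane is running
+    $ cloverctl -h                         # confirm the Clover CLI is set up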
+
+
+Deploy with Clover CLI
+----------------------
+
+To deploy the visibility services into your k8s cluster, use the **cloverctl** CLI command
+shown below::
+
+ $ cloverctl create system visibility
+
+Container images with the Gambia release tag will be pulled if the tag is unspecified. The
+release tag is **opnfv-7.0.0** for the Gambia release. To deploy the latest containers from
+master, use the command shown below::
+
+ $ cloverctl create system visibility -t latest
+
+ Using config file: /home/earrage/.cloverctl.yaml
+ Creating visibility services
+ Created clover-system namespace
+ Created statefulset "cassandra".
+ Created service "cassandra"
+ Created pod "redis".
+ Created service "redis"
+ Created deployment "clover-collector".
+ Image: opnfv/clover-collector:latest
+ Created service "clover-collector"
+ Created deployment "clover-controller".
+ Image: opnfv/clover-controller:latest
+ Created service "clover-controller-internal"
+ Created serviceaccount "clover-spark".
+ Created clusterrolebinding "clover-spark-default".
+ Created clusterrolebinding "clover-spark".
+ Created deployment "clover-spark-submit".
+ Image: opnfv/clover-spark-submit:latest
+
+Verifying the deployment
+------------------------
+
+To verify the visibility services deployment, ensure the following pods have been deployed
+with the command below::
+
+ $ kubectl get pod --all-namespaces
+
+ NAMESPACE NAME READY STATUS
+ clover-system clover-collector-7dcc5d849f-6jc6m 1/1 Running
+ clover-system clover-controller-74d8596bb5-qrr6b 1/1 Running
+ clover-system cassandra-0 1/1 Running
+ clover-system redis 2/2 Running
+ clover-system clover-spark-submit-6c4d5bcdf8-kc6l9 1/1 Running
+
+Additionally, Spark driver and executor pods will continuously be deployed, as displayed below::
+
+ clover-system clover-spark-0fa43841362b3f27b35eaf6112965081-driver
+ clover-system clover-spark-fast-d5135cdbdd8330f6b46431d9a7eb3c20-driver
+ clover-system clover-spark-0fa43841362b3f27b35eaf6112965081-exec-3
+ clover-system clover-spark-0fa43841362b3f27b35eaf6112965081-exec-4
+
+Initializing visibility services
+================================
+
+In order to set up the visibility services, initialization and start commands must be
+invoked from the **cloverctl** CLI. Sample yaml files are provided in the ``yaml`` directory
+alongside the **cloverctl** binary. Navigate to this directory to execute the next
+sequence of commands.
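+
+For example, if the **cloverctl** binary was unpacked to your home directory (the path below
+is illustrative)::
+
+    $ cd ~/cloverctl/yaml
+    $ ls    # sample files, including start_visibility.yaml and set_visibility.yaml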
+
+Initialize the visibility schemas in cassandra with the following command::
+
+ $ cloverctl init visibility
+
+ Using config file: /home/earrage/.cloverctl.yaml
+ clover-controller address: http://10.145.71.21:32044
+ Added visibility schemas in cassandra
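+
+The schemas can be confirmed by querying **cassandra** directly; this is a sketch, and the
+keyspace names in your deployment may differ::
+
+    $ kubectl exec -it cassandra-0 -n clover-system -- cqlsh -e "DESCRIBE KEYSPACES;"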
+
+The initial configuration for the visibility services consists of the Jaeger tracing and
+Prometheus connection parameters and the sample interval used by **clover-collector**. To start
+visibility, use the sample yaml provided and execute the command::
+
+    $ cloverctl start visibility -f start_visibility.yaml
+
+    Started collector on pid: 44
+
+The ``start_visibility.yaml`` file has defaults for the tracing and monitoring modules packaged
+with Istio 1.0.0.
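+
+For illustration, a minimal ``start_visibility.yaml`` along the lines below would target the
+Jaeger and Prometheus services that Istio 1.0.x installs in the **istio-system** namespace.
+The key names here are assumptions for the sketch, not the authoritative schema; consult the
+sample file shipped with **cloverctl**:
+
+.. code-block:: yaml
+
+    # Illustrative sketch only - key names are assumptions; see the shipped sample yaml
+    tracing:
+      host: jaeger-query.istio-system    # Jaeger packaged with Istio
+      port: 16686
+    monitoring:
+      host: prometheus.istio-system      # Prometheus packaged with Istio
+      port: 9090
+    sample_interval: 5                   # seconds between collector samples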
+
+Configure and control visibility
+================================
+
+The core requirement for Clover visibility services to function is for your services to be
+added to the Istio service mesh. Istio deployment and usage instructions are in the
+:ref:`sdc_config_guide`, and the Service Delivery Controller (SDC) sample can be used to
+evaluate the Clover visibility services initially. A user may also inject their own web-based
+services into the service mesh and track them separately, as shown in the sketch below.
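+
+For example, a service can be added to the mesh at deploy time with manual sidecar injection;
+``my_service.yaml`` below is a placeholder for your own k8s manifest::
+
+    $ kubectl apply -f <(istioctl kube-inject -f my_service.yaml)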
+
+Connecting to visibility dashboard UI
+-------------------------------------
+
+The **clover-controller** service comes packaged with a web-based UI with a visibility view.
+To access the dashboard, navigate to the **clover-controller** address for either a ``NodePort``
+or ``LoadBalancer`` service:
+
+  * http://<node or CNI IP address>:<``NodePort`` port>/
+  * http://<``LoadBalancer`` IP address>/
+
+See :ref:`exposing_clover_controller` to expose **clover-controller** externally with a k8s
+service.
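+
+The node port or external address can be read from the k8s service resource; the service name
+below is an assumption based on the controller services guide referenced above::
+
+    $ kubectl get svc -n clover-system | grep clover-controller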
+
+Set runtime parameters using Clover CLI
+---------------------------------------
+
+The services that visibility will track are based on the deployment/pod names specified in the k8s
+resources. Using some sample services from the SDC guide, the **proxy-access-control**,
+**clover-server1**, **clover-server2** and **clover-server3** services are specified in the
+``set_visibility.yaml`` sample yaml referenced below.
+
+To modify the configuration of the services that visibility will track, use the **cloverctl**
+CLI, executing the following command::
+
+    $ cloverctl set visibility -f set_visibility.yaml
+
+Use the ``services:`` section of the yaml to configure service names to track.
+
+.. code-block:: yaml
+
+ # set_visibility.yaml
+ services:
+ - name: proxy_access_control
+ - name: clover_server1
+ - name: clover_server2
+ - name: clover_server3
+ metric_prefixes:
+ - prefix: envoy_cluster_outbound_9180__
+ - prefix: envoy_cluster_inbound_9180__
+ metric_suffixes:
+ - suffix: _default_svc_cluster_local_upstream_rq_2xx
+ - suffix: _default_svc_cluster_local_upstream_cx_active
+ custom_metrics:
+ - metric: envoy_tracing_zipkin_spans_sent
+
+Set runtime parameters using dashboard UI
+-----------------------------------------
+
+The services being tracked by visibility can also be configured by selecting from the
+boxes under **Discovered Services** within the dashboard, as shown in the graphic below.
+Services can be multi-selected by holding the ``Ctrl`` or ``command`` (Mac OS) keyboard
+button down while selecting or deselecting. The SDC services that were configured from
+the **cloverctl** CLI above are currently active, denoted as the boxes with blue backgrounds.
+
+.. image:: imgs/visibility_discovered_active.png
+ :align: center
+ :scale: 100%
+
+In order for any services to be discovered from Jaeger tracing and displayed within the dashboard,
+some traffic must target the services of interest. Using curl/wget to send HTTP requests
+to your services will cause them to be discovered. Using the Clover JMeter validation services,
+as detailed in :ref:`jmeter_config_guide`, against the SDC sample services will also generate a
+service listing. The **cloverctl** CLI commands below will generate traces through the SDC
+service chain with the JMeter master injected into the service mesh::
+
+    $ cloverctl create testplan -f yaml/jmeter_testplan.yaml    # yaml located with cloverctl binary
+    $ cloverctl start testplan
+
+Clearing visibility data
+-------------------------
+
+To clear visibility data in cassandra and redis, which truncates **cassandra** tables and
+deletes or zeros out **redis** keys, use the following command::
+
+ $ cloverctl clear visibility
+
+This can be useful when analyzing or observing an issue during a particular time horizon.
+The same function can be performed from the dashboard UI using the ``Clear`` button under
+``Visibility Controls``, as illustrated in the graphic from the previous section.
+
+Viewing visibility data
+========================
+
+The visibility dashboard can be used to view visibility data in real time. The page will
+automatically refresh every 5 seconds. To disable continuous page refresh and freeze on a
+snapshot of the data, use the slider at the top of the page, which defaults to ``On``.
+Toggling it will display ``Off``.
+
+The visibility dashboard displays various metrics and graphs of analyzed data described in
+subsequent sections.
+
+System metrics
+--------------
+
+System metrics provide aggregate counts from the **cassandra** tables, including total traces,
+spans and metrics, as depicted on the left side of the graphic below.
+
+.. image:: imgs/visibility_system_counts_response_times.png
+ :align: center
+ :scale: 100%
+
+The metrics count will continually increase, as it is based on time series data from
+Prometheus. The trace count will correspond to the number of HTTP requests sent to services
+within the Istio service mesh. The span count is tied to the trace count, as spans are child
+objects within the Jaeger tracing data hierarchy; the number of spans per trace is based on the
+service graph (the number of interactions between microservices for a given request). The span
+count will therefore increase more rapidly when service graph depths are larger.
+
+Per service response times
+--------------------------
+
+Per service response times are displayed on the right side of the graphic above and are
+calculated from tracing data when visibility is started. The minimum, maximum and average
+response times are output over the entire analysis period.
+
+Group by span field counts
+--------------------------
+
+This category groups schema fields in various combinations to gain insight into the composition
+of HTTP data and can be used by CI scripts to perform various validations. Metrics include:
+
+ * Per service
+ * Distinct URL
+ * Distinct URL / HTTP status code
+ * Distinct user-agent (HTTP header)
+ * Per service / distinct URL
+
+The dashboard displays bar/pie charts with counts and percentages, as depicted below. Each distinct
+key is displayed when hovering your mouse over a chart value.
+
+.. image:: imgs/visibility_distinct_counts.png
+ :align: center
+ :scale: 100%
+
+Distinct HTTP details
+---------------------
+
+A listing of distinct HTTP user-agents, request URLs and status codes is shown below, divided
+into tabs.
+
+.. image:: imgs/visibility_distinct_http.png
+ :align: center
+ :scale: 100%
+
+
+Monitoring metrics
+------------------
+
+The Istio sidecars (Envoy) provide a lengthy set of metrics exposed through Prometheus. These
+metrics can be analyzed with the visibility service by setting up metrics, as outlined in section
+`Set runtime parameters using Clover CLI`_. Many Envoy metrics have a key in which the service
+name is straddled by a prefix and a suffix; use the ``metric_prefixes`` and ``metric_suffixes``
+sections of the set visibility yaml to match them. A row in the table and a graph will be
+displayed for each combination of service, prefix and suffix.
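+
+For example, with the sample ``set_visibility.yaml`` shown earlier, one of the resulting
+combinations is the Prometheus key below, composed as prefix + service name + suffix::
+
+    envoy_cluster_outbound_9180__clover_server1_default_svc_cluster_local_upstream_rq_2xx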
+
+The metrics are displayed from the dashboard in tabular form and as scatter plots over time, as
+shown in the graphic below:
+
+.. image:: imgs/visibility_monitoring_metrics.png
+ :align: center
+ :scale: 100%
+
+Uninstall from Kubernetes environment
+=====================================
+
+Delete with Clover CLI
+----------------------
+
+When you're finished working with Clover visibility services, you can uninstall them with the
+following command::
+
+ $ cloverctl delete system visibility
+
+The command above will remove the Clover visibility services from your Kubernetes
+environment.
+
+Uninstall from Docker environment
+=================================
+
+The OPNFV Docker images can be removed with the following commands:
+
+.. code-block:: bash
+
+    # Append the image tag (e.g. opnfv/clover-collector:opnfv-7.0.0) if a release
+    # tag other than latest was pulled
+    $ docker rmi opnfv/clover-collector
+    $ docker rmi opnfv/clover-spark
+    $ docker rmi opnfv/clover-spark-submit
+    $ docker rmi opnfv/clover-controller
+    $ docker rmi k8s.gcr.io/redis:v1
+    $ docker rmi kubernetes/redis:v1
+    $ docker rmi cassandra:3