.. This work is licensed under a Creative Commons Attribution 4.0 International License.
.. http://creativecommons.org/licenses/by/4.0
.. (c) OPNFV, Intel Corporation, AT&T, Red Hat, Spirent, Ixia and others.
.. OPNFV VSPERF Documentation master file.
***************
Logs User Guide
***************
Prerequisites
=============
- Three VMs are required to set up K8s
- ``$ sudo yum install ansible``
- ``$ pip install openshift pyyaml kubernetes`` (required for the Ansible K8s module)
- Update the IP addresses in the following files (if they have changed)

====================================================================== ======================
Path                                                                   Description
====================================================================== ======================
``ansible-server/group_vars/all.yml``                                  IP of K8s apiserver and VM hostname
``ansible-server/hosts``                                               IP of VMs to install
``ansible-server/roles/logging/files/persistentVolume.yaml``           IP of NFS-Server
``ansible-server/roles/logging/files/elastalert/ealert-rule-cm.yaml``  IP of alert-receiver
====================================================================== ======================

Architecture
============
.. image:: images/setup.png
Installation - Clientside
=========================
Nodes
-----
- **Node1** = 10.10.120.21
- **Node4** = 10.10.120.24
How is the installation done?
-----------------------------
- Install TD-agent:
  ``$ curl -L https://toolbelt.treasuredata.com/sh/install-redhat-td-agent3.sh | sh``
- Copy the TD-agent config file on **Node1**:
  ``$ cp tdagent-client-config/node1.conf /etc/td-agent/td-agent.conf``
- Copy the TD-agent config file on **Node4**:
  ``$ cp tdagent-client-config/node4.conf /etc/td-agent/td-agent.conf``
- Restart the service:
  ``$ sudo service td-agent restart``
Installation - Serverside
=========================
Nodes
-----
Inside Jumphost - POD12
- **VM1** = 10.10.120.211
- **VM2** = 10.10.120.203
- **VM3** = 10.10.120.204
How is the installation done?
-----------------------------
**Using Ansible:**
- **K8s**
- **Elasticsearch:** 1 Master & 1 Data node on each VM
- **Kibana:** 1 Replica
- **Nginx:** 2 Replicas
- **Fluentd:** 2 Replicas
- **Elastalert:** 1 Replica (increasing the replica count produces duplicate alerts)
- **NFS Server:** on each VM, storing Elasticsearch data at the following paths:

  - ``/srv/nfs/master``
  - ``/srv/nfs/data``
How to set up?
--------------
- **To set up the K8s cluster and EFK:** Run the ansible-playbook ``ansible/playbooks/setup.yaml``
- **To clean everything:** Run the ansible-playbook ``ansible/playbooks/clean.yaml``
Do we have HA?
--------------
Yes
Configuration
=============
K8s
---
Path of all yamls (Serverside)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
``ansible-server/roles/logging/files/``
K8s namespace
^^^^^^^^^^^^^
``logging``
K8s Service details
^^^^^^^^^^^^^^^^^^^
``$ kubectl get svc -n logging``
Elasticsearch Configuration
---------------------------
Elasticsearch Setup Structure
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
.. image:: images/elasticsearch.png
Elasticsearch service details
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
| **Service Name:** ``logging-es-http``
| **Service Port:** ``9200``
| **Service Type:** ``ClusterIP``
How to get elasticsearch default username & password?
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
- User1 (custom user):
| **Username:** ``elasticsearch``
| **Password:** ``password123``
- User2 (by default created by Elastic Operator):
| **Username:** ``elastic``
| To get default password:
| ``$ PASSWORD=$(kubectl get secret -n logging logging-es-elastic-user -o go-template='{{.data.elastic | base64decode}}')``
| ``$ echo $PASSWORD``
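
The ``base64decode`` step in the command above is needed because K8s stores secret values base64-encoded. As a minimal sketch of that decoding step only (the function name ``decode_secret_value`` and the sample value are hypothetical, not part of this project):

```python
import base64


def decode_secret_value(encoded):
    """Decode one base64-encoded value taken from a K8s Secret's data map."""
    return base64.b64decode(encoded).decode("utf-8")


# Hypothetical sample value for illustration, not a real password.
sample = base64.b64encode(b"example-password").decode("ascii")
print(decode_secret_value(sample))
```
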
How to increase replica of any index?
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
| $ curl -k -u "elasticsearch:password123" -H 'Content-Type: application/json' -XPUT "https://10.10.120.211:9200/indexname*/_settings" -d '
| {
| "index" : {
| "number_of_replicas" : "2" }
| }'
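
The same request can be sketched in Python. This is only an illustration that builds the request without sending it. The function name ``build_replica_request`` is hypothetical, and authentication and the certificate handling done by ``curl -k -u`` are deliberately left out.

```python
import json
import urllib.request


def build_replica_request(base_url, index_pattern, replicas):
    """Build (but do not send) the PUT request that sets number_of_replicas."""
    body = json.dumps({"index": {"number_of_replicas": replicas}}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/{index_pattern}/_settings",
        data=body,
        method="PUT",
        headers={"Content-Type": "application/json"},
    )


# Address taken from the curl example above; "node4*" is an example index pattern.
req = build_replica_request("https://10.10.120.211:9200", "node4*", 2)
print(req.get_method(), req.full_url)
print(req.data.decode("utf-8"))
```
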
Index Life
^^^^^^^^^^
**30 Days**
Kibana Configuration
--------------------
Kibana Service details
^^^^^^^^^^^^^^^^^^^^^^
| **Service Name:** ``logging-kb-http``
| **Service Port:** ``5601``
| **Service Type:** ``ClusterIP``
Nginx Configuration
-------------------
IP
^^
Nginx is reached over HTTPS at the node IP address and service port, for example ``https://10.10.120.211:32000``.
Nginx Setup Structure
^^^^^^^^^^^^^^^^^^^^^
.. image:: images/nginx.png
Nginx Service details
^^^^^^^^^^^^^^^^^^^^^
| **Service Name:** ``nginx``
| **Service Port:** ``32000``
| **Service Type:** ``NodePort``
Why is NGINX used?
^^^^^^^^^^^^^^^^^^
`Securing ELK using Nginx `_
Nginx Configuration
^^^^^^^^^^^^^^^^^^^
**Path:** ``ansible-server/roles/logging/files/nginx/nginx-conf-cm.yaml``
Fluentd Configuration - Clientside (Td-agent)
---------------------------------------------
Fluentd Setup Structure
^^^^^^^^^^^^^^^^^^^^^^^
.. image:: images/fluentd-cs.png
Log collection paths
^^^^^^^^^^^^^^^^^^^^
- ``/tmp/result*/*.log``
- ``/tmp/result*/*.dat``
- ``/tmp/result*/*.csv``
- ``/tmp/result*/stc-liveresults.dat.*``
- ``/var/log/userspace*.log``
- ``/var/log/sriovdp/*.log.*``
- ``/var/log/pods/**/*.log``
Logs sent to
^^^^^^^^^^^^
The fluentd instance running in the K8s cluster (K8s Master: 10.10.120.211) on the Jumphost.
Td-agent logs
^^^^^^^^^^^^^
Path of td-agent logs: ``/var/log/td-agent/td-agent.log``
Td-agent configuration
^^^^^^^^^^^^^^^^^^^^^^
| Path of conf file: ``/etc/td-agent/td-agent.conf``
| **If any change is made to td-agent.conf, restart the td-agent service:** ``$ sudo service td-agent restart``
Config Description
^^^^^^^^^^^^^^^^^^
- Get the logs from the collection paths
- | Convert each log line to this format
  | {
  | msg: "log line"
  | log_path: "/file/path"
  | file: "file.name"
  | host: "pod12-node4"
  | }
- Send the records to the server-side fluentd
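
The record format described above can be sketched as a small Python function. This is an illustration only, not the td-agent configuration itself. The name ``to_record`` is hypothetical, and it assumes that ``file`` is the base name of ``log_path``.

```python
import posixpath


def to_record(line, log_path, host):
    """Wrap one raw log line in the record format described above."""
    return {
        "msg": line.rstrip("\n"),
        "log_path": log_path,
        # Assumption: "file" is the last component of the log path.
        "file": posixpath.basename(log_path),
        "host": host,
    }


# Example path and host; the path is illustrative only.
record = to_record("log line\n", "/tmp/results/vsperf-overall.log", "pod12-node4")
print(record)
```
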
Fluentd Configuration - Serverside
----------------------------------
Fluentd Setup Structure
^^^^^^^^^^^^^^^^^^^^^^^
.. image:: images/fluentd-ss.png
Fluentd Service details
^^^^^^^^^^^^^^^^^^^^^^^
| **Service Name:** ``fluentd``
| **Service Port:** ``32224``
| **Service Type:** ``NodePort``
Logs sent to
^^^^^^^^^^^^
Elasticsearch service (Example: logging-es-http at port 9200)
Config Description
^^^^^^^^^^^^^^^^^^
- **Step 1**
- Get the logs from Node1 & Node4
- **Step 2**

  ======================================== ======================
  log_path                                 Tag added (for routing)
  ======================================== ======================
  ``/tmp/result.*/.*errors.dat``           errordat.log
  ``/tmp/result.*/.*counts.dat``           countdat.log
  ``/tmp/result.*/stc-liveresults.dat.tx`` stcdattx.log
  ``/tmp/result.*/stc-liveresults.dat.rx`` stcdatrx.log
  ``/tmp/result.*/.*Statistics.csv``       ixia.log
  ``/tmp/result.*/vsperf-overall*``        vsperf.log
  ``/tmp/result.*/vswitchd*``              vswitchd.log
  ``/var/log/userspace*``                  userspace.log
  ``/var/log/sriovdp*``                    sriovdp.log
  ``/var/log/pods*``                       pods.log
  ======================================== ======================

- **Step 3**
  Then parse each log type according to its tag.

  - error.conf: to find any error
  - time-series.conf: to parse time series data
  - time-analysis.conf: to calculate time analysis
- **Step 4**

  ================================ ======================
  host                             Tag added (for routing)
  ================================ ======================
  ``pod12-node4``                  node4
  ``worker``                       node1
  ================================ ======================

- **Step 5**

  ================================ ======================
  Tag                              Elasticsearch index
  ================================ ======================
  ``node4``                        index "node4*"
  ``node1``                        index "node1*"
  ================================ ======================

Elastalert
==========
Send alert if
-------------
- Blacklist

  - "Failed to run test"
  - "Failed to execute in '30' seconds"
  - "('Result', 'Failed')"
  - "could not open socket: connection refused"
  - "Input/output error"
  - "dpdk|ERR|EAL: Error - exiting with code: 1"
  - "dpdk|ERR|EAL: Driver cannot attach the device"
  - "dpdk|EMER|Cannot create lock on"
  - "dpdk|ERR|VHOST_CONFIG: * device not found"

- Time

  - vswitch_duration > 3 sec
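
The two alert conditions can be sketched in Python as follows. This only illustrates the logic and is not how Elastalert evaluates its rules. The function name ``should_alert`` is hypothetical, only a few of the literal phrases above are included, and plain substring matching is an assumption.

```python
# A subset of the blacklist phrases above, matched as plain substrings (assumption).
BLACKLIST = [
    "Failed to run test",
    "Failed to execute in '30' seconds",
    "('Result', 'Failed')",
    "could not open socket: connection refused",
    "Input/output error",
]

VSWITCH_DURATION_LIMIT_SEC = 3


def should_alert(msg, vswitch_duration=None):
    """Return True if msg contains a blacklisted phrase or the duration is too long."""
    if any(phrase in msg for phrase in BLACKLIST):
        return True
    return vswitch_duration is not None and vswitch_duration > VSWITCH_DURATION_LIMIT_SEC


print(should_alert("ERROR: Failed to run test: tput"))
print(should_alert("vswitch started", vswitch_duration=4.2))
print(should_alert("all good", vswitch_duration=1.0))
```
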
How to configure alert?
-----------------------
- Add your rule in ``ansible-server/roles/logging/files/elastalert/ealert-rule-cm.yaml`` (`Elastalert Rule Config `_)
| name: anything
| type: #The RuleType to use
| index: node4* #index name
| realert:
| minutes: 0 #to get alert for all cases after each interval
| alert: post #To send alert as HTTP POST
| http_post_url: # Provide URL
- Mount this file to the elastalert pod in ``ansible-server/roles/logging/files/elastalert/elastalert.yaml``.
Alert Format
------------
``{"type": "pattern-match", "label": "failed", "index": "node4-20200815", "log": "error-log-line", "log-path": "/tmp/result/file.log", "reson": "error-message" }``
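
On the receiving side, an alert in this format can be parsed as ordinary JSON. The following is a minimal sketch, assuming the alert-receiver gets the payload as a string. The function name ``summarize_alert`` is hypothetical, and the key names (including ``reson``) are used exactly as shown above.

```python
import json


def summarize_alert(payload):
    """Parse one alert payload and return a one-line summary."""
    alert = json.loads(payload)
    return f"[{alert['label']}] {alert['index']} {alert['log-path']}: {alert['reson']}"


# Sample payload copied from the alert format above.
sample_alert = (
    '{"type": "pattern-match", "label": "failed", "index": "node4-20200815", '
    '"log": "error-log-line", "log-path": "/tmp/result/file.log", '
    '"reson": "error-message" }'
)
print(summarize_alert(sample_alert))
```
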
Data Management
===============
Elasticsearch
-------------
Q&As
^^^^
Where is the data stored now?
    Data is stored on the NFS server with 1 replica of each index (default). The data paths are:

    - ``/srv/nfs/data (VM1)``
    - ``/srv/nfs/data (VM2)``
    - ``/srv/nfs/data (VM3)``
    - ``/srv/nfs/master (VM1)``
    - ``/srv/nfs/master (VM2)``
    - ``/srv/nfs/master (VM3)``

Can the user switch from NFS to local storage?
    Yes, by reconfiguring the persistent volume (``ansible-server/roles/logging/files/persistentVolume.yaml``).

Do we have a backup of the data?
    Yes, 1 replica of each index.

Is the data still accessible after K8s restarts?
    Yes, provided the data has not been deleted from ``/srv/nfs/data``.
Troubleshooting
===============
If no logs are received in Elasticsearch
----------------------------------------
- Check IP & port of server-fluentd in client config.
- Check client-fluentd logs, ``$ sudo tail -f /var/log/td-agent/td-agent.log``
- Check server-fluentd logs, ``$ sudo kubectl logs -n logging <fluentd-pod-name>``
If no notification is received
------------------------------
- Search for your log line in Elasticsearch.
- Check the elastalert configuration.
- Check the IP of the alert-receiver.
Reference
=========
- `Elastic cloud on K8s `_
- `HA Elasticsearch on K8s `_
- `Fluentd Configuration `_
- `Elastalert Rule Config `_