=================
Table of Contents
=================
.. contents::
.. section-numbering::
Setup
======
Prerequisites
-------------------------
- 3 VMs are required to set up K8s
- ``$ sudo yum install ansible``
- ``$ pip install openshift pyyaml kubernetes`` (required for ansible K8s module)
- Update the IPs in all of these files (if they have changed):
====================================================================== ========================================
Path                                                                   Description
====================================================================== ========================================
``ansible-server/group_vars/all.yml``                                  IP of the K8s apiserver and VM hostname
``ansible-server/hosts``                                               IPs of the VMs to install on
``ansible-server/roles/logging/files/persistentVolume.yaml``           IP of the NFS server
``ansible-server/roles/logging/files/elastalert/ealert-rule-cm.yaml``  IP of the alert-receiver
====================================================================== ========================================
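For illustration, the ``ansible-server/hosts`` inventory for the three server VMs might look like the following (the group names here are assumptions; keep whatever groups the playbooks actually reference)::

  $ cat ansible-server/hosts
  # illustrative sketch only -- group names are assumptions
  [master]
  10.10.120.211

  [workers]
  10.10.120.203
  10.10.120.204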
Architecture
--------------
.. image:: images/setup.png
Installation - Clientside
-------------------------
Nodes
`````
- **Node1** = 10.10.120.21
- **Node4** = 10.10.120.24
How is the installation done?
``````````````````````````````
- Install TD-agent
``$ curl -L https://toolbelt.treasuredata.com/sh/install-redhat-td-agent3.sh | sh``
- Copy the TD-agent config file on **Node1**
``$ cp tdagent-client-config/node1.conf /etc/td-agent/td-agent.conf``
- Copy the TD-agent config file on **Node4**
``$ cp tdagent-client-config/node4.conf /etc/td-agent/td-agent.conf``
- Restart the service
``$ sudo service td-agent restart``
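To verify that td-agent came back up cleanly after the restart, a quick check::

  $ sudo systemctl status td-agent
  $ sudo tail -n 20 /var/log/td-agent/td-agent.log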
Installation - Serverside
-------------------------
Nodes
`````
Inside Jumphost - POD12
- **VM1** = 10.10.120.211
- **VM2** = 10.10.120.203
- **VM3** = 10.10.120.204
How is the installation done?
``````````````````````````````
**Using Ansible:**
- **K8s**
- **Elasticsearch:** 1 Master & 1 Data node on each VM
- **Kibana:** 1 Replica
- **Nginx:** 2 Replicas
- **Fluentd:** 2 Replicas
- **Elastalert:** 1 Replica (duplicate alerts are sent if the replica count is increased)
- **NFS Server:** on each VM, to store the Elasticsearch data at the following paths:
  - ``/srv/nfs/master``
  - ``/srv/nfs/data``
How to set up?
```````````````
- **To set up the K8s cluster and EFK:** run the Ansible playbook ``ansible/playbooks/setup.yaml``
- **To clean everything:** run the Ansible playbook ``ansible/playbooks/clean.yaml``
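A minimal invocation sketch (the inventory path passed to ``-i`` is an assumption; point it at the hosts file from the Prerequisites table)::

  $ ansible-playbook -i ansible-server/hosts ansible/playbooks/setup.yaml
  $ ansible-playbook -i ansible-server/hosts ansible/playbooks/clean.yaml   # tear everything down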
Do we have HA?
````````````````
Yes
Configuration
=============
K8s
---
Path of all YAML files (Serverside)
````````````````````````````````````
``ansible-server/roles/logging/files/``
K8s namespace
`````````````
``logging``
K8s Service details
````````````````````
``$ kubectl get svc -n logging``
Elasticsearch Configuration
---------------------------
Elasticsearch Setup Structure
`````````````````````````````
.. image:: images/elasticsearch.png
Elasticsearch service details
`````````````````````````````
| **Service Name:** ``logging-es-http``
| **Service Port:** ``9200``
| **Service Type:** ``ClusterIP``
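Because the service type is ``ClusterIP``, it is not directly reachable from outside the cluster; a minimal health-check sketch via a port-forward (credentials are described in the next subsection)::

  $ kubectl port-forward -n logging svc/logging-es-http 9200:9200 &
  $ curl -k -u "elasticsearch:password123" "https://localhost:9200/_cluster/health?pretty"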
How to get the default Elasticsearch username & password?
````````````````````````````````````````````````````````````
- User1 (custom user):
| **Username:** ``elasticsearch``
| **Password:** ``password123``
- User2 (created by default by the Elastic Operator):
| **Username:** ``elastic``
| To get the default password:
| ``$ PASSWORD=$(kubectl get secret -n logging logging-es-elastic-user -o go-template='{{.data.elastic | base64decode}}')``
| ``$ echo $PASSWORD``
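Once ``$PASSWORD`` is set, it can be used the same way as the custom user's password, e.g. to list all indices (endpoint reused from the example below)::

  $ curl -k -u "elastic:$PASSWORD" "https://10.10.120.211:9200/_cat/indices?v"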
How to increase the replica count of an index?
````````````````````````````````````````````````
::

  $ curl -k -u "elasticsearch:password123" -H 'Content-Type: application/json' \
    -XPUT "https://10.10.120.211:9200/indexname*/_settings" -d '
    {
      "index" : {
        "number_of_replicas" : "2"
      }
    }'
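To confirm the change took effect, read the setting back::

  $ curl -k -u "elasticsearch:password123" "https://10.10.120.211:9200/indexname*/_settings?pretty"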
Index Life
```````````
**30 Days**
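If an index ever has to be removed by hand before the 30 days are up, a hedged sketch (the index name is hypothetical, following the ``node4-YYYYMMDD`` pattern from the Alert Format section)::

  $ curl -k -u "elasticsearch:password123" -XDELETE "https://10.10.120.211:9200/node4-20200715"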
Kibana Configuration
--------------------
Kibana Service details
````````````````````````
| **Service Name:** ``logging-kb-http``
| **Service Port:** ``5601``
| **Service Type:** ``ClusterIP``
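As with Elasticsearch, the ``ClusterIP`` service can be reached locally through a port-forward for debugging (in normal operation Kibana is accessed through Nginx, described next)::

  $ kubectl port-forward -n logging svc/logging-kb-http 5601:5601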
Nginx Configuration
--------------------
URL
````
https://10.10.120.211:32000
Nginx Setup Structure
`````````````````````
.. image:: images/nginx.png
Nginx Service details
`````````````````````
| **Service Name:** ``nginx``
| **Service Port:** ``32000``
| **Service Type:** ``NodePort``
Why is NGINX used?
```````````````````
`Securing ELK using Nginx `_
Nginx Configuration
````````````````````
**Path:** ``ansible-server/roles/logging/files/nginx/nginx-conf-cm.yaml``
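To inspect the configuration that is actually deployed, list the ConfigMaps in the namespace and dump the relevant one (the ConfigMap name used here is an assumption; take the real one from the listing)::

  $ kubectl get configmap -n logging
  $ kubectl get configmap nginx-conf -n logging -o yaml   # name is an assumption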
Fluentd Configuration - Clientside (Td-agent)
---------------------------------------------
Fluentd Setup Structure
````````````````````````
.. image:: images/fluentd-cs.png
Log collection paths
`````````````````````
- ``/tmp/result*/*.log``
- ``/tmp/result*/*.dat``
- ``/tmp/result*/*.csv``
- ``/tmp/result*/stc-liveresults.dat.*``
- ``/var/log/userspace*.log``
- ``/var/log/sriovdp/*.log.*``
- ``/var/log/pods/**/*.log``
Logs are sent to
`````````````````
Another Fluentd instance in the K8s cluster (K8s master: 10.10.120.211) on the Jumphost.
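To check from a client node that this instance is actually reachable, probe the server-side Fluentd NodePort (port ``32224``, from the server-side section below)::

  $ nc -vz 10.10.120.211 32224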
Td-agent logs
`````````````
Path of td-agent logs: ``/var/log/td-agent/td-agent.log``
Td-agent configuration
````````````````````````
| Path of conf file: ``/etc/td-agent/td-agent.conf``
| **If any change is made to td-agent.conf, restart the td-agent service:** ``$ sudo service td-agent restart``
Config Description
````````````````````
- Get the logs from the collection paths
- | Convert each log line to this format:
  | {
  |   msg: "log line"
  |   log_path: "/file/path"
  |   file: "file.name"
  |   host: "pod12-node4"
  | }
- Send the records to the server-side Fluentd
Fluentd Configuration - Serverside
----------------------------------
Fluentd Setup Structure
````````````````````````
.. image:: images/fluentd-ss.png
Fluentd Service details
````````````````````````
| **Service Name:** ``fluentd``
| **Service Port:** ``32224``
| **Service Type:** ``NodePort``
Logs are sent to
`````````````````
Elasticsearch service (https://logging-es-http:9200)
Config Description
````````````````````
- **Step 1**
- Get the logs from Node1 & Node4
- **Step 2**
========================================== ======================
log_path                                   add tag (for routing)
========================================== ======================
``/tmp/result.*/.*errors.dat``             errordat.log
``/tmp/result.*/.*counts.dat``             countdat.log
``/tmp/result.*/stc-liveresults.dat.tx``   stcdattx.log
``/tmp/result.*/stc-liveresults.dat.rx``   stcdatrx.log
``/tmp/result.*/.*Statistics.csv``         ixia.log
``/tmp/result.*/vsperf-overall*``          vsperf.log
``/tmp/result.*/vswitchd*``                vswitchd.log
``/var/log/userspace*``                    userspace.log
``/var/log/sriovdp*``                      sriovdp.log
``/var/log/pods*``                         pods.log
========================================== ======================
- **Step 3**
Then parse each type using its tag:
- error.conf: to find any errors
- time-series.conf: to parse time-series data
- time-analysis.conf: to calculate the time analysis
- **Step 4**
================================ ======================
host                             add tag (for routing)
================================ ======================
``pod12-node4``                  node4
``worker``                       node1
================================ ======================
- **Step 5**
================================ ======================
Tag                              elasticsearch
================================ ======================
``node4``                        index "node4*"
``node1``                        index "node1*"
================================ ======================
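To confirm that the routing above produces the expected indices, a hedged query sketch against ``node4*`` (credentials and endpoint as in the Elasticsearch section)::

  $ curl -k -u "elasticsearch:password123" "https://10.10.120.211:9200/node4*/_search?size=1&pretty"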
Elastalert
----------
Send alert if
``````````````
- Blacklist
- "Failed to run test"
- "Failed to execute in '30' seconds"
- "('Result', 'Failed')"
- "could not open socket: connection refused"
- "Input/output error"
- "dpdk|ERR|EAL: Error - exiting with code: 1"
- "Failed to execute in '30' seconds"
- "dpdk|ERR|EAL: Driver cannot attach the device"
- "dpdk|EMER|Cannot create lock on"
- "dpdk|ERR|VHOST_CONFIG: * device not found"
- Time
- vswitch_duration > 3 sec
How to configure an alert?
````````````````````````````
- Add your rule in ``ansible/roles/logging/files/elastalert/ealert-rule-cm.yaml`` (`Elastalert Rule Config `_)
| name: anything
| type: # the RuleType to use
| index: node4* # index name
| realert:
|   minutes: 0 # to get an alert for every occurrence after each interval
| alert: post # to send the alert as an HTTP POST
| http_post_url: "http://url"
- Mount this file to elastalert pod in ``ansible/roles/logging/files/elastalert/elastalert.yaml``.
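After editing the rule, reapply the ConfigMap and restart the Elastalert pod so it picks up the change (the pod label is an assumption; check it with ``kubectl get pods -n logging``)::

  $ kubectl apply -f ansible/roles/logging/files/elastalert/ealert-rule-cm.yaml
  $ kubectl delete pod -n logging -l app=elastalert   # label is an assumption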
Alert Format
````````````
{"type": "pattern-match", "label": "failed", "index": "node4-20200815", "log": "error-log-line", "log-path": "/tmp/result/file.log", "reson": "error-message" }
Data Management
===============
Elasticsearch
-------------
Where is the data stored now?
``````````````````````````````
Data is stored on the NFS server, with 1 replica of each index (the default). The data paths are the following:
- ``/srv/nfs/data (VM1)``
- ``/srv/nfs/data (VM2)``
- ``/srv/nfs/data (VM3)``
- ``/srv/nfs/master (VM1)``
- ``/srv/nfs/master (VM2)``
- ``/srv/nfs/master (VM3)``
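To verify the exports from any node (assuming the standard NFS client tools are installed)::

  $ showmount -e 10.10.120.211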
Can the user switch from NFS to local storage?
````````````````````````````````````````````````
Yes, by reconfiguring the persistent volume (``ansible-server/roles/logging/files/persistentVolume.yaml``).
Do we have backup of data?
````````````````````````````
Yes, 1 replica of each index.
Is the data still accessible after a K8s restart?
````````````````````````````````````````````````````
Yes (if the data has not been deleted from /srv/nfs/data).
Troubleshooting
===============
If Elasticsearch is not receiving logs
----------------------------------------
- Check IP & port of server-fluentd in client config.
- Check client-fluentd logs, ``$ sudo tail -f /var/log/td-agent/td-agent.log``
- Check server-fluentd logs, ``$ sudo kubectl logs -n logging <fluentd-pod-name>``
If no notification is received
-------------------------------
- Search for your "log" in Elasticsearch.
- Check the Elastalert config.
- Check the IP of the alert-receiver.
References
==========
- `Elastic cloud on K8s `_
- `HA Elasticsearch on K8s `_
- `Fluentd Configuration `_
- `Elastalert Rule Config `_