From 09999bbf633ec292630eba1e066ab9019408a42d Mon Sep 17 00:00:00 2001 From: adi0509 Date: Fri, 21 Aug 2020 23:12:57 +0530 Subject: LMA: Deployment of LMA solution. Docs for LMA deployment Signed-off-by: Adarsh Yadav Change-Id: Ib58bec806ce80c6927b40ddd490d612195bd6d70 --- docs/lma/logs/images/elasticsearch.png | Bin 0 -> 36046 bytes docs/lma/logs/images/fluentd-cs.png | Bin 0 -> 40226 bytes docs/lma/logs/images/fluentd-ss.png | Bin 0 -> 18331 bytes docs/lma/logs/images/nginx.png | Bin 0 -> 36737 bytes docs/lma/logs/images/setup.png | Bin 0 -> 43503 bytes docs/lma/logs/userguide.rst | 348 +++++++++++++++++++++++++++++++++ 6 files changed, 348 insertions(+) create mode 100644 docs/lma/logs/images/elasticsearch.png create mode 100644 docs/lma/logs/images/fluentd-cs.png create mode 100644 docs/lma/logs/images/fluentd-ss.png create mode 100644 docs/lma/logs/images/nginx.png create mode 100644 docs/lma/logs/images/setup.png create mode 100644 docs/lma/logs/userguide.rst (limited to 'docs/lma/logs') diff --git a/docs/lma/logs/images/elasticsearch.png b/docs/lma/logs/images/elasticsearch.png new file mode 100644 index 00000000..f0b876f5 Binary files /dev/null and b/docs/lma/logs/images/elasticsearch.png differ diff --git a/docs/lma/logs/images/fluentd-cs.png b/docs/lma/logs/images/fluentd-cs.png new file mode 100644 index 00000000..513bb3ef Binary files /dev/null and b/docs/lma/logs/images/fluentd-cs.png differ diff --git a/docs/lma/logs/images/fluentd-ss.png b/docs/lma/logs/images/fluentd-ss.png new file mode 100644 index 00000000..4e9ab112 Binary files /dev/null and b/docs/lma/logs/images/fluentd-ss.png differ diff --git a/docs/lma/logs/images/nginx.png b/docs/lma/logs/images/nginx.png new file mode 100644 index 00000000..a0b00514 Binary files /dev/null and b/docs/lma/logs/images/nginx.png differ diff --git a/docs/lma/logs/images/setup.png b/docs/lma/logs/images/setup.png new file mode 100644 index 00000000..267685fa Binary files /dev/null and b/docs/lma/logs/images/setup.png differ diff --git a/docs/lma/logs/userguide.rst b/docs/lma/logs/userguide.rst new file mode 100644 index 00000000..b410ee6c --- /dev/null +++ b/docs/lma/logs/userguide.rst @@ -0,0 +1,348 @@ +================= +Table of Contents +================= +.. contents:: +.. section-numbering:: + +Setup +====== + +Prerequisites +------------------------- +- Require 3 VMs to setup K8s +- ``$ sudo yum install ansible`` +- ``$ pip install openshift pyyaml kubernetes`` (required for ansible K8s module) +- Update IPs in all these files (if changed) + ====================================================================== ====================== + Path Description + ====================================================================== ====================== + ``ansible-server/group_vars/all.yml`` IP of K8s apiserver and VM hostname + ``ansible-server/hosts`` IP of VMs to install + ``ansible-server/roles/logging/files/persistentVolume.yaml`` IP of NFS-Server + ``ansible-server/roles/logging/files/elastalert/ealert-rule-cm.yaml`` IP of alert-receiver + ====================================================================== ====================== + +Architecture +-------------- +.. image:: images/setup.png + +Installation - Clientside +------------------------- + +Nodes +````` +- **Node1** = 10.10.120.21 +- **Node4** = 10.10.120.24 + +How installation is done? +````````````````````````` +- TD-agent installation + ``$ curl -L https://toolbelt.treasuredata.com/sh/install-redhat-td-agent3.sh | sh`` +- Copy the TD-agent config file in **Node1** + ``$ cp tdagent-client-config/node1.conf /etc/td-agent/td-agent.conf`` +- Copy the TD-agent config file in **Node4** + ``$ cp tdagent-client-config/node4.conf /etc/td-agent/td-agent.conf`` +- Restart the service + ``$ sudo service td-agent restart`` + +Installation - Serverside +------------------------- + +Nodes +````` +Inside Jumphost - POD12 + - **VM1** = 10.10.120.211 + - **VM2** = 10.10.120.203 + - **VM3** = 10.10.120.204 + + +How installation is done? +````````````````````````` +**Using Ansible:** + - **K8s** + - **Elasticsearch:** 1 Master & 1 Data node at each VM + - **Kibana:** 1 Replicas + - **Nginx:** 2 Replicas + - **Fluentd:** 2 Replicas + - **Elastalert:** 1 Replica (get duplicate alert, if increase replica) + - **NFS Server:** at each VM to store elasticsearch data at following path + - ``/srv/nfs/master`` + - ``/srv/nfs/data`` + +How to setup? +````````````` +- **To setup K8s cluster and EFK:** Run the ansible-playbook ``ansible/playbooks/setup.yaml`` +- **To clean everything:** Run the ansible-playbook ``ansible/playbooks/clean.yaml`` + +Do we have HA? +```````````````` +Yes + +Configuration +============= + +K8s +--- +Path of all yamls (Serverside) +```````````````````````````````` +``ansible-server/roles/logging/files/`` + +K8s namespace +````````````` +``logging`` + +K8s Service details +```````````````````` +``$ kubectl get svc -n logging`` + +Elasticsearch Configuration +--------------------------- + +Elasticsearch Setup Structure +````````````````````````````` +.. image:: images/elasticsearch.png + +Elasticsearch service details +````````````````````````````` +| **Service Name:** ``logging-es-http`` +| **Service Port:** ``9200`` +| **Service Type:** ``ClusterIP`` + +How to get elasticsearch default username & password? +````````````````````````````````````````````````````` +- User1 (custom user): + | **Username:** ``elasticsearch`` + | **Password:** ``password123`` +- User2 (by default created by Elastic Operator): + | **Username:** ``elastic`` + | To get default password: + | ``$ PASSWORD=$(kubectl get secret -n logging logging-es-elastic-user -o go-template='{{.data.elastic | base64decode}}')`` + | ``$ echo $PASSWORD`` + +How to increase replica of any index? +```````````````````````````````````````` +| $ curl -k -u "elasticsearch:password123" -H 'Content-Type: application/json' -XPUT "https://10.10.120.211:9200/indexname*/_settings" -d ' +| { +| "index" : { +| "number_of_replicas" : "2" } +| }' + +Index Life +``````````` +**30 Days** + +Kibana Configuration +-------------------- + +Kibana Service details +```````````````````````` +| **Service Name:** ``logging-kb-http`` +| **Service Port:** ``5601`` +| **Service Type:** ``ClusterIP`` + +Nginx Configuration +-------------------- +IP +```` +https://10.10.120.211:32000 + +Nginx Setup Structure +````````````````````` +.. image:: images/nginx.png + +Ngnix Service details +````````````````````` +| **Service Name:** ``nginx`` +| **Service Port:** ``32000`` +| **Service Type:** ``NodePort`` + +Why NGINX is used? +``````````````````` +`Securing ELK using Nginx `_ + +Nginx Configuration +```````````````````` +**Path:** ``ansible-server/roles/logging/files/nginx/nginx-conf-cm.yaml`` + +Fluentd Configuration - Clientside (Td-agent) +--------------------------------------------- + +Fluentd Setup Structure +```````````````````````` +.. image:: images/fluentd-cs.png + +Log collection paths +````````````````````` +- ``/tmp/result*/*.log`` +- ``/tmp/result*/*.dat`` +- ``/tmp/result*/*.csv`` +- ``/tmp/result*/stc-liveresults.dat.*`` +- ``/var/log/userspace*.log`` +- ``/var/log/sriovdp/*.log.*`` +- ``/var/log/pods/**/*.log`` + +Logs sends to +````````````` +Another fluentd instance of K8s cluster (K8s Master: 10.10.120.211) at Jumphost. + +Td-agent logs +````````````` +Path of td-agent logs: ``/var/log/td-agent/td-agent.log`` + +Td-agent configuration +```````````````````````` +| Path of conf file: ``/etc/td-agent/td-agent.conf`` +| **If any changes is made in td-agent.conf then restart the td-agent service,** ``$ sudo service td-agent restart`` + +Config Description +```````````````````` +- Get the logs from collection path +- | Convert to this format + | { + | msg: "log line" + | log_path: “/file/path” + | file: “file.name” + | host: “pod12-node4” + | } +- Sends it to fluentd + +Fluentd Configuration - Serverside +---------------------------------- + +Fluentd Setup Structure +```````````````````````` +.. image:: images/fluentd-ss.png + +Fluentd Service details +```````````````````````` +| **Service Name:** ``fluentd`` +| **Service Port:** ``32224`` +| **Service Type:** ``NodePort`` + +Logs sends to +````````````` +Elasticsearch service (https://logging-es-http:9200) + +Config Description +```````````````````` +- **Step 1** + - Get the logs from Node1 & Node4 +- **Step 2** + ======================================== ====================== + log_path add tag (for routing) + ======================================== ====================== + ``/tmp/result.*/.*errors.dat`` errordat.log + ``/tmp/result.*/.*counts.dat`` countdat.log + ``/tmp/result.*/stc-liveresults.dat.tx`` stcdattx.log + ``/tmp/result.*/stc-liveresults.dat.rx`` stcdatrx.log + ``/tmp/result.*/.*Statistics.csv`` ixia.log + ``/tmp/result.*/vsperf-overall*`` vsperf.log + ``/tmp/result.*/vswitchd*`` vswitchd.log + ``/var/log/userspace*`` userspace.log + ``/var/log/sriovdp*`` sriovdp.log + ``/var/log/pods*`` pods.log + ======================================== ====================== + +- **Step 3** + Then parse each type using tags. + - error.conf: to find any error + - time-series.conf: to parse time series data + - time-analysis.conf: to calculate time analyasis +- **Step 4** + ================================ ====================== + host add tag (for routing) + ================================ ====================== + ``pod12-node4`` node4 + ``worker`` node1 + ================================ ====================== +- **Step 5** + ================================ ====================== + Tag elasticsearch + ================================ ====================== + ``node4`` index “node4*” + ``node1`` index “node1*” + ================================ ====================== + +Elastalert +---------- + +Send alert if +`````````````` +- Blacklist + - "Failed to run test" + - "Failed to execute in '30' seconds" + - "('Result', 'Failed')" + - "could not open socket: connection refused" + - "Input/output error" + - "dpdk|ERR|EAL: Error - exiting with code: 1" + - "Failed to execute in '30' seconds" + - "dpdk|ERR|EAL: Driver cannot attach the device" + - "dpdk|EMER|Cannot create lock on" + - "dpdk|ERR|VHOST_CONFIG: * device not found" +- Time + - vswitch_duration > 3 sec + +How to configure alert? +```````````````````````` +- Add your rule in ``ansible/roles/logging/files/elastalert/ealert-rule-cm.yaml`` (`Elastalert Rule Config `_) + | name: anything + | type: #The RuleType to use + | index: node4* #index name + | realert: + | minutes: 0 #to get alert for all cases after each interval + | alert: post #To send alert as HTTP POST + | http_post_url: "http://url" + +- Mount this file to elastalert pod in ``ansible/roles/logging/files/elastalert/elastalert.yaml``. + +Alert Format +```````````` +{"type": "pattern-match", "label": "failed", "index": "node4-20200815", "log": "error-log-line", "log-path": "/tmp/result/file.log", "reson": "error-message" } + +Data Management +=============== + +Elasticsearch +------------- + +Where data is stored now? +````````````````````````` +Data is stored in NFS server with 1 replica of each index (default). Path of data are following: + - ``/srv/nfs/data (VM1)`` + - ``/srv/nfs/data (VM2)`` + - ``/srv/nfs/data (VM3)`` + - ``/srv/nfs/master (VM1)`` + - ``/srv/nfs/master (VM2)`` + - ``/srv/nfs/master (VM3)`` +If user wants to change from NFS to local storage +`````````````````````````````````````````````````` +Yes, user can do this, need to configure persistent volume. (``ansible-server/roles/logging/files/persistentVolume.yaml``) + +Do we have backup of data? +```````````````````````````` +1 replica of each index + +When K8s restart, the data is still accessible? +````````````````````````````````````````````````````` +Yes (If data is not deleted from /srv/nfs/data) + +Troubleshooting +=============== +If no logs receiving in Elasticsearch +-------------------------------------- +- Check IP & port of server-fluentd in client config. +- Check client-fluentd logs, ``$ sudo tail -f /var/log/td-agent/td-agent.log`` +- Check server-fluentd logs, ``$ sudo kubectl logs -n logging `` + +If no notification received +--------------------------- +- Search your "log" in Elasticsearch. +- Check config of elastalert +- Check IP of alert-receiver + +Reference +========= +- `Elastic cloud on K8s `_ +- `HA Elasticsearch on K8s `_ +- `Fluentd Configuration `_ +- `Elastalert Rule Config `_ \ No newline at end of file -- cgit 1.2.3-korg