=========================
 Tracing Ceph With Blkin
=========================

Ceph can use Blkin, a library created by Marios Kogias and others,
which enables tracking a specific request from the time it enters
the system at higher levels until it is finally served by RADOS.

In general, Blkin implements the Dapper_ tracing semantics
in order to show the causal relationships between the different
processing phases that an IO request may trigger. The goal is an
end-to-end visualization of the request's route through the system,
accompanied by information about the latency of each processing
phase. Thanks to LTTng, this can happen with minimal overhead and
in real time. The LTTng traces can then be visualized with Twitter's
Zipkin_.

.. _Dapper: http://static.googleusercontent.com/media/research.google.com/el//pubs/archive/36356.pdf
.. _Zipkin: http://zipkin.io/

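The causal relationships are carried by three identifiers that appear on every Blkin event (see the ``lttng view`` sample output later in this document): a ``trace_id`` shared by all phases of one request, a ``span_id`` for each phase, and a ``parent_span_id`` naming the phase that triggered it. The following minimal Python sketch is not part of Blkin or Ceph; the ``Span`` class, ``build_span_tree()`` and ``render()`` are hypothetical, and it assumes that a root span has ``parent_span_id = 0``, as in that sample. It rebuilds the span hierarchy of one trace, which is essentially the tree Zipkin draws.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List

# Hypothetical helpers for illustration, not part of Blkin or Ceph.


@dataclass(frozen=True)
class Span:
    """One processing phase of a traced request."""
    trace_id: int
    span_id: int
    parent_span_id: int  # assumed to be 0 for the root span of a trace
    name: str


def build_span_tree(spans: List[Span], trace_id: int) -> Dict[int, List[Span]]:
    """Map each parent span_id to the direct child spans of one trace."""
    children: Dict[int, List[Span]] = defaultdict(list)
    for span in spans:
        if span.trace_id == trace_id:
            children[span.parent_span_id].append(span)
    return dict(children)


def render(children: Dict[int, List[Span]], parent: int = 0, depth: int = 0) -> List[str]:
    """Return the hierarchy below `parent` as indented lines, depth first."""
    lines: List[str] = []
    for span in children.get(parent, []):
        lines.append("  " * depth + span.name)
        lines.extend(render(children, span.span_id, depth + 1))
    return lines


if __name__ == "__main__":
    # Hypothetical spans of a single request, hand-written for illustration.
    spans = [
        Span(1, 10, 0, "client request"),
        Span(1, 11, 10, "OSD op"),
        Span(1, 12, 11, "journal write"),
    ]
    print("\n".join(render(build_span_tree(spans, trace_id=1))))
```

Running it prints the three hypothetical spans as an indented outline. With real traces, the identifiers would come from the LTTng output instead of hand-written values.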

Installing Blkin
================

You can install Marios Kogias' upstream Blkin_ by hand::

  cd blkin/
  make && make install

Alternatively, build distribution packages using DistroReadyBlkin_, which
also comes with pkg-config support. If you choose the latter, you must
first generate the configure script and makefiles::

  cd blkin
  autoreconf -i

.. _Blkin: https://github.com/marioskogias/blkin
.. _DistroReadyBlkin: https://github.com/agshew/blkin


Configuring Ceph with Blkin
===========================

If you built and installed Blkin by hand, rather than building and
installing packages, set these variables before configuring Ceph::

  export BLKIN_CFLAGS=-Iblkin/
  export BLKIN_LIBS=-lzipkin-cpp

Since Ceph's LTTng and Blkin support were added as separate changes,
you may want to configure with Blkin but without LTTng, using
something like::

  ./configure --with-blkin --without-lttng --with-debug



Testing Blkin
=============

It's easy to test Ceph's Blkin tracing. Assume that Ceph is not already
running, and that you compiled Ceph with Blkin support but did not
install it. Launch Ceph with the ``vstart.sh`` script in Ceph's ``src``
directory so that you can see the available tracepoints::

  cd src
  OSD=3 MON=3 RGW=1 ./vstart.sh -n
  lttng list --userspace

You'll see something like the following::

  UST events:
  -------------
  PID: 8987 - Name: ./ceph-osd
  zipkin:timestamp (loglevel: TRACE_WARNING (4)) (type: tracepoint)
  zipkin:keyval (loglevel: TRACE_WARNING (4)) (type: tracepoint)
  ust_baddr_statedump:soinfo (loglevel: TRACE_DEBUG_LINE (13)) (type: tracepoint)

  PID: 8407 - Name: ./ceph-mon
  zipkin:timestamp (loglevel: TRACE_WARNING (4)) (type: tracepoint)
  zipkin:keyval (loglevel: TRACE_WARNING (4)) (type: tracepoint)
  ust_baddr_statedump:soinfo (loglevel: TRACE_DEBUG_LINE (13)) (type: tracepoint)

  ...

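When many daemons are running, this listing can also be checked programmatically. The Python sketch below is hypothetical (``parse_lttng_list()`` and ``blkin_processes()`` are illustrative names, not Ceph or LTTng tools). It assumes the text layout shown above and reports which processes expose both ``zipkin:timestamp`` and ``zipkin:keyval``, which presumably indicates that they were built with Blkin support.

```python
import re
from typing import Dict, Optional, Set, Tuple

# Hypothetical helpers, not part of Ceph or LTTng.

# Process header, e.g. "PID: 8987 - Name: ./ceph-osd".
PROC_RE = re.compile(r"PID:\s*(\d+)\s*-\s*Name:\s*(\S+)")
# Tracepoint line, e.g. "zipkin:timestamp (loglevel: TRACE_WARNING (4)) ...".
EVENT_RE = re.compile(r"^\s*([\w.]+:[\w.]+)\s+\(loglevel")

# The two tracepoints enabled later in this walkthrough.
BLKIN_EVENTS = {"zipkin:timestamp", "zipkin:keyval"}


def parse_lttng_list(text: str) -> Dict[Tuple[int, str], Set[str]]:
    """Group tracepoint names by (pid, process name)."""
    procs: Dict[Tuple[int, str], Set[str]] = {}
    current: Optional[Tuple[int, str]] = None
    for line in text.splitlines():
        proc = PROC_RE.search(line)
        if proc:
            current = (int(proc.group(1)), proc.group(2))
            procs[current] = set()
            continue
        event = EVENT_RE.match(line)
        if event and current is not None:
            procs[current].add(event.group(1))
    return procs


def blkin_processes(text: str) -> Set[str]:
    """Names of processes exposing both Blkin (zipkin:*) tracepoints."""
    return {
        name
        for (_pid, name), events in parse_lttng_list(text).items()
        if BLKIN_EVENTS <= events
    }
```

You could feed it the output of ``subprocess.run(["lttng", "list", "--userspace"], capture_output=True, text=True).stdout``.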

Next, stop Ceph so that the tracepoints can be enabled::

  ./stop.sh

Start an LTTng session and enable the tracepoints::

  lttng create blkin-test
  lttng enable-event --userspace zipkin:timestamp
  lttng enable-event --userspace zipkin:keyval
  lttng start

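If you repeat this test often, the session setup can be scripted. The sketch below is a hypothetical Python helper, not part of LTTng or Ceph; it only builds and runs the same ``lttng`` commands listed above, and assumes ``lttng`` is on ``PATH``.

```python
import subprocess
from typing import List, Sequence

# Hypothetical helpers, not part of LTTng or Ceph.

# The Blkin tracepoints enabled in the steps above.
BLKIN_TRACEPOINTS = ("zipkin:timestamp", "zipkin:keyval")


def session_start_commands(session: str,
                           events: Sequence[str] = BLKIN_TRACEPOINTS) -> List[List[str]]:
    """Build argv lists for: create a session, enable each event, start tracing."""
    commands = [["lttng", "create", session]]
    for event in events:
        commands.append(["lttng", "enable-event", "--userspace", event])
    commands.append(["lttng", "start"])
    return commands


def run_all(commands: List[List[str]]) -> None:
    """Run the commands in order; check=True stops at the first non-zero exit."""
    for argv in commands:
        subprocess.run(argv, check=True)
```

For example, ``run_all(session_start_commands("blkin-test"))`` performs the four steps above and raises ``subprocess.CalledProcessError`` at the first command that fails.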

Then start Ceph again::

  OSD=3 MON=3 RGW=1 ./vstart.sh -n

You may want to check that Ceph is up::

  ./ceph status

Now use ``rados`` to create a pool, store an object in it, check that the
object is listed and see where it maps, read it back, compare checksums,
and remove it::

  ./rados mkpool test-blkin
  ./rados put test-object-1 ./vstart.sh --pool=test-blkin
  ./rados -p test-blkin ls
  ./ceph osd map test-blkin test-object-1
  ./rados get test-object-1 ./vstart-copy.sh --pool=test-blkin
  md5sum vstart*
  ./rados rm test-object-1 --pool=test-blkin

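The ``md5sum vstart*`` step relies on comparing the two digests by eye. A minimal Python sketch of the same round-trip check follows; ``md5_of()`` and ``round_trip_ok()`` are hypothetical helpers, and the file names in the usage note are the ones used above.

```python
import hashlib
from pathlib import Path

# Hypothetical helpers, not part of Ceph.


def md5_of(path: Path, chunk_size: int = 1 << 16) -> str:
    """Hex MD5 digest of a file, read in chunks (the digest md5sum prints)."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def round_trip_ok(original: Path, copy: Path) -> bool:
    """True if the copy read back from RADOS has the same MD5 as the original."""
    return md5_of(original) == md5_of(copy)
```

Run from ``src``, ``round_trip_ok(Path("vstart.sh"), Path("vstart-copy.sh"))`` should return ``True`` if the object survived the trip through RADOS.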

You could also use the example in ``examples/librados/`` or ``rados bench``.

Then stop the LTTng session and see what was collected::

  lttng stop
  lttng view

You'll see something like::

  [13:09:07.755054973] (+?.?????????) scruffy zipkin:timestamp: { cpu_id = 5 }, { trace_name = "Main", service_name = "MOSDOp", port_no = 0, ip = "0.0.0.0", trace_id = 7492589359882233221, span_id = 2694140257089376129, parent_span_id = 0, event = "Message allocated" }
  [13:09:07.755071569] (+0.000016596) scruffy zipkin:keyval: { cpu_id = 5 }, { trace_name = "Main", service_name = "MOSDOp", port_no = 0, ip = "0.0.0.0", trace_id = 7492589359882233221, span_id = 2694140257089376129, parent_span_id = 0, key = "Type", val = "MOSDOp" }
  [13:09:07.755074217] (+0.000002648) scruffy zipkin:keyval: { cpu_id = 5 }, { trace_name = "Main", service_name = "MOSDOp", port_no = 0, ip = "0.0.0.0", trace_id = 7492589359882233221, span_id = 2694140257089376129, parent_span_id = 0, key = "Reqid", val = "client.4126.0:1" }
  ...

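To extract per-phase latencies from this text, you can parse it. The following Python sketch is hypothetical (it is not the ``babeltrace_zipkin.py`` script used below) and assumes the line layout shown above: a ``[HH:MM:SS.nnnnnnnnn]`` timestamp, a ``zipkin:timestamp`` or ``zipkin:keyval`` tracepoint name, then ``key = value`` fields. It turns each line into a dictionary and measures, for every span, the time between its first and last ``zipkin:timestamp`` event.

```python
import re
from collections import defaultdict
from typing import Any, Dict, List, Tuple

# Hypothetical helpers, not part of Blkin or LTTng.

# "[HH:MM:SS.nnnnnnnnn] ... zipkin:<name>: { ... }" as printed by `lttng view`.
LINE_RE = re.compile(r"^\[(\d+):(\d+):(\d+)\.(\d+)\].*?\s(zipkin:\w+):\s(.*)$")
# "key = value" pairs; a value is a "quoted string" or a bare token.
FIELD_RE = re.compile(r'(\w+) = ("(?:[^"\\]|\\.)*"|[^,}\s]+)')


def _convert(value: str) -> Any:
    """Strip quotes from strings and turn unsigned integers into int."""
    if value.startswith('"'):
        return value[1:-1]
    return int(value) if value.isdigit() else value


def parse_view_line(line: str) -> Dict[str, Any]:
    """Parse one zipkin:* line; return {} for anything else."""
    match = LINE_RE.match(line.strip())
    if not match:
        return {}
    hours, minutes, seconds, fraction = match.group(1, 2, 3, 4)
    record: Dict[str, Any] = {
        # Time of day in integer nanoseconds, avoiding float rounding.
        "ns": ((int(hours) * 60 + int(minutes)) * 60 + int(seconds)) * 10**9
        + int(fraction.ljust(9, "0")[:9]),
        "tracepoint": match.group(5),
    }
    for key, value in FIELD_RE.findall(match.group(6)):
        record[key] = _convert(value)
    return record


def span_latencies(records: List[Dict[str, Any]]) -> Dict[Tuple[int, int], int]:
    """Nanoseconds between the first and last zipkin:timestamp event of each span."""
    times: Dict[Tuple[int, int], List[int]] = defaultdict(list)
    for record in records:
        if record.get("tracepoint") == "zipkin:timestamp":
            times[(record["trace_id"], record["span_id"])].append(record["ns"])
    return {span: max(ts) - min(ts) for span, ts in times.items()}
```

For the first two sample lines above, the ``ns`` values differ by 16596, matching the ``+0.000016596`` delta that ``lttng view`` prints. Because the timestamps are times of day, a span that crosses midnight would need extra handling.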

Install Zipkin
==============

One reason to use Blkin is to view the traces in Zipkin. Zipkin must run
both as a tracepoint collector and as a web service, which means running
three services: zipkin-collector, zipkin-query, and zipkin-web.

Download the Zipkin package and start the three services::

  wget https://github.com/twitter/zipkin/archive/1.1.0.tar.gz
  tar zxf 1.1.0.tar.gz
  cd zipkin-1.1.0
  bin/collector cassandra &
  bin/query cassandra &
  bin/web &

Check Zipkin::

  bin/test

Then browse to ``http://${zipkin-web-ip}:8080``.

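If the web page does not load, it can help to check whether the services are listening at all. This is a minimal, hypothetical Python sketch; it assumes the ports used in this document (8080 for zipkin-web and 9410 for the collector) and only tests that a TCP connection succeeds, not that Zipkin is healthy.

```python
import socket

# Hypothetical helper, not part of Zipkin or Ceph.


def port_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """True if a TCP connection to host:port succeeds within `timeout` seconds.

    This only shows that something is listening, not that Zipkin is healthy.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, unreachable, or timed out
        return False
```

For example, ``port_open("127.0.0.1", 8080)`` and ``port_open("127.0.0.1", 9410)`` should both return ``True`` on the Zipkin host.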

Show Ceph's Blkin Traces in Zipkin-web
======================================

Blkin provides a script, ``babeltrace_zipkin.py``, which translates LTTng
traces into Zipkin (Dapper) semantics and sends them to the Zipkin collector.

Send the LTTng data to Zipkin (the collector listens on port 9410 by default)::

  python3 babeltrace_zipkin.py ${lttng-traces-dir}/${blkin-test}/ust/uid/0/64-bit/ -p ${zipkin-collector-port} -s ${zipkin-collector-ip}

Example::

  python3 babeltrace_zipkin.py ~/lttng-traces-dir/blkin-test-20150225-160222/ust/uid/0/64-bit/ -p 9410 -s 127.0.0.1

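LTTng appends a date and time to the session name, as in the example path, so the directory changes on every run. The hypothetical Python helpers below (not part of Blkin) pick the newest ``<session>-<date>-<time>/ust/uid/<uid>/64-bit`` directory under a traces root and build the same ``babeltrace_zipkin.py`` command line. They assume the layout shown in the example, including ``uid/0`` (traces recorded as root); adjust ``uid`` and ``traces_root`` for your setup.

```python
from pathlib import Path
from typing import List, Optional

# Hypothetical helpers, not part of Blkin or LTTng.


def latest_ust_trace_dir(traces_root: Path, session: str,
                         uid: int = 0, bitness: str = "64-bit") -> Optional[Path]:
    """Newest <session>-<date>-<time>/ust/uid/<uid>/<bitness> directory, or None."""
    pattern = f"{session}-*/ust/uid/{uid}/{bitness}"
    # The YYYYMMDD-HHMMSS suffix sorts chronologically, so the last match is newest.
    candidates = sorted(p for p in traces_root.glob(pattern) if p.is_dir())
    return candidates[-1] if candidates else None


def zipkin_command(trace_dir: Path, collector_ip: str,
                   collector_port: int = 9410) -> List[str]:
    """Argv for the babeltrace_zipkin.py invocation shown above."""
    return ["python3", "babeltrace_zipkin.py", str(trace_dir),
            "-p", str(collector_port), "-s", collector_ip]
```

The lexical sort relies on the timestamp suffix; note that a different session whose name merely starts with ``blkin-test-`` would also match the glob.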

Check the Ceph traces in the Zipkin web UI: browse to
``http://${zipkin-web-ip}:8080`` and click "Find traces".