author    mbeierl <mark.beierl@dell.com>  2018-10-12 13:42:57 -0400
committer mbeierl <mark.beierl@dell.com>  2018-10-16 15:28:42 -0400
commit    0d2be9eba7abec75d8fb3115fd2ab748ce6ffbc9 (patch)
tree      0ee45d107ab46314858571acccef4efb62993ee3
parent    4aba838a07f5cd7dbd6d606c34f688e647a5d890 (diff)
Additional documentation
Change-Id: I9b176794206e39db436d9597d976c42b7e9d22cf
Signed-off-by: mbeierl <mark.beierl@dell.com>
-rw-r--r--  docs/testing/user/introduction.rst | 196
1 file changed, 190 insertions, 6 deletions
diff --git a/docs/testing/user/introduction.rst b/docs/testing/user/introduction.rst
index 49e3220..0099c39 100644
--- a/docs/testing/user/introduction.rst
+++ b/docs/testing/user/introduction.rst
@@ -25,13 +25,13 @@ performance metrics in the shortest reasonable time.
How Does StorPerf Work?
=======================
-Once launched, StorPerf presents you with a ReST interface, along with a
+Once launched, StorPerf presents a ReST interface, along with a
`Swagger UI <https://swagger.io/swagger-ui/>`_ that makes it easier to
form HTTP ReST requests. Issuing an HTTP POST to the configurations API
-causes StorPerf to talk to your OpenStack's heat service to create a new stack
-with as many agent VMs and attached Cinder volumes as you specify.
+causes StorPerf to talk to OpenStack's heat service to create a new stack
+with as many agent VMs and attached Cinder volumes as specified.
-After the stack is created, you can issue one or more jobs by issuing a POST
+After the stack is created, we can submit one or more jobs by issuing a POST
to the jobs ReST API. The job is the smallest unit of work that StorPerf
can use to measure the disk's performance.
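
The general interaction pattern is a pair of HTTP POSTs: one to create the
stack and one to run a job. The sketch below shows this with Python's
requests library; the base URL, endpoint paths and payload field names are
assumptions for illustration, and the Swagger UI of a running instance is the
authoritative reference.

.. code-block:: python

   import requests

   # Assumed base URL of a running StorPerf instance; adjust to the deployment.
   STORPERF = "http://127.0.0.1:5000/api/v1.0"

   # Create a Heat stack of agent VMs with attached Cinder volumes.
   stack = requests.post(STORPERF + "/configurations",
                         json={"agent_count": 2, "volume_size": 10})
   stack.raise_for_status()

   # Submit a job against the freshly created stack.
   job = requests.post(STORPERF + "/jobs",
                       json={"rw": "randrw", "bs": "4k", "iodepth": 2})
   job.raise_for_status()
   print(job.json())
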
@@ -45,8 +45,187 @@ measured start to "flat line" and stay within that range for the specified
amount of time, then the metrics are considered to be indicative of a
repeatable level of performance.
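
To make the "flat line" idea concrete, the following is a minimal sketch of
such a check, in the spirit of the SNIA steady-state criterion: the samples
in the measurement window must stay within a band around their average, and
the drift of a best-fit line across the window must be small. The 20% range
and 10% slope thresholds are illustrative assumptions, not necessarily the
exact values StorPerf uses.

.. code-block:: python

   def is_steady_state(samples, range_limit=0.20, slope_limit=0.10):
       """Return True when a window of samples has 'flat lined'.

       samples: metric values (e.g. IOPS) collected at regular intervals.
       The window is considered steady when both the min/max excursion and
       the drift of a least-squares fit are small relative to the average.
       """
       n = len(samples)
       if n < 2:
           return False
       avg = sum(samples) / float(n)

       # Excursion check: the spread of the samples stays inside the band.
       if max(samples) - min(samples) > range_limit * avg:
           return False

       # Slope check: total drift of the best-fit line across the window.
       x_mean = (n - 1) / 2.0
       slope = (sum((x - x_mean) * (y - avg) for x, y in enumerate(samples))
                / sum((x - x_mean) ** 2 for x in range(n)))
       return abs(slope * (n - 1)) <= slope_limit * avg
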
-What Data Can I Get?
-====================
+StorPerf Testing Guidelines
+===========================
+
+First of all, StorPerf is not able to give pointers on how to tune a
+Cinder implementation, as there are far too many backends (Ceph, NFS, LVM,
+etc.), each with its own methods of tuning. StorPerf is here to assist in
+getting a reliable performance measurement by encoding the test
+specification from SNIA and helping present the results in a way that makes
+sense.
+
+Having said that, there are some general guidelines that we can present to
+assist with planning a performance test.
+
+Workload Modelling
+------------------
+
+This is an important item to address, as there are many variables in how
+data is accessed. Databases typically use a fixed block size and tend to
+manage their data so that sequential access is more likely. GPS image tiles
+can be around 20-60 KB and will be accessed by reading the file in full, with
+no easy way to predict which tiles will be needed next. Some programs are
+able to submit I/O asynchronously, while others need to use multiple threads
+of synchronous I/O. There is no one-size-fits-all here, so knowing what
+type of I/O pattern we need to model is critical to getting realistic
+measurements.
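+
+One practical way to start is to write the workload model down explicitly
+before testing. The sketch below is purely illustrative: the field names are
+not a StorPerf schema, just a checklist in code form for two of the examples
+above.
+
+.. code-block:: python
+
+   # Illustrative checklist of workload-model decisions; these names are
+   # not a StorPerf schema, just a way to record the model up front.
+   database_workload = {
+       "block_size": 4096,       # bytes per I/O, matching the DB page size
+       "access_pattern": "rw",   # sequential access is more likely
+       "read_percentage": 70,    # read/write mix seen in production
+       "io_depth": 4,            # concurrent outstanding I/Os the app sustains
+   }
+
+   tile_server_workload = {
+       "block_size": 65536,           # whole-file reads of ~20-60 KB tiles
+       "access_pattern": "randread",  # no way to predict the next tile
+       "read_percentage": 100,
+       "io_depth": 8,                 # many clients request tiles in parallel
+   }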
+
+System Under Test
+-----------------
+
+The unfortunate part is that StorPerf does not have any knowledge about the
+underlying OpenStack itself. We can only see what is available through the
+OpenStack APIs, and none of them provide details about the underlying
+storage implementation. As the test executor, we need to know information
+such as: the number of disks or storage nodes; the amount of RAM available
+for caching; and the type and bandwidth of the connection to the storage.
+
+Measure Storage, not Cache
+--------------------------
+
+As part of sizing the test data, we need to ensure that caching does not
+interfere with the measurements. The total size of the data set in the test
+must exceed the total size of all the disk cache memory available by a
+certain amount in order to ensure we are forcing non-cached I/O. There is no
+exact science here, but if we balance test duration against cache hit ratio,
+it can be argued that a 20% cache hit rate is good enough and that increasing
+the data set size further would yield diminishing returns. Let’s break this
+number down a bit. Given a cache size of 10GB, we could write, then read,
+the following dataset sizes:
+
+* 10GB gives 100% cache hit
+* 20GB gives 50% cache hit
+* 50GB gives 20% cache hit
+* 100GB gives 10% cache hit
+
+This means that for the first test, 100% of the results are unreliable due to
+cache. At 50GB, the true performance without cache has only a 20% margin of
+error. Given that the 100GB test would take twice as long and only reduces
+the margin of error by a further 10%, we recommend a data set of five times
+the cache size (50GB in this example) as the best tradeoff.
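+
+The arithmetic behind these numbers is simple enough to script. The short
+sketch below just restates the estimate used above: with a uniformly accessed
+dataset, the chance of a read being served from cache is roughly the cache
+size divided by the dataset size.
+
+.. code-block:: python
+
+   CACHE_GB = 10
+
+   for dataset_gb in (10, 20, 50, 100):
+       # A uniformly accessed dataset is cached roughly in proportion to
+       # how much of it fits in memory.
+       hit_ratio = min(1.0, float(CACHE_GB) / dataset_gb)
+       print("%4d GB dataset -> ~%d%% cache hit" % (dataset_gb, hit_ratio * 100))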
+
+How much cache do we actually have? This depends on the storage device being
+used. For hardware NAS or other arrays, it should be fairly easy to get the
+number from the manufacturer, but for software defined storage, it can be
+harder to determine. Let’s take Ceph as an example. Ceph runs as software
+on the bare metal server and therefore has access to all the RAM available on
+the server to use as its cache. Well, not exactly all the memory. We have
+to take into account the memory consumed by the operating system, by the Ceph
+processes, as well as any other processes running on the same system. In the
+case of hyper-converged Ceph, where workload VMs and Ceph run on the same
+systems, it can become quite difficult to predict. Ultimately, the amount of
+memory that is left over is the cache for that single Ceph node. We now need
+to add the memory available from all the other Ceph storage nodes in the
+environment. Time for another example: given 3 Ceph storage nodes with 256GB
+RAM each, we set aside the memory consumed by the OS, Ceph and any other
+processes, leaving approximately 240GB per node. This gives us 3 x 240, or
+720GB, of total RAM available for cache. The total amount of data we want to
+write in order to initialize our Cinder volumes would then be 5 x 720, or
+3,600 GB. The following illustrates some ways to allocate the data (a sizing
+sketch follows the list):
+
+* 1 VM with one 3,600 GB volume
+* 10 VMs, each with one 360 GB volume
+* 2 VMs, each with five 360 GB volumes
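+
+The same sizing logic can be captured as a quick calculation. The 240GB of
+cache per node and the 5x multiplier come straight from the discussion above;
+everything else is an assumption to adjust for the environment under test.
+
+.. code-block:: python
+
+   NODES = 3                  # dedicated Ceph storage nodes
+   CACHE_PER_NODE_GB = 240    # RAM left per node after OS and other processes
+   TARGET_MULTIPLIER = 5      # dataset = 5 x cache for ~20% cache hit
+
+   cache_gb = NODES * CACHE_PER_NODE_GB        # 720 GB of potential cache
+   dataset_gb = cache_gb * TARGET_MULTIPLIER   # 3,600 GB to write in total
+
+   # A few ways of splitting the same 3,600 GB across VMs and volumes.
+   for vms, volumes_per_vm in ((1, 1), (10, 1), (2, 5)):
+       size = dataset_gb // (vms * volumes_per_vm)
+       print("%2d VM(s) x %d volume(s) of %d GB each" % (vms, volumes_per_vm, size))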
+
+Back to Modelling
+-----------------
+
+Now that we know there is 3.6 TB of data to be written, we need to go back to
+the workload model to determine how we are going to write it. Factors to
+consider (a parameter sketch follows this list):
+
+* Number of Volumes. We might be simulating a single database of 3.6 TB, so
+ only 1 Cinder volume is needed to represent this. Or, we might be
+ simulating a web server farm where there are hundreds of processes
+ accessing many different volumes. In this case, we divide the 3.6 TB by
+ the number of volumes, making each volume smaller.
+* Number of Virtual Machines. We might have one monster VM that will drive
+ all our I/O in the system, or maybe there are hundreds of VMs, each with
+ their own individual volume. Using Ceph as an example again, we know that
+ it allows for a single VM to consume all the Ceph resources, which can be
+ perceived as a problem in terms of multi-tenancy and scaling. A common
+ practice to mitigate this is to use Cinder to throttle IOPS at the VM
+ level. If this technique is being used in the environment under test, we
+ must adjust the number of VMs used in the test accordingly.
+* Block Size. We need to know if the application is managing the volume as a
+ raw device (i.e., /dev/vdb) or as a filesystem mounted over the device.
+ Different filesystems have their own block sizes: ext4 only allows 1024,
+ 2048 or 4096 bytes as the block size. Typically, the larger the block, the
+ better the throughput; however, as blocks must be written as an atomic
+ unit, larger block sizes can also reduce effective throughput because the
+ block must be padded when the content is smaller than the block size.
+* I/O Depth. This represents the amount of I/O that the application can
+ issue simultaneously. In a multi-threaded app, or one that uses
+ asynchronous I/O, it is possible to have multiple read or write requests
+ outstanding at the same time. For example, with software defined storage
+ where there is an Ethernet network between the client and the storage,
+ the storage would have a higher latency for each I/O, but is capable of
+ accepting many requests in parallel. With an I/O depth of 1, each request
+ must wait out the network latency before the next can be issued. With a
+ higher I/O depth, we can get more throughput despite each I/O having a
+ higher latency. Typically, we do not see applications that go beyond a
+ queue depth of 8; however, this is not a firm rule.
+* Data Access Pattern. We need to know if the application typically reads
+ data sequentially or randomly, as well as what the mixture of read vs.
+ write is. It is possible to measure read by itself, or write by itself,
+ but this is not typical behavior for applications. It is useful for
+ determining the potential maximum throughput of a given type of operation.
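+
+As promised above, here is one way these factors could be expressed using
+FIO-style option names (bs, iodepth, rw, rwmixread). Only the option names
+come from the FIO workload engine; the surrounding structure and the numbers
+describing the hypothetical web server farm are assumptions for illustration.
+
+.. code-block:: python
+
+   # FIO-style option names; the dict structure itself is illustrative only.
+   web_farm_workload = {
+       "rw": "randrw",      # random access with a mix of reads and writes
+       "rwmixread": 70,     # 70% reads / 30% writes
+       "bs": "4k",          # block size the application or filesystem uses
+       "iodepth": 6,        # outstanding I/Os the application can sustain
+   }
+
+   # Spread the 3.6 TB dataset across many smaller volumes for the web farm.
+   volume_count = 10
+   volume_size_gb = 3600 // volume_count    # 360 GB per volume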
+
+Fastest Path to Results
+-----------------------
+
+Once we have gathered this information, we can start executing some tests.
+Let’s take some of the points discussed above and describe our system:
+
+* OpenStack deployment with 3 Control nodes, 5 Compute nodes and 3 dedicated
+ Ceph storage nodes.
+* Ceph nodes each have 240 GB RAM available to be used as cache.
+* Our application writes directly to the raw device (/dev/vdb).
+* There will be 10 instances of the application running, each with its own
+ volume.
+* Our application can use block sizes of 4k or 64k.
+* Our application is capable of maintaining up to 6 I/O operations
+ simultaneously.
+
+The first thing we know is that we want to keep our cache hit ratio around
+20%, so we will be moving 3,600 GB of data. We also know this will take a
+significant amount of time, so here is where StorPerf helps.
+
+First, we use the configurations API to launch our 10 virtual machines, each
+with a 360 GB volume. Next comes the most time-consuming part: we call the
+initializations API to fill each one of these volumes with random data (a
+sketch of both calls follows the list below). By preloading the data, we
+ensure a number of things:
+
+* The storage device has had to fully allocate all of the space for our
+ volumes. This is especially important for software defined storage like
+ Ceph, which is smart enough to know if data is being read from a block that
+ has never been written. No data on disk means no disk read is needed and
+ the response is immediate.
+* The RAM cache has been overrun multiple times. Only 20% of what was
+ written can possibly remain in cache.
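+
+As mentioned above, a sketch of these two calls follows. The base URL,
+endpoint paths and field names are assumptions drawn from the API names used
+in this guide; the Swagger UI of a running instance is the authoritative
+reference.
+
+.. code-block:: python
+
+   import requests
+
+   STORPERF = "http://127.0.0.1:5000/api/v1.0"   # assumed StorPerf endpoint
+
+   # Step 1: create the Heat stack of 10 agent VMs, each with a 360 GB volume.
+   stack = requests.post(STORPERF + "/configurations",
+                         json={"agent_count": 10, "volume_size": 360})
+   stack.raise_for_status()
+
+   # Step 2: preload every volume with random data.  This is the slow part,
+   # but it forces full allocation and overruns the RAM cache several times.
+   init = requests.post(STORPERF + "/initializations")
+   init.raise_for_status()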
+
+This last part is important, as we can now use StorPerf’s implementation of
+SNIA’s steady state algorithm to ensure our follow-up tests execute as
+quickly as possible. Given that 80% of the data in any given test results in
+a cache miss, we can run multiple tests in a row without having to
+re-initialize the volumes or invalidate the cache between test runs. We can
+also mix and match the types of workloads to be run in a single performance
+job submission.
+
+Now we can submit a job to the jobs API to execute a 70%/30% mix of
+read/write, with a block size of 4k and an I/O queue depth of 6. This job
+will run until either the maximum time has expired or StorPerf detects that
+steady state has been reached, at which point it completes immediately
+and reports the results of the measurements.
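+
+Expressed as a ReST call, that submission might look like the sketch below,
+using the same FIO-style option names as earlier. The exact payload accepted
+by the jobs API should be confirmed against the Swagger UI.
+
+.. code-block:: python
+
+   import requests
+
+   STORPERF = "http://127.0.0.1:5000/api/v1.0"   # assumed StorPerf endpoint
+
+   # 70% read / 30% write, 4k blocks, queue depth of 6 (FIO-style names).
+   job = requests.post(STORPERF + "/jobs",
+                       json={"rw": "randrw", "rwmixread": 70,
+                             "bs": "4k", "iodepth": 6})
+   job.raise_for_status()
+   print(job.json())    # the response identifies the submitted job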
+
+StorPerf uses FIO as its workload engine, so whatever workload parameters we
+would like to use with FIO can be passed directly through StorPerf’s jobs
+API.
+
+What Data Can We Get?
+=====================
StorPerf provides the following metrics:
@@ -57,4 +236,9 @@ StorPerf provides the following metrics:
These metrics are available for every job, and for the specific workloads,
I/O loads and I/O types (read, write) associated with the job.
+For each metric, StorPerf also provides the set of samples that were
+collected along with the slope, min and max values that can be used for
+plotting or comparison.
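+
+For example, if the samples for one metric are retrieved as (timestamp,
+value) pairs, the reported min, max and slope can be reproduced or plotted
+with a few lines of Python. The sample values below are placeholders.
+
+.. code-block:: python
+
+   # Placeholder samples; real values come back from the StorPerf ReST API.
+   samples = [(0, 1010.0), (60, 995.0), (120, 1002.0), (180, 998.0)]
+
+   values = [v for _, v in samples]
+   lo, hi = min(values), max(values)
+
+   # Least-squares slope of value over time, matching the reported "slope".
+   n = len(samples)
+   t_mean = sum(t for t, _ in samples) / float(n)
+   v_mean = sum(values) / n
+   slope = (sum((t - t_mean) * (v - v_mean) for t, v in samples)
+            / sum((t - t_mean) ** 2 for t, _ in samples))
+
+   print("min=%.1f max=%.1f slope=%.4f" % (lo, hi, slope))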
+
As of this time, StorPerf only provides textual reports of the metrics.
+