
Performance Test: FlashSoft 3.7 for VMware vSphere 5.5

This paper describes the application performance and VM density gains achieved in a virtualized computing environment using FlashSoft software as a write-through cache with a synthetic workload generated by fio.

Introduction

The objective of this paper is to present the performance and VM density gains that can be achieved in a virtualized computing environment with the application of a host-based solid-state storage cache enabled by the FlashSoft® software from SanDisk®. VMware vSphere® 5.5 was installed on a host server configured with multiple virtual machines (VMs), and FlashSoft software was used to provide a host-based write-through cache to accelerate workloads running on the VMs. Using a synthetic workload generated and measured by the benchmark program fio, application performance and VM density of the accelerated VMs were compared to baseline configurations comprising all-HDD storage without caching.

 

System Under Test

The test server was configured as described below:

Server Hardware

  • Server: Dell™ PowerEdge™ R730xd
  • CPU: 2x Intel® Xeon® E5-2690 v3 @ 2.6GHz, 24 cores total
  • System Memory (RAM): 192GB
  • Storage Controller: Dell PERC H730P Mini RAID (embedded); cache size: 2,048MB
  • HDD (Target): 8x (RAID5) 600GB SAS 6Gb/s 10K RPM, Seagate ST600MM0006
  • HDD (OS): 2x (RAID1) 600GB SAS 10K RPM, Seagate ST600MM0006
  • Flash Memory (Cache): 1x ioDrive®2 1.2TB, SanDisk F00-001-1T20-CS-0001

 

Software

  • System Software
    • Operating System: VMware ESXi 5.5.0 Update 2 (VMkernel Release Build 2068190)
  • Virtual Machines
    • CentOS 6.5 Linux (up to 8 VMs were created)
  • Benchmark Program
    • fio (version 2.1.14)
  • Caching Software
    • FlashSoft for VMware vSphere (driver version: 3.7.0.61608)

 

Storage

RAID Controller: The following settings were used for all storage in the tests:

  • Strip Size: 64KB
  • Access Policy: Read Write
  • Disk Cache Policy: Default
  • Read Policy: Always Read Ahead
  • Write Policy: Always Write Back (the RAID controller's onboard cache is small relative to the backend storage and the external flash cache, so this setting did not adversely affect test performance)
  • Patrol Read Mode: Auto

 

OS HDD:
Two 600GB SAS 10K RPM HDDs were configured as a 558GB RAID1 virtual disk and contained the ESXi hypervisor.

Target HDD:
Eight 600GB SAS 10K RPM HDDs were configured as a 3.9TB RAID5 virtual disk and provided the storage backend for all VMs used in the test.

Cache SSD:
A single 1.2TB PCIe Flash memory device was enabled for caching by FlashSoft software.

 

Benchmark Tests

Benchmark tests were conducted multiple times to measure and compare performance of the non-accelerated server running over the all-HDD storage backend (the “baseline configuration”) and the same server running over the same HDD backend, but accelerated using FlashSoft software to drive the flash memory as a server-tier write-through cache (the “accelerated configuration”).

The following testing procedures were used:

  1. Install and configure the ESXi hypervisor.
  2. Configure the storage to be tested.
  3. Install and configure the flash memory (SSD) and the FlashSoft software.
  4. Create the required number of initial VMs and install CentOS 6.5 as the guest operating system, using the settings specified for the test to be performed.
  5. Install the fio benchmark software on the VM(s).
  6. Conduct the benchmark test and record the results:
    • Ensure the benchmark tests are run concurrently on all VMs being tested.
    • Measure baseline performance with all-HDD storage and caching DISABLED.
    • Measure accelerated performance with all-HDD storage, but with FlashSoft write-through caching ENABLED.
  7. Note the increase in application performance with caching enabled compared to the baseline.
  8. For VM density testing: clone the VM created in step 4 and repeat the benchmark tests described in step 6.
  9. For VM density testing: continue increasing the number of tested VMs until the maximum VM density with caching enabled can be determined (that is, until the latency of the accelerated VMs equals or exceeds the latency of the baseline).

 

Considerations

To consistently measure the performance of cached configurations and to reflect the operation of a cache that has been warmed through normal use, the cache was completely flushed after each individual test and then pre-conditioned with the same warmup workload immediately before the next benchmark test.

When testing multiple VMs, performance must be measured on all VMs concurrently. Ensure the benchmark test in each VM will run long enough for all VMs to be launched and benchmark scripts to run until completion.
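
To illustrate one way of meeting this requirement, the sketch below starts fio on every VM at approximately the same time over SSH and waits for all runs to complete. It is not part of the test harness used in this paper; the VM hostnames, the SSH access, the job-file path and the output path are assumptions made purely for illustration.

#!/usr/bin/env python3
# Minimal sketch: start fio concurrently on every test VM over SSH and wait
# for all runs to finish. Hostnames and file paths are hypothetical.
import subprocess

VM_HOSTS = ["vm01", "vm02"]        # assumed VM hostnames; adjust to the environment
JOB_FILE = "/root/fst.fio"         # assumed location of the fio job file inside each VM

def main():
    procs = []
    for host in VM_HOSTS:
        # Launch fio on each VM without waiting, so all VMs run concurrently.
        cmd = ["ssh", host, "fio", "--output=/root/fio_result.txt", JOB_FILE]
        procs.append((host, subprocess.Popen(cmd)))
    # Block until every VM has completed its benchmark run.
    for host, proc in procs:
        rc = proc.wait()
        print(f"{host}: fio exited with code {rc}")

if __name__ == "__main__":
    main()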

Workload Configuration and Testing Methodology

The workload was configured and tested in two ways. The first test set was a basic performance test to measure the increase in application performance provided by FlashSoft software. This test was limited to two VMs concurrently running 100GB workloads with a 70%/30% read/write ratio, 100% random data distribution and 4KB aligned data blocks. The 1.2TB size of the FlashSoft cache on the host server was large enough to fully contain the workloads tested in the VMs. This sped the testing process and simplified analysis of test results – the benchmark test was run in each VM until the cache hit ratio approached 100%; it could then be assumed the cache was adequately warmed and operating at its maximum potential. The measured IOPS values were summed for both VMs while the latency values were averaged across both VMs and weighted to match the 70/30 read/write ratio of the benchmark test.
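
For reference, the aggregation just described can be written as a short calculation. The sketch below assumes that per-VM read/write IOPS and average latencies have already been extracted from the fio output; the variable names and the example numbers are placeholders, not measured values.

# Minimal sketch of the aggregation used for the performance test results:
# IOPS are summed across the VMs, latency is averaged across the VMs and then
# weighted 70/30 to match the read/write mix of the benchmark.

def aggregate(per_vm_results, read_weight=0.70, write_weight=0.30):
    # per_vm_results: one dict per VM with read/write IOPS and average latencies in ms
    total_iops = sum(r["read_iops"] + r["write_iops"] for r in per_vm_results)
    avg_read_lat = sum(r["read_lat_ms"] for r in per_vm_results) / len(per_vm_results)
    avg_write_lat = sum(r["write_lat_ms"] for r in per_vm_results) / len(per_vm_results)
    weighted_latency = read_weight * avg_read_lat + write_weight * avg_write_lat
    return total_iops, weighted_latency

# Example with placeholder numbers (not measured values):
vm_data = [
    {"read_iops": 5000, "write_iops": 2100, "read_lat_ms": 0.4, "write_lat_ms": 1.8},
    {"read_iops": 4900, "write_iops": 2000, "read_lat_ms": 0.5, "write_lat_ms": 1.9},
]
iops, latency = aggregate(vm_data)
print(f"Total IOPS: {iops}, weighted average latency: {latency:.2f} ms")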

The second test set was designed to measure the improvement in VM density. A lighter-weight VM configuration was used for the density test, and the workload was adjusted to limit the total IOPS processed by each VM, both to ensure uniformity amongst all VMs used in the test and to prevent anything other than storage IO bandwidth (e.g. system memory, CPU utilization, or network utilization) from becoming a limiting factor. The test was started with a single VM; IOPS and latency were measured and graphed for the baseline (non-accelerated) and accelerated configurations. The VM was then cloned, the test was run again, and the data recorded. This process was repeated, each run adding one more VM to the host, until the average weighted latency of the accelerated VMs matched that of the single VM running in the baseline configuration. This indicates the increased number of VMs (density) that can be supported by the accelerated system while providing the same level of performance as the baseline system.
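
The stopping rule for the density test can be summarized in a short sketch: keep adding VMs until the weighted latency of the accelerated configuration exceeds the single-VM baseline latency. The functions clone_vm and run_benchmark_on_all_vms below stand in for manual steps in the actual procedure (cloning in vSphere, launching fio concurrently on the VMs); they are not real APIs.

# Sketch of the VM density stopping criterion described above.
# clone_vm() and run_benchmark_on_all_vms() are placeholders for manual steps.

def find_max_density(baseline_single_vm_latency_ms,
                     run_benchmark_on_all_vms, clone_vm, max_vms=8):
    vm_count = 1
    while vm_count <= max_vms:
        # Run the benchmark concurrently on the current set of VMs and
        # obtain the weighted average latency in milliseconds.
        weighted_latency = run_benchmark_on_all_vms(vm_count)
        if weighted_latency > baseline_single_vm_latency_ms:
            # The accelerated VMs now exceed the single-VM baseline latency,
            # so the previous count is the maximum supportable density.
            return vm_count - 1
        clone_vm()          # add one more VM for the next iteration
        vm_count += 1
    return max_vms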

Benchmark Preconditioning

Warmup for All Tests: Prior to each benchmark test, the cache was warmed up using the following fio script:

[global]
ioengine=libaio
rw=write
filename=/dev/sdb
numjobs=8
runtime=240
bs=64k
ba=64k
exitall
randrepeat=0
time_based
iodepth=2
group_reporting
direct=1
[warmitup]
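
Assuming the job file above is saved inside each VM as, for example, warmup.fio (a file name used here only for illustration), it is run from the guest command line as fio warmup.fio before each measured test. The sequential 64KB writes to /dev/sdb are intended to populate the write-through cache with the blocks of the workload device before measurement begins.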

 

Benchmark Tests

Performance Benchmark Test: The performance test was conducted using the following parameters.
VM Setting:

  • Two (2) virtual machines operated simultaneously
  • 8 vCPUs
  • 8GB vRAM
  • 100GB vHDD (workload)

 

Benchmark Setting:

  • 100% random distribution
  • 4KB block size with aligned transfers
  • 70/30 read-write mix
  • Queue depth = 4
  • Threads = 16

 

Benchmark Script:

[global]
ioengine=libaio
rw=randrw
rwmixread=70
filename=/dev/sdb
numjobs=16
runtime=600
bs=4k
ba=4k
exitall
randrepeat=0
time_based
iodepth=4
group_reporting
direct=1
norandommap
ramp_time=900
[fst]
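
A note on the less obvious options in this script, based on standard fio behavior: ramp_time=900 runs the workload for 900 seconds before fio begins recording statistics, providing additional cache warm-up; time_based together with runtime=600 then measures a fixed 600-second window; and norandommap lets offsets repeat rather than forcing every block to be touched exactly once.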

 

VM Density Benchmark Test: The VM Density test was conducted using the following parameters.

 

VM Setting:

  • One (1) to eight (8) virtual machines operated simultaneously
  • 2 vCPUs
  • 4GB vRAM
  • 12GB vHDD (workload)

 

Benchmark Setting:

  • 100% random distribution
  • 4KB block size with aligned transfers
  • 70/30 read-write mix
  • Queue depth = 1
  • Threads = 3
  • IOPS rate capped at 84 reads / 36 writes per thread

 

Benchmark Script:

[global]
ioengine=libaio
rw=randrw
rwmixread=70
filename=/dev/sdb
numjobs=3
runtime=120
bs=4k
ba=4k
exitall
randrepeat=0
time_based
iodepth=1
group_reporting
direct=1
norandommap
thread
ramp_time=360
rate_iops=84,36
[fst]
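
For context on the IOPS cap, and based on standard fio behavior: rate_iops=84,36 limits each job to 84 read IOPS and 36 write IOPS, so with numjobs=3 (run as threads) each VM is held to roughly 360 IOPS in total (3 × 120). This per-VM ceiling corresponds to the upper threshold referenced in the discussion of the density results.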

 

Measured Results

Performance Test

Table 1: Performance Test Measurements

 

Figure 1: Total IOPS across both VMs increased 2.4 times with caching enabled.

 

Figure 2: Average latency for both VMs (weighted 70/30 read-write ratio) decreased 2.9 times with caching enabled.

 

Table 2: VM Density Measurements

 

Figure 3: VM Density could be increased eight times with caching enabled (compared using weighted average latency).

 

Figure 4: The total number of IOPS processed by the VMs during the VM Density test.

 

Analysis / Conclusion

Data measured in the tests illustrate how FlashSoft software can noticeably improve application performance and VM density in a VMware vSphere environment compared to traditional all-HDD backend storage.

The tests were conducted with small data sets using fio, a synthetic benchmark testing tool. Although the tests were constructed to simulate the conditions typically encountered in real-world computing and to generate data that reveal application performance and VM density, the tests can only be considered a demonstration of the capability of FlashSoft software; the data should not be interpreted as the performance impact of FlashSoft software for all workload types and storage environments. The actual performance of any caching solution is highly dependent upon the workload and the computing environment in which it is used.

The application performance test was limited to two virtual machines that were allowed to run the benchmark test unencumbered. Read and write IOPS were aggregated for both virtual machines and used for direct comparison between the baseline and accelerated configurations. Latency was averaged across the VMs and then weighted to reflect that, over the course of the test, individual IOs incurred varying latencies and that 70% of the IO activity consisted of read operations and 30% of write operations.

The IOPS comparison shows that more than 2.4 times as many IOs were processed during the test with write-through caching enabled. The read and write latency values in Table 1 show a greater than 37-times decrease in read latency but a 3.4-times increase in write latency with caching enabled. This is typical behavior for a write-through cache, because each IO is written to both the SSD and the HDD backend before the write request is acknowledged to the application. Furthermore, a greater number of IOs were actually processed by the accelerated VMs during the test than by the baseline. These two factors result in increased write latency values with write-through caching enabled. The overall result of the test, however, in which 70% of all IOs were read requests and 30% were write requests, was a net decrease in total latency of 2.9 times.

The VM density test was handled differently from the performance test. The VMs were configured to consume fewer system resources and to run against smaller individual workloads. This prevented the host server from becoming overloaded and ensured uniformity amongst all VMs as the tests were run. Baseline and accelerated weighted latencies were measured and compared. The objective of the test was to determine how many more VMs could be operated with caching enabled than in the non-accelerated baseline while remaining within the same latency Service Level Agreement (SLA). The test showed that a single VM without caching had a weighted latency of 1.68 milliseconds, which defined the single-VM SLA; in the baseline configuration, latency climbed to 2 to 3 milliseconds once additional VMs were added and continued to increase steadily and sharply thereafter. Furthermore, in the baseline configuration, when more than two VMs were provisioned, IOPS noticeably dropped below the upper threshold, clearly indicating an IO bottleneck. With FlashSoft caching enabled, latency remained essentially unchanged for up to five VMs and then rose gradually until eight VMs were provisioned, at which point the weighted average latency was still only 1.65 milliseconds, essentially the same latency as a single non-accelerated VM. The IOPS of the accelerated VMs remained at the upper threshold for the first six VMs and dipped only slightly with the addition of the eighth VM, indicating that the IO bottleneck observed in the baseline configuration had been alleviated. Thus, the use of FlashSoft software as a write-through cache allowed an eight-fold increase in VM density compared to an all-HDD baseline.
