
Ceph BlueStore: To cache or not to cache, that is the question

John Mazzie | March 2019

To Cache or not to Cache, that is the question.

Well, should you? Should you cache for your Ceph® cluster? The answer is: it depends.

You can use high-end enterprise NVMe™ drives, such as the Micron® 9200 MAX, and not have to worry about getting the most performance from your Ceph cluster. But what if you would like to gain more performance from a system that is made up mostly of SATA drives? In that case, there are benefits to adding a couple of faster drives to your Ceph OSD servers for storing your BlueStore database and write-ahead log.
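If you want to try this yourself, a BlueStore OSD with its data on a SATA drive and its DB and WAL on NVMe can be created with ceph-volume along the following lines. This is only a sketch: the device paths below are assumptions, not the layout used in the tested cluster.

    # Assumed devices: /dev/sdb is a SATA data drive; /dev/nvme0n1p1 and
    # /dev/nvme0n1p2 are partitions reserved on an NVMe drive for the
    # BlueStore database and write-ahead log.
    ceph-volume lvm create --bluestore \
        --data /dev/sdb \
        --block.db /dev/nvme0n1p1 \
        --block.wal /dev/nvme0n1p2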

Micron developed and tested the popular Accelerated Ceph Storage Solution, which leverages servers with Red Hat Ceph Storage running on Red Hat Enterprise Linux. I will go through a few workload scenarios and show you where caching can help, based on actual results from our solution testing lab.

System Configuration

Testing was done using a four OSD node Ceph cluster with the following configuration:

Processor: AMD EPYC 7551P (single socket)
Memory: 256GB DDR4 @ 2666 MHz (8x 32GB)
Network: 100G
SATA drives: Micron 5210 ION 3.84TB (x12)
NVMe drives (cache devices): Micron 9200 MAX 1.6TB (x2)
OS: Red Hat® Enterprise Linux 7.6
Application: Red Hat Ceph Storage 3.2
OSDs per SATA drive: 2
Dataset: 50 RBDs @ 150GB each with 2x replication

Table 1: Ceph OSD Server Configuration
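As a quick sanity check (not part of the original test procedure), Ceph's OSD metadata shows whether an OSD's BlueStore DB and WAL actually live on the intended NVMe devices; OSD 0 below is just an example ID:

    # List the BlueStore device paths reported for OSD 0
    ceph osd metadata 0 | grep -E 'bluefs|bluestore_bdev'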

4KiB Random Block Testing

For 4KiB random writes, using FIO (Flexible I/O), you can see that utilizing caching drives greatly increases your performance while keeping your tail latency low, even under heavy load. With 40 FIO instances, performance is 71% higher (190K vs. 111K IOPS) and tail latency is 72% lower (119ms vs. 665ms).

 


Figure 1: 4KiB Random Write Performance and Tail Latency
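For reference, a 4KiB random-write test against an RBD image can be run with FIO's rbd engine along these lines. This is a sketch with assumed pool, image and client names, not the exact job definition used for the results above:

    # Assumed names: pool 'rbd', image 'fio_test', client 'admin'
    fio --name=4k-randwrite \
        --ioengine=rbd --clientname=admin \
        --pool=rbd --rbdname=fio_test \
        --rw=randwrite --bs=4k --iodepth=32 \
        --direct=1 --time_based --runtime=300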

There is some performance gain during 4KiB random read testing, but it is much less convincing. This is to be expected: during a read test, the write-ahead log is not used and the BlueStore database changes little, if at all.

Figure 2: 4KiB Random Read Performance and Tail Latency

A mixed workload (70% read/30% write) also shows the benefits of having caching devices in your system. Performance gains range from 30% at a queue depth of 64 to 162% at a queue depth of 6.

Figure 3: 4KiB Random 70% Read/30% Write Performance and Tail Latency
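The mixed workload only changes the I/O pattern; with the same assumed pool and image names as above, the FIO invocation would look roughly like this:

    # 70% reads / 30% writes at 4KiB against the same assumed RBD target
    fio --name=4k-mixed \
        --ioengine=rbd --clientname=admin \
        --pool=rbd --rbdname=fio_test \
        --rw=randrw --rwmixread=70 --bs=4k --iodepth=32 \
        --direct=1 --time_based --runtime=300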

4MiB Object Testing

When running the rados bench command with 4MiB objects, there is some performance gain with caching devices, but it is not as dramatic as with the small block workloads. Since the write-ahead log is small and the objects are large, adding caching devices has much less impact on performance. Throughput is 9% higher (4.94 GiB/s vs. 4.53 GiB/s) with caching than without, while average latency is 7% lower (126ms vs. 138ms), when running 10 instances of rados bench.

Figure 4: 4MiB Object Write Performance
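A single instance of this object test can be reproduced with rados bench roughly as follows; the pool name and thread count are assumptions, and the numbers above came from 10 such instances running in parallel:

    # Write 4MiB objects for 300 seconds against an assumed pool 'bench';
    # --no-cleanup keeps the objects so a read test can run afterwards
    rados bench -p bench 300 write -b 4M -t 16 --no-cleanup
    # Sequential read test over the objects written above
    rados bench -p bench 300 seq -t 16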

With reads, we again see that there is negligible performance gain across the board.

Figure 5: 4MiB Object Read Performance

Conclusion

As you can see, if your workload is almost all reads, you won't gain much, if anything, from adding caching devices to your Ceph cluster for BlueStore database and write-ahead log storage. With writes, it is a completely different story. While there is some benefit for large objects, the real showstopper for caching devices is small block writes and mixed workloads. For the small investment of adding a couple of high-performance Micron 9200 NVMe drives to your system, you can get the most out of your Ceph cluster.

What sorts of results are you getting with your open source storage? Learn more at Micron Accelerated Ceph Storage.

Stay up to date by following us on Twitter @MicronStorage and connect with us on LinkedIn.

Principal Storage Solutions Engineer

John Mazzie

John is a Member of the Technical Staff in the Data Center Workload Engineering group in Austin, TX. He graduated from West Virginia University in 2008 with an MSEE with an emphasis in wireless communications. John worked for Dell on its MD3 series of storage arrays on both the development and sustaining sides. He joined Micron in 2016, where he has worked on Cassandra, MongoDB, Ceph, and other advanced storage workloads.