
Comparing BlueStore and FileStore block performance

Ryan Meredith | May 2018

Ceph BlueStore vs. FileStore: Block performance comparison when leveraging Micron NVMe SSDs

BlueStore is the new storage engine for Ceph and is the default configuration in the community edition. BlueStore performance numbers are not included in our current Micron Accelerated Ceph Storage Solution reference architecture since it is not yet supported in Red Hat Ceph Storage 3.0. I ran performance tests against the community edition of Ceph Luminous (12.2.4) on our Ceph reference architecture hardware, and in this blog I compare the results to the FileStore performance we achieved in RHCS 3.0.

4KB random write IOPS performance increases by 18%, average latency decreases by 15%, and 99.99% tail latency decreases by up to 80%. 4KB random read performance is better at higher queue depths with BlueStore.

Block Workload      RHCS 3.0 FileStore IOPS   Ceph 12.2.4 BlueStore IOPS   RHCS 3.0 FileStore Avg. Latency   Ceph 12.2.4 BlueStore Avg. Latency
4KB Random Read     2.00 Million              2.10 Million                 1.6ms                             1.4ms
4KB Random Write    363K                      424K                         5.3ms                             4.5ms

This solution is optimized for block performance. Random small-block testing using the Rados Block Driver in Linux saturates the Intel Xeon Platinum 8168 (Purley) processors in a 2-socket storage node.

With 10 drives per storage node, this architecture has a usable storage capacity of 232TB that can be scaled out by adding additional 1U storage nodes.

Reference Design – Hardware

SuperMicro switches, monitor nodes, and storage nodes

Test Results and Analysis

Ceph Test Methodology

Red Hat Ceph Storage 3.0 (12.2.1) is configured with FileStore with 2 OSDs per Micron 9200 MAX NVMe SSD. A 20GB journal was used for each OSD.

Ceph Luminous Community (12.2.4) is configured with BlueStore with 2 OSDs per Micron 9200 MAX NVMe SSD. RocksDB and WAL data are stored on the same partition as the data.
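The objectstore choice and the FileStore journal size are set through ceph.conf. The snippet below is a minimal sketch, not our full tuned configuration: it only renders the objectstore-related [osd] options discussed above, and the output file name is illustrative.

```python
import configparser

# Minimal sketch of the objectstore-related ceph.conf options only; cluster
# addresses, auth, and the rest of our performance tuning are omitted.
conf = configparser.ConfigParser()

# FileStore configuration (RHCS 3.0 / 12.2.1): 2 OSDs per 9200 MAX,
# each OSD with a 20GB journal ("osd journal size" is specified in MB).
conf["osd"] = {
    "osd objectstore": "filestore",
    "osd journal size": "20480",   # 20GB journal per OSD
}

# For the BlueStore runs (community Luminous 12.2.4) the equivalent would be
#   osd objectstore = bluestore
# With no separate block.db/block.wal device given at OSD creation, RocksDB
# and the WAL stay on the same partition as the data, as in our test setup.

with open("ceph.conf.sketch", "w") as f:
    conf.write(f)
```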

In both configurations there are 10 drives per storage node and 2 OSDs per drive, 80 total OSDs with 232TB of usable capacity.
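As a quick sanity check on that capacity figure, here is a back-of-the-envelope sketch. It assumes the drives are the 6.4TB model of the 9200 MAX and reads the 232TB number as the raw capacity expressed in binary units; neither assumption is stated above.

```python
# Capacity sanity check for the configuration described above.
osds_total = 80
osds_per_drive = 2
drive_tb = 6.4                         # assumed: 6.4TB 9200 MAX model

drives = osds_total // osds_per_drive  # 40 drives (10 per node across 4 nodes)
raw_tb = drives * drive_tb             # 256 TB raw (decimal)
raw_tib = raw_tb * 1e12 / 2**40        # ~232.8 TiB, matching the ~232TB figure

print(drives, raw_tb, round(raw_tib, 1))
# With 2x replication, the application-visible capacity is half of that.
```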

The Ceph storage pool tested was created with 8192 placement groups and 2x replication. Performance is tested with 100 RBD images at 75GB each, providing 7.5TB of data on a 2x replicated pool, 15TB of total data.
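The PG count is consistent with the usual power-of-two sizing guidance. A short sketch of that arithmetic, assuming a target of roughly 200 PGs per OSD (our assumption, not a number stated in this post), plus the test dataset size:

```python
# PG count: (OSDs * target PGs per OSD) / replica count, rounded up to a power of two.
osds = 80
target_pgs_per_osd = 200            # assumed sizing target
replicas = 2
pgs = 2 ** ((osds * target_pgs_per_osd // replicas) - 1).bit_length()
print(pgs)                          # 8192

# Test dataset: 100 RBD images x 75GB = 7.5TB of data, 15TB on disk with 2x replication.
images, image_gb = 100, 75
data_tb = images * image_gb / 1000
print(data_tb, data_tb * replicas)  # 7.5 15.0
```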

4KB random block performance was measured using FIO against the Rados Block Driver. We are CPU limited in all tests, even with 2x Intel 8168 CPUs per storage node.
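Our actual FIO job files are not reproduced in this post, but a minimal sketch of a 4KB random write job using fio's librbd engine looks like the following. The pool, image, and client names are placeholders, and real runs sweep queue depth and the number of FIO clients rather than using the single fixed values shown here.

```python
import subprocess
import textwrap

# Minimal sketch of a 4KB random write FIO job against an RBD image.
# Pool/image/client names are placeholders; runtime and iodepth are examples.
job = textwrap.dedent("""\
    [global]
    ioengine=rbd
    clientname=admin
    pool=rbd_test
    bs=4k
    time_based=1
    runtime=300

    [4k-randwrite]
    rbdname=image-00
    rw=randwrite
    iodepth=32
""")

with open("rbd-4k-randwrite.fio", "w") as f:
    f.write(job)

# The random read case is the same job with rw=randread.
subprocess.run(["fio", "rbd-4k-randwrite.fio"], check=True)
```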

RBD FIO 4KB Random Write Performance: FileStore vs. BlueStore

BlueStore provides a ~18% increase in IOPS and a ~15% decrease in average latency.

 

Blue bar graph showing 4KB random write IOPS + average latency

There is also a large decrease in the tail latency of Ceph at higher FIO client counts with BlueStore. At 100 clients, tail latency is decreased by 4.3x. At lower client counts, tail latency for BlueStore is higher than FileStore because BlueStore is pushing higher performance.

Blue bar graph showing 4KB random write IOPS + tail latency

RBD FIO 4KB Random Read Performance: FileStore vs. BlueStore

4KB random read performance is similar between FileStore and BlueStore. There's a 5% increase in IOPS with BlueStore at a queue depth of 32.

Blue bar graph showing 4KB random read IOPS + average latency

Tail latency is also similar up to queue depth 32, where BlueStore performs better.

Blue bar graph showing 4KB random read IOPS + tail latency

Want to learn more?

RHCS 3.0 + Micron 9200 MAX NVMe SSDs on the Intel Purley platform is super fast. The latest reference architecture for Micron Accelerated Ceph Storage Solutions is available now. My next blog post will discuss FileStore vs. BlueStore object performance. I presented details about the reference architecture and other Ceph tuning and performance topics during my session at OpenStack Summit 2018. A recording of my session is available here.

Have additional questions about our testing or methodology? Email us at ssd@dream-messenger.com.

Director, Storage Solutions Architecture

Ryan Meredith

Ryan Meredith is director of Data Center Workload Engineering for Micron's Storage Business Unit, testing new technologies to help build Micron's thought leadership and awareness in fields like AI and NVMe-oF/TCP, along with all-flash software-defined storage technologies.