Exadata Write-back Smart Flash Cache Random Write Performance

I recently used SLOB to test the random write performance of Write-back Smart Flash Cache on an X4-2 half rack; these are my notes. Only 6 of the cell nodes were available during the test, rather than all 7.

The Oracle background process database writer (DBWR) is responsible for writing dirty blocks from the buffer cache back to the data files, and these writes are almost entirely random. To keep DBWR as busy as possible with random writes, the buffer cache can be made very small (192M in this test), and each block of table CF1 holds only one row; the idea is that each execution updates 64 rows and therefore dirties about 64 blocks. The redo logfiles were sized large enough, and SET C2='A' generates as little redo as possible, so that LGWR does not become the bottleneck.
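
A minimal sketch of that setup; the parameter scope, the CF1 column list, and the logfile group number and size are illustrative assumptions, not the exact SLOB scripts used in the test:

-- Tiny buffer cache keeps DBWR busy flushing dirty blocks.
ALTER SYSTEM SET db_cache_size = 192M SCOPE = SPFILE;

-- SLOB-style table: PCTFREE 99 leaves each block almost empty, so one
-- row occupies one block and updating 64 rows dirties about 64 blocks.
CREATE TABLE CF1 ( CUSTID NUMBER, C2 VARCHAR2(128) ) PCTFREE 99 PCTUSED 1;

-- Oversized redo logs keep log switches rare so LGWR stays out of the
-- way (assumes OMF, so no file specification is needed).
ALTER DATABASE ADD LOGFILE GROUP 11 SIZE 8G;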

The UPDATE and SELECT statements are shown below; the mix is close to 50%-50%.

        Elapsed                  Elapsed Time
        Time (s)    Executions  per Exec (s)  %Total   %CPU    %IO    SQL Id
---------------- -------------- ------------- ------ ------ ------ -------------
         1,996.3         88,329          0.02   83.1   32.0   77.8 f335amzu9kgy8
Module: SQL*Plus
UPDATE CF1 SET C2='A' WHERE ( CUSTID > ( :B1 - :B2 ) ) AND (CUSTID < :B1 )

           382.6         95,873          0.00   15.9   56.0   60.1 bhdvtsvjhgvrh
Module: SQL*Plus
SELECT COUNT(C2) FROM CF1 WHERE ( CUSTID > ( :B1 - :B2 ) ) AND (CUSTID < :B1 )
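
The same mix can be cross-checked outside AWR with a query like this sketch (the SQL_IDs are taken from the report above; ELAPSED_TIME in V$SQL is in microseconds):

SELECT sql_id, executions, ROUND(elapsed_time / 1e6, 1) AS elapsed_s
FROM   v$sql
WHERE  sql_id IN ('f335amzu9kgy8', 'bhdvtsvjhgvrh');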

With the flash cache in write-back mode, dirty blocks in the buffer cache only need to be written to flash on the storage cells, and db file parallel write averages around 1ms. The top 10 wait events are as follows:

Top 10 Foreground Events by Total Wait Time
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
                                            Total    Wait   % DB
Event                                 Waits Time(s) Avg(ms)  time Wait Class
------------------------------ ------------ ---- ------- ------ ----------
cell single block physical rea    5,921,681 1635       0   68.1 User I/O
DB CPU                                      877.           36.5
cell list of blocks physical r      190,164 148.       1    6.2 User I/O
log file switch completion               16   .4      22     .0 Configurat
reliable message                         11   .2      18     .0 Other
enq: WF - contention                     20   .1       4     .0 Other
gc current block busy                    41    0       1     .0 Cluster
gc current block 3-way                  111    0       0     .0 Cluster
gc cr multi block request                10    0       3     .0 Cluster
control file sequential read            136    0       0     .0 System I/O

                                                             Avg
                                        %Time Total Wait    wait    Waits   % bg
Event                             Waits -outs   Time (s)    (ms)     /txn   time
-------------------------- ------------ ----- ---------- ------- -------- ------
db file parallel write          395,538     0        276       1      1.7   19.1
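
The ~1ms average can be confirmed with the wait histogram; a sketch, in write-back mode nearly all DBWR waits should land in the low-millisecond buckets:

SELECT wait_time_milli, wait_count
FROM   v$event_histogram
WHERE  event = 'db file parallel write'
ORDER  BY wait_time_milli;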

With the flash cache in write-through mode, dirty blocks in the buffer cache have to be written directly to disk; db file parallel write averages no less than 21ms, causing a large number of free buffer waits. The top 10 wait events are as follows:

Top 10 Foreground Events by Total Wait Time
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
                                           Total Wait       Wait   % DB Wait
Event                                Waits Time (sec)    Avg(ms)   time Class
------------------------------ ----------- ---------- ---------- ------ --------
free buffer waits                1,039,389      12.2K      11.72   63.0 Configur
cell single block physical rea   9,244,945     4251.9       0.46   22.0 User I/O
DB CPU                                         2561.1              13.3
cell list of blocks physical r     358,004      736.8       2.06    3.8 User I/O
latch: gc element                  144,537      368.1       2.55    1.9 Other
write complete waits                   571      224.7     393.47    1.2 Configur
cursor: pin S wait on X              1,793       54.4      30.35     .3 Concurre
enq: RO - fast object reuse             92       43.5     472.32     .2 Applicat
latch free                          15,992       18.6       1.16     .1 Other
latch: gcs resource hash            12,772       17.9       1.40     .1 Other

                                                Total        Avg
                                       %Time     Wait       wait    Waits   % bg
Event                            Waits -outs Time (s)       (ms)     /txn   time
-------------------------- ----------- ----- -------- ---------- -------- ------
db file parallel write           7,019     0      153      21.74      0.0    7.4
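
A sketch for watching DBWR fall behind in write-through mode: free buffer waits piling up while db file parallel write averages over 20ms is exactly the signature in the report above.

SELECT event, total_waits,
       ROUND(time_waited_micro / 1000 / NULLIF(total_waits, 0), 2) AS avg_ms
FROM   v$system_event
WHERE  event IN ('db file parallel write', 'free buffer waits',
                 'write complete waits');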

In write-back mode I ran the workload for five minutes each at concurrency levels of 32/64/128/192/256/320/392. The charts below show the various statistics.

CPU utilization on the compute nodes: since the workload is mostly I/O, compute node CPU utilization is not high.

IOPS from the AWR reports: write IOPS peaks at around 250,000.

MBPS from the AWR reports: write throughput peaks at 2GB/s.

CPU utilization and MBPS on the storage cells: write throughput peaks at 4GB/s.

IOPS on the storage cells: at 392 concurrent sessions the flash card IOPS reaches 1 million on the 6 available cells (about 167,000 per cell, so roughly 2.3 million for 14 cells); a full rack with all 14 storage cells would approach the 2,660,000 flash IOPS quoted in the datasheet.

InfiniBand traffic on the storage cells, which is close to the storage cells' MBPS.

The cell-side IOPS/MBPS are slightly higher than the values in the AWR reports because, in write-back mode, every write must be written to flash twice (normal redundancy). This also means that enabling write-back sacrifices half of the flash capacity, which is a trade-off to weigh when turning it on.
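
The redundancy in play can be confirmed from the database side; a sketch (with NORMAL redundancy every DBWR write is mirrored across two cells' flash, hence cell-side write IOPS/MBPS at roughly twice the AWR figures and the halved effective flash capacity):

SELECT name, type, total_mb, free_mb
FROM   v$asm_diskgroup;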

What write-back actually buys you:

For write-intensive applications, such as a busy stock trading system with a large volume of data updates, it is actually quite easy to saturate the IOPS of Exadata's disks without write-back, for example:
1. Dirty blocks in the buffer cache must be written directly to disk, twice under normal redundancy and three times under high redundancy;
2. The control file is read and written frequently, because Data Guard is configured or for other reasons.

In such cases, enabling write-back greatly improves the I/O performance of these applications.
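
To judge how close a workload sits to the disks' IOPS ceiling before enabling write-back, a coarse sketch is to sample the cumulative I/O request counters twice and difference the values:

SELECT name, value
FROM   v$sysstat
WHERE  name IN ('physical read total IO requests',
                'physical write total IO requests');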
