How to get extreme Exadata IOPS with SLOB

According to the Exadata X3-2 datasheet, the maximum flash IOPS is 1.5M, which is pretty impressive.

With SLOB and help from Kevin Closson, I can get around 1.4M maximum IOPS on an X3-2 full rack. This blog shows how to use Kevin Closson's SLOB to push the IOPS on an X3-2 over 1M. To push IOPS as hard as possible, I use all 8 database nodes of the X3-2 to drive random reads concurrently against the 14 cell nodes. The procedure is: set up 256 SLOB schemas, pin the SLOB tables in the smart flash cache and warm the cache up with smart scans, prepare SQL*Net connections to the 8 database nodes, and make sure the buffer cache is small enough that logical reads cannot be satisfied from the buffer cache and must issue physical read requests to the cells. Here are the preparation steps and the final observations.

1. SLOB 2

Get the SLOB 2 source (2013.05.05.slob2.tar) and patch it with SLOB2_patch_2014.01.13.tar.

SLOB 2 — A Significant Update. Links Are Here.
http://kevinclosson.wordpress.com/2013/05/02/slob-2-a-significant-update-links-are-here/

Patch README :
https://my.syncplicity.com/share/ien1nsqok0equbk/README

Patch Source:
https://my.syncplicity.com/share/vba5x0tgn05a3bk/SLOB2_patch_2014.01.13.tar

2. Set up 256 SLOB schemas

You need to hack setup.sh a little bit to set up more than 128 schemas.
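
The exact edit depends on which SLOB 2 copy you have; the 128-schema limit shows up as a hard-coded sanity check on the number of users in setup.sh. A rough, hypothetical sketch of how to find and relax it (the variable name in the guard differs between versions, so grep first):

# locate the hard-coded schema limit in setup.sh
grep -n 128 setup.sh

# then raise the bound in the guard it finds, e.g. change something like
#   if [ $num_users -gt 128 ] ; then ... exit ...
# to allow 256 schemas, or comment the check out entirely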

NOTIFY: Load Parameters (slob.conf):

LOAD_PARALLEL_DEGREE == 32

...

NOTIFY: Seed table loading procedure has exited.
NOTIFY: Setting up user   1  2  3  4  5  6  7  8  9  10  11  12  13  14  15  16  17  18  19  20  21  22  23  24  25  26  27  28  29  30  31  32
NOTIFY: Waiting for background processes - Sun Jan 26 04:58:02 PST 2014
NOTIFY: Setting up user   33  34  35  36  37  38  39  40  41  42  43  44  45  46  47  48  49  50  51  52  53  54  55  56  57  58  59  60  61  62  63  64
NOTIFY: Waiting for background processes - Sun Jan 26 05:03:13 PST 2014
NOTIFY: Setting up user   65  66  67  68  69  70  71  72  73  74  75  76  77  78  79  80  81  82  83  84  85  86  87  88  89  90  91  92  93  94  95  96
NOTIFY: Waiting for background processes - Sun Jan 26 05:08:13 PST 2014
NOTIFY: Setting up user   97  98  99  100  101  102  103  104  105  106  107  108  109  110  111  112  113  114  115  116  117  118  119  120  121  122  123  124  125  126  127  128
NOTIFY: Waiting for background processes - Sun Jan 26 05:13:06 PST 2014
NOTIFY: Setting up user   129  130  131  132  133  134  135  136  137  138  139  140  141  142  143  144  145  146  147  148  149  150  151  152  153  154  155  156  157  158  159  160
NOTIFY: Waiting for background processes - Sun Jan 26 05:17:59 PST 2014
NOTIFY: Setting up user   161  162  163  164  165  166  167  168  169  170  171  172  173  174  175  176  177  178  179  180  181  182  183  184  185  186  187  188  189  190  191  192
NOTIFY: Waiting for background processes - Sun Jan 26 05:23:35 PST 2014
NOTIFY: Setting up user   193  194  195  196  197  198  199  200  201  202  203  204  205  206  207  208  209  210  211  212  213  214  215  216  217  218  219  220  221  222  223  224
NOTIFY: Waiting for background processes - Sun Jan 26 05:29:02 PST 2014
NOTIFY: Setting up user   225  226  227  228  229  230  231  232  233  234  235  236  237  238  239  240  241  242  243  244  245  246  247  248  249  250  251  252  253  254  255  256
NOTIFY: Waiting for background processes - Sun Jan 26 05:34:54 PST 2014

Connect to the SLOB database as user1/user1 via SQL*Plus and execute @./procedure.
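
For example, on one of the database nodes with the environment set for the local exa instance (the SLOB home path below is just a placeholder):

cd /home/oracle/SLOB        # wherever SLOB was unpacked
sqlplus user1/user1 <<EOF
@./procedure
exit
EOF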

3. Prepare 8 SLOB services and add the 8 entries to tnsnames.ora

srvctl add service -d exa -s slob1 -r exa1 -a exa2
srvctl add service -d exa -s slob2 -r exa2 -a exa1
srvctl add service -d exa -s slob3 -r exa3 -a exa1
srvctl add service -d exa -s slob4 -r exa4 -a exa1
srvctl add service -d exa -s slob5 -r exa5 -a exa1
srvctl add service -d exa -s slob6 -r exa6 -a exa1
srvctl add service -d exa -s slob7 -r exa7 -a exa1
srvctl add service -d exa -s slob8 -r exa8 -a exa1

srvctl start service -d exa -s slob1
srvctl start service -d exa -s slob2
srvctl start service -d exa -s slob3
srvctl start service -d exa -s slob4
srvctl start service -d exa -s slob5
srvctl start service -d exa -s slob6
srvctl start service -d exa -s slob7
srvctl start service -d exa -s slob8

slob1 = (DESCRIPTION = (ADDRESS = (PROTOCOL = TCP)(HOST = demo10db01)(PORT = 1521)) (CONNECT_DATA = (SERVER = DEDICATED) (SERVICE_NAME = slob1)))
slob2 = (DESCRIPTION = (ADDRESS = (PROTOCOL = TCP)(HOST = demo10db02)(PORT = 1521)) (CONNECT_DATA = (SERVER = DEDICATED) (SERVICE_NAME = slob2)))
slob3 = (DESCRIPTION = (ADDRESS = (PROTOCOL = TCP)(HOST = demo10db03)(PORT = 1521)) (CONNECT_DATA = (SERVER = DEDICATED) (SERVICE_NAME = slob3)))
slob4 = (DESCRIPTION = (ADDRESS = (PROTOCOL = TCP)(HOST = demo10db04)(PORT = 1521)) (CONNECT_DATA = (SERVER = DEDICATED) (SERVICE_NAME = slob4)))
slob5 = (DESCRIPTION = (ADDRESS = (PROTOCOL = TCP)(HOST = demo10db05)(PORT = 1521)) (CONNECT_DATA = (SERVER = DEDICATED) (SERVICE_NAME = slob5)))
slob6 = (DESCRIPTION = (ADDRESS = (PROTOCOL = TCP)(HOST = demo10db06)(PORT = 1521)) (CONNECT_DATA = (SERVER = DEDICATED) (SERVICE_NAME = slob6)))
slob7 = (DESCRIPTION = (ADDRESS = (PROTOCOL = TCP)(HOST = demo10db07)(PORT = 1521)) (CONNECT_DATA = (SERVER = DEDICATED) (SERVICE_NAME = slob7)))
slob8 = (DESCRIPTION = (ADDRESS = (PROTOCOL = TCP)(HOST = demo10db08)(PORT = 1521)) (CONNECT_DATA = (SERVER = DEDICATED) (SERVICE_NAME = slob8)))
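
A quick sanity check that all 8 services are registered and all 8 aliases resolve (standard srvctl/tnsping, nothing SLOB-specific):

srvctl status service -d exa
for i in 1 2 3 4 5 6 7 8; do
  tnsping slob$i
done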

4. slob.conf

UPDATE_PCT=0
RUN_TIME=300
WORK_LOOP=0
SCALE=1000000
WORK_UNIT=256
REDO_STRESS=HEAVY
LOAD_PARALLEL_DEGREE=32
SHARED_DATA_MODULUS=0

# Settings for SQL*Net connectivity:
SQLNET_SERVICE_BASE=slob
SQLNET_SERVICE_MAX=8
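
With SQLNET_SERVICE_BASE=slob and SQLNET_SERVICE_MAX=8, the sessions are spread round-robin over slob1..slob8. Launching a run then looks like this (the runit.sh argument is the number of concurrent SLOB sessions; verify against the README of your copy):

./runit.sh 160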

5. init parameters

Huge pages are enabled, except that one of the eight nodes doesn't have enough huge pages to hold the whole SGA. A 128M buffer cache seems to be the smallest size achievable with CPU_COUNT=32.

*.use_large_pages='TRUE'
*.db_cache_size  = 128m
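
If you manage these through the spfile, a minimal sketch of setting them on all instances and bouncing the database (use_large_pages only takes effect after a restart):

sqlplus / as sysdba <<EOF
alter system set use_large_pages='TRUE' scope=spfile sid='*';
alter system set db_cache_size=128m scope=spfile sid='*';
exit
EOF

srvctl stop database -d exa
srvctl start database -d exa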

6. Keep the SLOB tables in the smart flash cache and warm them up with 3 smart scans.

declare
n number;
begin
  for i in 1 .. 256
  loop
    execute immediate 'alter table user' || i || '.cf1 storage (CELL_FLASH_CACHE keep)';
    execute immediate 'select /*+ full(cf1)*/ count(*) from user' || i||'.cf1' into n;
    execute immediate 'select /*+ full(cf1)*/ count(*) from user' || i||'.cf1' into n;
    execute immediate 'select /*+ full(cf1)*/ count(*) from user' || i||'.cf1' into n;
  end loop;
end;
/
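
To confirm the tables really are pinned, the flash cache content can be listed on a cell; a hedged example with cellcli (run it on each cell, or wrap it in dcli), using documented FLASHCACHECONTENT attributes:

# on a storage cell: each KEEP'd SLOB table should show a non-zero cachedKeepSize
cellcli -e "list flashcachecontent attributes dbUniqueName, objectNumber, cachedKeepSize"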

7. Observations

I've tried 96, 128, 160, 192, 224 and 256 SLOB sessions. With 160 sessions I get the maximum IOPS, averaging 1.36M. From the screen capture, the peak IOPS is over 1.4M, which is pretty close to the 1.5M stated in the X3-2 datasheet, and each of the 14 cells delivers about 100K IOPS evenly, which is good. From the AWR report of the 160-session run, "cell list of blocks physical read" averages 3ms and "cell single block physical read" averages 1.25ms. CPU utilization is up to 90% on the cells, while it's only 37% on the database nodes; it seems that in such an extreme case the cell processing power is not sufficient and becomes the first bottleneck to scaling.

Top Timed Events                            DB/Inst: EXA/exa1  Snaps: 994-995

                               Wait                                   Event                      Wait Time                    Summary Avg Wait Time (ms)
       ---------------------------------------------------   ----------------------   ------------------------------   ----------------------------------------
  I#   Class      Event                                             Waits %Timeouts        Total(s) Avg(ms) %DB time        Avg      Min      Max  Std Dev  Cnt
----   ---------- ----------------------------------------   ------------ ---------   ------------- ------- --------   -------- -------- -------- -------- ----
   *   User I/O   cell list of blocks physical read            11,416,579       0.0       35,283.85     3.1    43.04       3.09     2.98     3.12     0.05    8
       User I/O   cell single block physical read              25,295,624       0.0       31,685.75     1.3    38.65       1.25     1.25     1.26     0.00    8
                  DB CPU                                              N/A       N/A       24,707.27     N/A    30.14                                          8
       Cluster    gc cr grant 2-way                               758,555       0.0          154.92     0.2     0.19       0.21     0.20     0.26     0.02    8
       System I/O log file parallel write                           9,559       0.0           46.95     4.9     0.06       4.93     4.46     5.22     0.30    8
       User I/O   read by other session                            11,890       0.0           12.14     1.0     0.01       1.02     0.99     1.07     0.03    8
       System I/O control file sequential read                      8,279       0.0            9.62     1.2     0.01       1.16     1.13     1.18     0.02    8
       Cluster    gc buffer busy acquire                            1,638       0.0            8.86     5.4     0.01       6.77     3.35    11.89     3.36    8
       Other      CSS operation: data query                         1,600       0.0            7.74     4.8     0.01       4.84     3.76     6.00     0.81    8
       Concurrenc latch: cache buffers chains                      49,387       0.0            4.55     0.1     0.01       0.09     0.09     0.10     0.00    8

System Statistics - Per Second               DB/Inst: EXA/exa1  Snaps: 994-995

             Logical     Physical     Physical         Redo        Block         User
  I#         Reads/s      Reads/s     Writes/s   Size (k)/s    Changes/s      Calls/s      Execs/s     Parses/s   Logons/s       Txns/s
---- --------------- ------------ ------------ ------------ ------------ ------------ ------------ ------------ ---------- ------------
   1      170,727.68    170,406.3          2.7      5,017.9         17.2          1.6        664.6          3.8       0.14          0.0
   2      160,987.67    162,898.8          2.5      4,833.9         22.4          1.4        630.4          5.9       0.13          0.1
   3      163,967.64    163,394.0          2.1      4,825.7         15.0          1.4        638.2          3.6       0.14          0.0
   4      174,957.44    174,551.8          3.1      5,130.8         12.7          1.3        679.2          2.1       0.13          0.0
   5      169,568.64    170,028.9          1.8      5,012.8         10.4          1.4        658.4          2.1       0.14          0.0
   6      172,990.65    172,771.3          3.4      5,082.2         13.6          1.4        671.8          2.3       0.14          0.0
   7      172,572.60    172,256.4          1.8      5,064.6         14.0          1.4        671.6          3.5       0.14          0.0
   8      171,493.42    170,899.7          2.6      5,016.6          9.8          1.4        665.8          2.1       0.14          0.0
 ~~~ ~~~~~~~~~~~~~~~ ~~~~~~~~~~~~ ~~~~~~~~~~~~ ~~~~~~~~~~~~ ~~~~~~~~~~~~ ~~~~~~~~~~~~ ~~~~~~~~~~~~ ~~~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~~~
 Sum    1,357,265.74  1,357,207.1         20.0     39,984.6        115.1         11.5      5,280.0         25.3       1.10          0.3
 Avg      169,658.22    169,650.9          2.5      4,998.1         14.4          1.4        660.0          3.2       0.14          0.0
 Std        4,777.34      4,268.8          0.6        111.3          4.0          0.1         17.1          1.3       0.00          0.0

Update on 2014-Feb-08:

With WORK_UNIT=256 and only 160 SLOB sessions, I can push the IOPS over 1.3M. Although the cells are running out of CPU (90%), CPU is pretty idle on the database nodes (37%). To chase 1.5M IOPS, I set up 768 SLOB schemas and tried smaller WORK_UNIT values with more SLOB sessions. With WORK_UNIT=8 or WORK_UNIT=16, the database nodes run out of CPU much faster than the storage cells. With WORK_UNIT=64, CPU on the database nodes and storage cells is relatively balanced. The maximum IOPS observed with WORK_UNIT=64 is still about 1.4M (still no closer to 1.5M). Here are the graphs with WORK_UNIT=64, showing how IOPS, database node CPU utilization, average active sessions and I/O latency scale with the number of SLOB sessions (a sketch of the session sweep follows the slob.conf below).

--slob.conf
UPDATE_PCT=0
RUN_TIME=300
WORK_LOOP=0
SCALE=500000
WORK_UNIT=64
REDO_STRESS=HEAVY
LOAD_PARALLEL_DEGREE=32
SHARED_DATA_MODULUS=0
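
The scaling graphs come from simply repeating the run at different session counts; a minimal sketch of the sweep, assuming runit.sh takes the session count as its only argument and that each run leaves an awr.txt in the SLOB directory (file name per my copy, verify against yours):

for n in 96 160 224 288 352 416 480 544; do
  ./runit.sh $n
  mv awr.txt awr_${n}sessions.txt
done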

2 comments on "How to get extreme Exadata IOPS with SLOB"

  1. kevinclosson

    Even though the cells are nearly out of CPU I do think you can get even closer to the 1.5M by loading more than 256 schema users. Have you tried 48 users per host ( thus 384 total)?

    Another thing that sometimes pushes IOPS further is to reduce the “bite size” of the SLOB work loop. Try slob.conf->WORK_UNIT=16 just to see what that does.

