Tag Archives: Exadata

Bloom Filter for Non-Parallel Query on Non-Partitioned Table

I noticed an enhancement in the 12.1.0.2.0 beta: the bloom filter is enabled for non-parallel queries on non-partitioned tables. The enhancement may help improve hash join performance, and this page shows a small demo. The testing was done on an Exadata X4-2 instance.

> SELECT * FROM PRODUCT_COMPONENT_VERSION;

PRODUCT                                  VERSION              STATUS
---------------------------------------- -------------------- --------------------
NLSRTL                                   12.1.0.2.0           Beta
Oracle Database 12c Enterprise Edition   12.1.0.2.0           64bit Beta
PL/SQL                                   12.1.0.2.0           Beta
TNS for Linux:                           12.1.0.2.0           Beta

> select platform_id, platform_name from v$database;

PLATFORM_ID PLATFORM_NAME
----------- ----------------
         13 Linux x86 64-bit

Prepare two tables for the hash join: 10K rows in t1 and 10M rows in t2, with 100 records for each value of the columns t1.n1 and t2.n1.

drop table t1;
drop table t2;

-- t1 with 10,000 rows
create table t1
as
select
    rownum                      id,
    lpad(rownum, 10, '0')       v1,
    trunc((rownum - 1)/100)     n1,
    rpad(rownum, 100)           padding
from
    dual
connect by level <= 10000
;

-- t2 with 10,000,000 rows
create table t2
as
select
    rownum                      id,
    lpad(rownum, 10, '0')       v1,
    trunc((rownum - 1)/100)     n1,
    rpad(rownum, 100)           padding
from
    t1, t1
where
    rownum <= 1e7;

begin
    dbms_stats.gather_table_stats(
        ownname      => user,
        tabname      =>'T1',
        method_opt   => 'for all columns size 1'
    );

    dbms_stats.gather_table_stats(
        ownname      => user,
        tabname      =>'T2',
        method_opt   => 'for all columns size 1'
    );
end;
/

The hash join between t1 and t2 returns only 100 rows. If the bloom filter is created after t1 is scanned and applied to t2 while t2 is scanned, most of the rows of t2 are filtered out in the first place. The SQL Monitor reports show how the bloom filter takes effect. With the bloom filter enabled, the smart scan on t2 returns 15K rows and the cell offload efficiency is 100% (only 2MB returned from Exadata). Without the bloom filter, the smart scan on t2 returns 10M rows and the cell offload efficiency is 84% (230MB returned from Exadata).

alter system flush buffer_cache;
select
    /*+ monitor */
    count(t2.v1)
from
    t1, t2
where
    t1.id = t2.id
and t1.n1 = 0;

alter system flush buffer_cache;
select
    /*+ monitor no_px_join_filter(t2) */
    count(t2.v1)
from
    t1, t2
where
    t1.id = t2.id
and t1.n1 = 0;
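
To compare the two runs, the SQL Monitor report of each execution can be pulled afterwards. A minimal sketch (the sql_id is a placeholder to fill in from v$sql or v$sql_monitor); in the bloom-filter run, the plan section should also show the filter pushed into the storage predicate on T2 as SYS_OP_BLOOM_FILTER(:BF0000, ...):

select dbms_sqltune.report_sql_monitor(
           sql_id       => '&sql_id',    -- placeholder: sql_id of the monitored query
           type         => 'TEXT',
           report_level => 'ALL') as report
from dual;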

Update on Mar 25th

Serial generation of bloom filters on Exadata is enabled in 12.1.0.2 and backported to 11.2.0.4 with bug 16710321. The hidden parameter _bloom_serial_filter controls the feature.

> @pd _bloom_serial_filter

NAME                  VALUE     DESCRIPTION
--------------------- --------- -------------------------------------------------------
_bloom_serial_filter  ON        enable serial bloom filter on exadata
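
To confirm the parameter is what drives the behaviour, it can be switched off at session level and the monitored query re-run; a hedged sketch (toggling hidden parameters is for testing only):

alter session set "_bloom_serial_filter" = OFF;   -- disable serial bloom filter for this session
-- ... re-run the monitored query and compare the SQL Monitor report ...
alter session set "_bloom_serial_filter" = ON;    -- restore the default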

How to get extreme Exadata IOPS with SLOB

According to the Exadata X3-2 datasheet, the maximum flash IOPS is 1.5M, which is pretty impressive.

With SLOB and help from Kevin Closson, I can get around 1.4M maximum IOPS on an X3-2 full rack. This blog shows how to use Kevin Closson’s SLOB to push the IOPS on the X3-2 over 1M. To push IOPS as hard as possible, I use all 8 db nodes of the X3-2 to trigger random reads concurrently against the 14 cell nodes. The procedure is to set up 256 SLOB schemas, keep the SLOB tables in the smart flash cache, and warm up the cache with smart scans. Then prepare the SQL*Net connections to the 8 db nodes and make sure the buffer cache is small enough that logical reads cannot hit the buffer cache and must issue physical read requests to the cells. Here come the preparation steps and the final observations.

1. SLOB 2

Get the SLOB 2 source 2013.05.05.slob2.tar and patch it with SLOB2_patch_2014.01.13.tar (see the sketch after the links below).

SLOB 2 — A Significant Update. Links Are Here.
http://kevinclosson.wordpress.com/2013/05/02/slob-2-a-significant-update-links-are-here/

Patch README :
https://my.syncplicity.com/share/ien1nsqok0equbk/README

Patch Source:
https://my.syncplicity.com/share/vba5x0tgn05a3bk/SLOB2_patch_2014.01.13.tar
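
A sketch of the unpack-and-patch step (directory layout assumed from the standard SLOB 2 kit; check the patch README for the authoritative instructions):

tar xvf 2013.05.05.slob2.tar              # extracts the SLOB directory
cd SLOB
tar xvf ../SLOB2_patch_2014.01.13.tar     # overlay the 2014.01.13 patch
(cd wait_kit && make)                     # rebuild the wait kit binaries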

2. Set up 256 SLOB schemas.

You need to hack setup.sh a little bit to set up more than 128 schemas; a sketch of the change follows.
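
A hedged sketch, assuming the cap is a hard-coded limit near the top of setup.sh (the variable name here is hypothetical; adjust it to whatever check your copy performs), followed by the loader invocation:

# raise the hard-coded schema limit (variable name hypothetical)
sed -i 's/MAX_SLOB_SCHEMAS=128/MAX_SLOB_SCHEMAS=256/' setup.sh

# load 256 SLOB schemas into the IOPS tablespace
./setup.sh IOPS 256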

NOTIFY: Load Parameters (slob.conf):

LOAD_PARALLEL_DEGREE == 32

...

NOTIFY: Seed table loading procedure has exited.
NOTIFY: Setting up user   1  2  3  4  5  6  7  8  9  10  11  12  13  14  15  16  17  18  19  20  21  22  23  24  25  26  27  28  29  30  31  32
NOTIFY: Waiting for background processes - Sun Jan 26 04:58:02 PST 2014
NOTIFY: Setting up user   33  34  35  36  37  38  39  40  41  42  43  44  45  46  47  48  49  50  51  52  53  54  55  56  57  58  59  60  61  62  63  64
NOTIFY: Waiting for background processes - Sun Jan 26 05:03:13 PST 2014
NOTIFY: Setting up user   65  66  67  68  69  70  71  72  73  74  75  76  77  78  79  80  81  82  83  84  85  86  87  88  89  90  91  92  93  94  95  96
NOTIFY: Waiting for background processes - Sun Jan 26 05:08:13 PST 2014
NOTIFY: Setting up user   97  98  99  100  101  102  103  104  105  106  107  108  109  110  111  112  113  114  115  116  117  118  119  120  121  122  123  124  125  126  127  128
NOTIFY: Waiting for background processes - Sun Jan 26 05:13:06 PST 2014
NOTIFY: Setting up user   129  130  131  132  133  134  135  136  137  138  139  140  141  142  143  144  145  146  147  148  149  150  151  152  153  154  155  156  157  158  159  160
NOTIFY: Waiting for background processes - Sun Jan 26 05:17:59 PST 2014
NOTIFY: Setting up user   161  162  163  164  165  166  167  168  169  170  171  172  173  174  175  176  177  178  179  180  181  182  183  184  185  186  187  188  189  190  191  192
NOTIFY: Waiting for background processes - Sun Jan 26 05:23:35 PST 2014
NOTIFY: Setting up user   193  194  195  196  197  198  199  200  201  202  203  204  205  206  207  208  209  210  211  212  213  214  215  216  217  218  219  220  221  222  223  224
NOTIFY: Waiting for background processes - Sun Jan 26 05:29:02 PST 2014
NOTIFY: Setting up user   225  226  227  228  229  230  231  232  233  234  235  236  237  238  239  240  241  242  243  244  245  246  247  248  249  250  251  252  253  254  255  256
NOTIFY: Waiting for background processes - Sun Jan 26 05:34:54 PST 2014

Connect to the SLOB database as user1/user1 via SQL*Plus and execute @./procedure, for example:
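
With a local connection from the SLOB directory:

sqlplus user1/user1 @./procedure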

3. Prepare 8 SLOB services and add the 8 entries to tnsnames.ora

srvctl add service -d exa -s slob1 -r exa1 -a exa2
srvctl add service -d exa -s slob2 -r exa2 -a exa1
srvctl add service -d exa -s slob3 -r exa3 -a exa1
srvctl add service -d exa -s slob4 -r exa4 -a exa1
srvctl add service -d exa -s slob5 -r exa5 -a exa1
srvctl add service -d exa -s slob6 -r exa6 -a exa1
srvctl add service -d exa -s slob7 -r exa7 -a exa1
srvctl add service -d exa -s slob8 -r exa8 -a exa1

srvctl start service -d exa -s slob1
srvctl start service -d exa -s slob2
srvctl start service -d exa -s slob3
srvctl start service -d exa -s slob4
srvctl start service -d exa -s slob5
srvctl start service -d exa -s slob6
srvctl start service -d exa -s slob7
srvctl start service -d exa -s slob8

slob1 = (DESCRIPTION = (ADDRESS = (PROTOCOL = TCP)(HOST = demo10db01)(PORT = 1521)) (CONNECT_DATA = (SERVER = DEDICATED) (SERVICE_NAME = slob1)))
slob2 = (DESCRIPTION = (ADDRESS = (PROTOCOL = TCP)(HOST = demo10db02)(PORT = 1521)) (CONNECT_DATA = (SERVER = DEDICATED) (SERVICE_NAME = slob2)))
slob3 = (DESCRIPTION = (ADDRESS = (PROTOCOL = TCP)(HOST = demo10db03)(PORT = 1521)) (CONNECT_DATA = (SERVER = DEDICATED) (SERVICE_NAME = slob3)))
slob4 = (DESCRIPTION = (ADDRESS = (PROTOCOL = TCP)(HOST = demo10db04)(PORT = 1521)) (CONNECT_DATA = (SERVER = DEDICATED) (SERVICE_NAME = slob4)))
slob5 = (DESCRIPTION = (ADDRESS = (PROTOCOL = TCP)(HOST = demo10db05)(PORT = 1521)) (CONNECT_DATA = (SERVER = DEDICATED) (SERVICE_NAME = slob5)))
slob6 = (DESCRIPTION = (ADDRESS = (PROTOCOL = TCP)(HOST = demo10db06)(PORT = 1521)) (CONNECT_DATA = (SERVER = DEDICATED) (SERVICE_NAME = slob6)))
slob7 = (DESCRIPTION = (ADDRESS = (PROTOCOL = TCP)(HOST = demo10db07)(PORT = 1521)) (CONNECT_DATA = (SERVER = DEDICATED) (SERVICE_NAME = slob7)))
slob8 = (DESCRIPTION = (ADDRESS = (PROTOCOL = TCP)(HOST = demo10db08)(PORT = 1521)) (CONNECT_DATA = (SERVER = DEDICATED) (SERVICE_NAME = slob8)))
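
A quick sanity check (a sketch) that every service resolves and accepts connections before the runs:

for i in $(seq 1 8); do
  tnsping slob$i
  echo "select instance_name from v\$instance;" | sqlplus -s user1/user1@slob$i
done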

4. slob.conf

UPDATE_PCT=0
RUN_TIME=300
WORK_LOOP=0
SCALE=1000000
WORK_UNIT=256
REDO_STRESS=HEAVY
LOAD_PARALLEL_DEGREE=32
SHARED_DATA_MODULUS=0

# Settings for SQL*Net connectivity:
SQLNET_SERVICE_BASE=slob
SQLNET_SERVICE_MAX=8

5. init parameters

Huge pages are enabled, except that one of the eight nodes doesn't have enough huge pages to hold the whole SGA. A 128M buffer cache seems to be the smallest size possible with CPU_COUNT=32.

*.use_large_pages='TRUE'
*.db_cache_size  = 128m
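
A sketch of setting the two parameters across all instances (both need an instance restart to take effect; sid='*' assumes the shared spfile):

alter system set use_large_pages='TRUE' scope=spfile sid='*';
alter system set db_cache_size=128m     scope=spfile sid='*';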

6. Keep the SLOB tables in the smart flash cache and warm them up with 3 smart scans.

declare
n number;
begin
  for i in 1 .. 256
  loop
    execute immediate 'alter table user' || i || '.cf1 storage (CELL_FLASH_CACHE keep)';
    execute immediate 'select /*+ full(cf1)*/ count(*) from user' || i||'.cf1' into n;
    execute immediate 'select /*+ full(cf1)*/ count(*) from user' || i||'.cf1' into n;
    execute immediate 'select /*+ full(cf1)*/ count(*) from user' || i||'.cf1' into n;
  end loop;
end;
/
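
A quick check (a sketch) that the KEEP attribute is in place on all 256 CF1 tables before starting the runs:

select cell_flash_cache, count(*)
from   dba_tables
where  table_name = 'CF1'
and    owner like 'USER%'
group  by cell_flash_cache;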

7. Observations

I’ve tried 96, 128, 160, 192, 224 and 256 SLOB sessions. With 160 sessions I get the maximum IOPS, averaging 1.36M. From the screen capture, the peak IOPS is over 1.4M, which is pretty close to the 1.5M stated in the X3-2 datasheet, and each of the 14 cells provides about 100K IOPS evenly, which is good. From the AWR of the 160-session run, “cell list of blocks physical read” averages 3ms and “cell single block physical read” 1.25ms. CPU utilization is up to 90% on the cells but only 37% on the db nodes, so in such an extreme case the cell processing power is not sufficient and becomes the first bottleneck to scaling.
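
Each data point is a timed run driven from one node, roughly as below (assuming the SLOB 2 convention of passing the session count to runit.sh; the sessions are spread over the 8 services by SQLNET_SERVICE_BASE/SQLNET_SERVICE_MAX). The AWR excerpts below are from the 160-session run.

./runit.sh 160    # 160 concurrent SLOB sessions, RUN_TIME=300 seconds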

Top Timed Events                            DB/Inst: EXA/exa1  Snaps: 994-995

                               Wait                                   Event                      Wait Time                    Summary Avg Wait Time (ms)
       ---------------------------------------------------   ----------------------   ------------------------------   ----------------------------------------
  I#   Class      Event                                             Waits %Timeouts        Total(s) Avg(ms) %DB time        Avg      Min      Max  Std Dev  Cnt
----   ---------- ----------------------------------------   ------------ ---------   ------------- ------- --------   -------- -------- -------- -------- ----
   *   User I/O   cell list of blocks physical read            11,416,579       0.0       35,283.85     3.1    43.04       3.09     2.98     3.12     0.05    8
       User I/O   cell single block physical read              25,295,624       0.0       31,685.75     1.3    38.65       1.25     1.25     1.26     0.00    8
                  DB CPU                                              N/A       N/A       24,707.27     N/A    30.14                                          8
       Cluster    gc cr grant 2-way                               758,555       0.0          154.92     0.2     0.19       0.21     0.20     0.26     0.02    8
       System I/O log file parallel write                           9,559       0.0           46.95     4.9     0.06       4.93     4.46     5.22     0.30    8
       User I/O   read by other session                            11,890       0.0           12.14     1.0     0.01       1.02     0.99     1.07     0.03    8
       System I/O control file sequential read                      8,279       0.0            9.62     1.2     0.01       1.16     1.13     1.18     0.02    8
       Cluster    gc buffer busy acquire                            1,638       0.0            8.86     5.4     0.01       6.77     3.35    11.89     3.36    8
       Other      CSS operation: data query                         1,600       0.0            7.74     4.8     0.01       4.84     3.76     6.00     0.81    8
       Concurrenc latch: cache buffers chains                      49,387       0.0            4.55     0.1     0.01       0.09     0.09     0.10     0.00    8

System Statistics - Per Second               DB/Inst: EXA/exa1  Snaps: 994-995

             Logical     Physical     Physical         Redo        Block         User
  I#         Reads/s      Reads/s     Writes/s   Size (k)/s    Changes/s      Calls/s      Execs/s     Parses/s   Logons/s       Txns/s
---- --------------- ------------ ------------ ------------ ------------ ------------ ------------ ------------ ---------- ------------
   1      170,727.68    170,406.3          2.7      5,017.9         17.2          1.6        664.6          3.8       0.14          0.0
   2      160,987.67    162,898.8          2.5      4,833.9         22.4          1.4        630.4          5.9       0.13          0.1
   3      163,967.64    163,394.0          2.1      4,825.7         15.0          1.4        638.2          3.6       0.14          0.0
   4      174,957.44    174,551.8          3.1      5,130.8         12.7          1.3        679.2          2.1       0.13          0.0
   5      169,568.64    170,028.9          1.8      5,012.8         10.4          1.4        658.4          2.1       0.14          0.0
   6      172,990.65    172,771.3          3.4      5,082.2         13.6          1.4        671.8          2.3       0.14          0.0
   7      172,572.60    172,256.4          1.8      5,064.6         14.0          1.4        671.6          3.5       0.14          0.0
   8      171,493.42    170,899.7          2.6      5,016.6          9.8          1.4        665.8          2.1       0.14          0.0
 ~~~ ~~~~~~~~~~~~~~~ ~~~~~~~~~~~~ ~~~~~~~~~~~~ ~~~~~~~~~~~~ ~~~~~~~~~~~~ ~~~~~~~~~~~~ ~~~~~~~~~~~~ ~~~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~~~
 Sum    1,357,265.74  1,357,207.1         20.0     39,984.6        115.1         11.5      5,280.0         25.3       1.10          0.3
 Avg      169,658.22    169,650.9          2.5      4,998.1         14.4          1.4        660.0          3.2       0.14          0.0
 Std        4,777.34      4,268.8          0.6        111.3          4.0          0.1         17.1          1.3       0.00          0.0

Update on 2014-Feb-08:

With work_unit=256 and only 160 SLOB sessions, I can push the IOPS over 1.3M. Although the cells are out of CPU (90%), CPU is pretty idle on the db nodes (37%). To chase 1.5M IOPS, I set up 768 SLOB schemas and tried smaller work_unit values with more SLOB sessions. With work_unit=8 or work_unit=16, the db nodes run out of CPU much faster than the storage cells. With work_unit=64, the CPU on the db nodes and storage cells is relatively balanced. The maximum IOPS observed with work_unit=64 is the same 1.4M (still no closer to 1.5M). Here are the graphs with work_unit=64, showing how IOPS, db node CPU utilization, average active sessions and IO latency scale with the number of SLOB sessions.

--slob.conf
UPDATE_PCT=0
RUN_TIME=300
WORK_LOOP=0
SCALE=500000
WORK_UNIT=64
REDO_STRESS=HEAVY
LOAD_PARALLEL_DEGREE=32
SHARED_DATA_MODULUS=0

SLOB on Exadata X3-2: Random Read IOPS

I tried to benchmark the random read IOPS capacity of a single Exadata X3-2 db node with SLOB. Here come some figures.

1. Parameters

Disable prefetching and avoid multi-block reads by adding the 3 init parameters below.

_db_block_prefetch_limit               0
_db_block_prefetch_quota               0
_db_file_noncontig_mblock_read_count   0
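
A sketch (as SYS) to confirm the three hidden parameters on the instance, via the usual x$ksppi / x$ksppcv fixed tables:

select i.ksppinm  name,
       v.ksppstvl value
from   x$ksppi i, x$ksppcv v
where  i.indx = v.indx
and    i.ksppinm in ('_db_block_prefetch_limit',
                     '_db_block_prefetch_quota',
                     '_db_file_noncontig_mblock_read_count');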

Block size is 8K and db_cache_size is set to 128M (this seems to be the lowest db_cache_size I can set in 11.2.0.3).

cpu_count                     32
db_block_size                 8192
db_cache_size                 134217728

2. Segment size and SQL

Prepare 128 users, each with a table CF1 of around 8G (with SCALE=1000000 in slob.conf).

setup.sh IOPS 128

The load of random 8K single-block reads is driven by the query below.

SELECT COUNT(C2) FROM CF1 WHERE CUSTID > ( :B1 - :B2 ) AND (CUSTID < :B1 )

3. Observations

The max IOPS is around 140K, reached with 96 concurrent processes. The average wait time of ‘db file sequential read’ increases once concurrency goes beyond 64 processes, and CPU utilization drops when the process count goes beyond 96; it seems the system is overloaded and performance degrades.

Update on 2013-Feb-09:

After the two changes below, I can get 250K IOPS with 64 sessions on a single db node (almost twice the IOPS before the changes), with CPU at 90%.

Change 1. Comment out the parameters below to re-enable db block prefetching.

--*._db_block_prefetch_limit = 0
--*._db_block_prefetch_quota = 0
--*._db_file_noncontig_mblock_read_count = 0

Change 2. Update WORK_UNIT to 16.

--slob.conf
UPDATE_PCT=0
RUN_TIME=300
WORK_LOOP=0
SCALE=1000000
WORK_UNIT=16
REDO_STRESS=HEAVY
LOAD_PARALLEL_DEGREE=8
SHARED_DATA_MODULUS=0
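
The four data points were collected by re-running SLOB with increasing session counts and keeping each run's AWR report; a sketch, assuming SLOB's generated awr.txt is renamed after each run (which is what the file names below suggest):

for n in 32 64 96 128; do
  ./runit.sh $n                  # timed run with $n SLOB sessions
  mv awr.txt awr/${n}_awr.txt    # keep this run's AWR report
done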

The AWR numbers for single block reads and physical reads with 32/64/96/128 SLOB sessions:

[oracle@demo01db01 awr]$ grep "cell single block physical rea" * | sort -t_ -k1 -n
32_awr.txt:cell single block physical rea    9,257,339       3,778      0   39.3 User I/O
64_awr.txt:cell single block physical rea   15,161,265       8,301      1   43.2 User I/O
96_awr.txt:cell single block physical rea   19,802,040      13,751      1   47.6 User I/O
128_awr.txt:cell single block physical rea   17,269,328      15,964      1   41.4 User I/O
[oracle@demo01db01 awr]$ grep "Physical reads:" * | sort -t_ -k1 -n
32_awr.txt:  Physical reads:          197,068.2       59,670,688.0
64_awr.txt:  Physical reads:          249,996.2        3,594,314.9
96_awr.txt:  Physical reads:          253,909.9        3,669,155.5
128_awr.txt: Physical reads:          255,982.8        3,733,850.4