标签归档:bloom filter

深入理解Oracle的并行执行(上)

这是被ACOUG和OTN收录的关于Oracle并行执行的论文, 主要涉及了以下的主题:

  • Oracle 并行执行为什么使用生产者-消费者模型.
  • 如何阅读并行执行计划.
  • 不同的数据分发方式分别适合什么样的场景.
  • 使用 partition wise join 和并行执行的组合提高性能.
  • 数据倾斜会对不同的分发方式带来什么影响.
  • 由于生产者-消费者模型的限制, 执行计划中可能出现阻塞点.
  • 布隆过滤是如何提高并行执行性能的.
  • 现实世界中, 使用并行执行时最常见的问题.
  • 文章链接: OTN, ACOUG

    Bloom Filter For Non-Parallel query on Non-Partition Table

    I notice an enhancement in 12.1.0.2.0 beta, bloom filter is enabled for non-parallel query on non-partition table. The enhancement may help improve the hash join performance, this page show a little demo. The testing is on Exadata X4-2 instance.

    > SELECT * FROM PRODUCT_COMPONENT_VERSION;
    
    PRODUCT          VERSION        STATUS
    ---------------------------------------- -------------------- --------------------
    NLSRTL           12.1.0.2.0       Beta
    Oracle Database 12c Enterprise Edition   12.1.0.2.0       64bit Beta
    PL/SQL           12.1.0.2.0       Beta
    TNS for Linux:         12.1.0.2.0       Beta
    
    > select platform_id, platform_name from v$database;
    
    PLATFORM_ID PLATFORM_NAME
    ----------- ----------------
       13 Linux x86 64-bit
    

    Prepare two table for hash join, 10k rows in t1 and 10m rows in t2. 100 records for each value on the column t1.n1 and t2.n1.

    drop table t1;
    drop table t2;
    
    -- t1 with 10,000 rows
    create table t1
    as
    select
        rownum                      id,
        lpad(rownum, 10, '0')       v1,
        trunc((rownum - 1)/100)     n1,
        rpad(rownum, 100)           padding
    from
        dual
    connect by level <= 10000
    ;
    
    -- t2 with 10,000,000 rows
    create table t2
    as
    select
        rownum                      id,
        lpad(rownum, 10, '0')       v1,
        trunc((rownum - 1)/100)     n1,
        rpad(rownum, 100)           padding
    from
        t1, t1
    where
        rownum <= 1e7;
    
    begin
        dbms_stats.gather_table_stats(
            ownname      => user,
            tabname      =>'T1',
            method_opt   => 'for all columns size 1'
        );
    
        dbms_stats.gather_table_stats(
            ownname      => user,
            tabname      =>'T2',
            method_opt   => 'for all columns size 1'
        );
    end;
    /
    

    The hash join between t1 and t2 return only 100 rows, if the bloom filter is created after t1 is scanned, and applied on t2 when t2 is scanned, most of the rows of t2 is filtered at the first place. The sql monitor report show that how bloom filter takes effect. With bloom filter enabled, the rows returned from smart scan on t2 is 15k, cell offload efficiency is 100%(only 2MB bytes returned from Exadata). Without bloom filter, the rows returned from smart scan on t2 is 10M, cell offload efficiency is 84%(230MB bytes returned from Exadata).

    alter system flush buffer_cache;
    select
        /*+ monitor */
        count(t2.v1)
    from
        t1, t2
    where
        t1.id = t2.id
    and t1.n1 = 0;
    

    alter system flush buffer_cache;
    select
        /*+ monitor no_px_join_filter(t2) */
        count(t2.v1)
    from
        t1, t2
    where
        t1.id = t2.id
    and t1.n1 = 0;
    

    Update on Mar 25th

    Serial generation for bloom filters on Exadata is enabled in 12.1.0.2, and backported to 11.2.0.4 with bug 16710321. Hidden parameter _bloom_serial_filter control the function.

    > @pd _bloom_serial_filter
    
    NAME                  VALUE     DESCRIPTION
    --------------------- --------- -------------------------------------------------------
    _bloom_serial_filter  ON        enable serial bloom filter on exadata