Improve connect by statment by parallel pipelined function

在oracle不是原生支持并行处理的地方, 可以利用parallel pipelined function这个特性DIY你想要的并行的效果. 比如通过定制, 把数据并行导出到不同的数据文件, 就快导出速度. 这篇博客介绍对于包含connect by的sql语句的一个优化例子. 在实际的客户系统中, 结合Exadata计算能力和sql并行改造, 获得上百倍或者上千倍的性能提升也是常见的.

脚本准备, 准备一个100k的表

drop table t1;
-- t1 with 100,000 rows
create table t1
as
select
    rownum                      id,
    lpad(rownum, 10, '0')       v1,
    trunc((rownum - 1)/100)     n1,
    rpad(rownum, 100)           padding
from
    dual
connect by level <= 100000
;

begin
    dbms_stats.gather_table_stats(user,t1);
end;
/

以下的查询因为包含connect by, 不能使用并行处理, 花了3m5s.

select /*+ monitor */
    count(*)
from
(
    select
        CONNECT_BY_ROOT ltrim(id) root_id,
        CONNECT_BY_ISLEAF is_leaf,
        level as t1_level,
        a.v1
    from t1 a
    start with a.id <=1000
    connect by NOCYCLE id = prior id + 1000
);

  COUNT(*)
----------
     10000

Elapsed: 00:03:05.17

准备parallel pipelined table function, 对于每个root, 输入rowid, 单独进行一次connect by. 以下是parallel pipelined function用到的脚本.

create or replace package refcur_pkg
AS
    TYPE R_REC IS RECORD (row_id ROWID);
    TYPE refcur_t IS REF CURSOR RETURN R_REC;
END;
/

create or replace package connect_by_parallel 
as
   /*  Naviagates a shallow hiearchy in parallel, where we do a tree walk for each root */

    CURSOR C1 (p_rowid ROWID) IS     -- Cursor done for each subtree. This select is provided by the customer
    select  CONNECT_BY_ROOT ltrim(id) root_id, CONNECT_BY_ISLEAF is_leaf, level as t1_level, a.v1
          from t1 a
          start with rowid = p_rowid
          connect by NOCYCLE id = prior id + 1000; 

    TYPE T1_TAB is TABLE OF C1%ROWTYPE;
 
    FUNCTION treeWalk (p_ref refcur_pkg.refcur_t) RETURN T1_TAB
             PIPELINED
    PARALLEL_ENABLE(PARTITION p_ref BY ANY);

END connect_by_parallel;
/ 

create or replace package body connect_by_parallel 
as  
FUNCTION treeWalk (p_ref refcur_pkg.refcur_t) RETURN T1_TAB
          PIPELINED PARALLEL_ENABLE(PARTITION p_ref BY ANY) 
IS
  in_rec p_ref%ROWTYPE;
BEGIN
   execute immediate 'alter session set "_old_connect_by_enabled"=true';
   LOOP -- for each root
    FETCH p_ref INTO in_rec; 
    EXIT WHEN p_ref%NOTFOUND;
    FOR c1rec IN c1(in_rec.row_id)  LOOP -- retrieve rows of subtree
        PIPE ROW(c1rec);
    END LOOP;
  END LOOP;
  execute immediate 'alter session set "_old_connect_by_enabled"=false';  
  RETURN;
END  treeWalk;

END connect_by_parallel;
/

改写之后, sql花了41s, 相对原来有4倍多得提升.

SELECT
  /*+ monitor */
  COUNT(*)
FROM TABLE(connect_by_parallel.treeWalk (CURSOR
  (SELECT /*+ parallel (a 100) */
    rowid FROM t1 a WHERE id <= 100))) b;

  COUNT(*)
----------
     10000

Elapsed: 00:00:41.21

发表评论

电子邮件地址不会被公开。 必填项已用*标注

您可以使用这些HTML标签和属性: <a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code> <del datetime=""> <em> <i> <q cite=""> <strike> <strong>