在oracle不是原生支持并行处理的地方, 可以利用parallel pipelined function这个特性DIY你想要的并行的效果. 比如通过定制, 把数据并行导出到不同的数据文件, 就快导出速度. 这篇博客介绍对于包含connect by的sql语句的一个优化例子. 在实际的客户系统中, 结合Exadata计算能力和sql并行改造, 获得上百倍或者上千倍的性能提升也是常见的.
脚本准备, 准备一个100k的表
drop table t1; -- t1 with 100,000 rows create table t1 as select rownum id, lpad(rownum, 10, '0') v1, trunc((rownum - 1)/100) n1, rpad(rownum, 100) padding from dual connect by level <= 100000 ; begin dbms_stats.gather_table_stats(user,t1); end; /
以下的查询因为包含connect by, 不能使用并行处理, 花了3m5s.
select /*+ monitor */ count(*) from ( select CONNECT_BY_ROOT ltrim(id) root_id, CONNECT_BY_ISLEAF is_leaf, level as t1_level, a.v1 from t1 a start with a.id <=1000 connect by NOCYCLE id = prior id + 1000 ); COUNT(*) ---------- 10000 Elapsed: 00:03:05.17
准备parallel pipelined table function, 对于每个root, 输入rowid, 单独进行一次connect by. 以下是parallel pipelined function用到的脚本.
create or replace package refcur_pkg AS TYPE R_REC IS RECORD (row_id ROWID); TYPE refcur_t IS REF CURSOR RETURN R_REC; END; / create or replace package connect_by_parallel as /* Naviagates a shallow hiearchy in parallel, where we do a tree walk for each root */ CURSOR C1 (p_rowid ROWID) IS -- Cursor done for each subtree. This select is provided by the customer select CONNECT_BY_ROOT ltrim(id) root_id, CONNECT_BY_ISLEAF is_leaf, level as t1_level, a.v1 from t1 a start with rowid = p_rowid connect by NOCYCLE id = prior id + 1000; TYPE T1_TAB is TABLE OF C1%ROWTYPE; FUNCTION treeWalk (p_ref refcur_pkg.refcur_t) RETURN T1_TAB PIPELINED PARALLEL_ENABLE(PARTITION p_ref BY ANY); END connect_by_parallel; / create or replace package body connect_by_parallel as FUNCTION treeWalk (p_ref refcur_pkg.refcur_t) RETURN T1_TAB PIPELINED PARALLEL_ENABLE(PARTITION p_ref BY ANY) IS in_rec p_ref%ROWTYPE; BEGIN execute immediate 'alter session set "_old_connect_by_enabled"=true'; LOOP -- for each root FETCH p_ref INTO in_rec; EXIT WHEN p_ref%NOTFOUND; FOR c1rec IN c1(in_rec.row_id) LOOP -- retrieve rows of subtree PIPE ROW(c1rec); END LOOP; END LOOP; execute immediate 'alter session set "_old_connect_by_enabled"=false'; RETURN; END treeWalk; END connect_by_parallel; /
改写之后, sql花了41s, 相对原来有4倍多得提升.
SELECT /*+ monitor */ COUNT(*) FROM TABLE(connect_by_parallel.treeWalk (CURSOR (SELECT /*+ parallel (a 100) */ rowid FROM t1 a WHERE id <= 100))) b; COUNT(*) ---------- 10000 Elapsed: 00:00:41.21