类型转换

数据类型转换是数据库设计和开发中常见的错误, 经过型转换, 优化器可能无法精确地估算cardinality和cost, 导致次优的执行计划, 这里讨论三种情况:

1. Varchar2 –> Number: 如果把数字存为字符串, 查询的使用Number类型, 优化器会使用to_number函数把存储的字符串转为数字. 这种情况除了修改代码, 建立To_number的函数索引也可以解决.

2. Date –> Timestamp: 这是我见得最多的, 一些Java开发工具像Toplink, 默认时间类型是timestamp, 如果数据存储使用Date类型, 比较时优化器会加上internal_function转换, 对于internal_function, 无法通过增加函数索引解决.

3. Char –> Varchar2: 这种情况比较隐蔽, 如果不观察执行计划中的绑定变量, 或者用10046时间记录绑定变量, 可能不会发现. Char和Varchar一样的字符串比较, 可能会导致优化器错误的估算.

测试环境是10.2.0.3 Linux 32bit. 先准备一张表T, 一百万条记录:
1. V1列把 mod(id,1000) 存为Varchar2(10), 与数字类型比较, 测试 Varchar2 –> Number.
2. V2列九十万数据是字符串’BIG’, 存为Varchar2(10), 与类型为Char(3)的’BIG’, 测试 Char –> Varchar2. 在V2上收集柱状图统计信息, 验证char和varchar2比较时优化器会不会得出错误的Cardinality.
3. D1列是Date类型, 一百万数据分布在过去的一千天, 测试 Date –> Timestamp
4. ID是主键, n1和pad是打酱油的.

在列V1, V2和D1创建索引, 比较全表扫描和索引扫描之间的性能差别.

drop table t purge;

exec dbms_random.seed(UID);

create table t
(
	id	number,
	n1	number,
	v1	varchar2(10),
	v2	varchar2(10),
	d1	date,
	pad	varchar2(1000)
);

insert /*+ append */ into t
with
generator as ( 
	select
		level
	from
		dual connect by level <= 1000)
select
	rownum id,
	rownum n1,
	to_char(mod(rownum, 1000)) v1,
	case mod(rownum, 10)
		when 0 then lpad(mod(rownum, 1000), 3, '0')
		else 'BIG'
	end v2,
	trunc(sysdate) - dbms_random.value(0,1000) d1,
	lpad(rownum, 1000, '0') pad
from
	generator, generator;


begin
	dbms_stats.gather_table_stats(user,'t', --
 		method_opt=>'for columns size 1 id, n1, v1, d1, pad, v2 size 254');
end;
/

alter table t add constraint t_pk primary key(id)
using index(create unique index t_pk on t(id) nologging);

create index t_v1 on t(v1) nologging;
create index t_v2 on t(v2) nologging;
create index t_d1 on t(d1) nologging;

select
	column_name,
	num_distinct,
	density,
	num_buckets,
	histogram
from
	user_tab_cols
where
	table_name = 'T';

COL NUM_DISTINCT    DENSITY NUM_BUCKETS HISTOGRAM
--- ------------ ---------- ----------- ---------
PAD       998088 1.0019E-06           1 NONE
D1        997595 1.0024E-06           1 NONE
V2           101 5.0630E-07         100 FREQUENCY
V1          1001 .000999001           1 NONE
N1        998088 1.0019E-06           1 NONE
ID        998088 1.0019E-06           1 NONE

6 rows selected.

执行计划用dbms_xplan.display_cursor打印, 选项是’iostats last peeked_binds’. 这里关注两点:
1. 比较E-Rows和A-Rows, 观察优化器在类型转换时能否做出正确的估算.
2. 通过逻辑读buffers比较执行计划的性能差异.

Varchar2 –> Number

变量n1是Number, V1是Varchar2; n1和V1列比较时, 条件从 v1 = :n1 变成to_number(v1) = n1, 导致V1列上的索引T_V1无法命中, 只能全表扫描; 与索引扫描相比, 逻辑读是163K比1005. 解决的方法可以修改代码, 使用Varchar2类型, 或者建立函数索引to_number(v1), 或者把V1列存为数据.

sid@V10> set serveroutput off

sid@V10> variable n1 number;
sid@V10> variable v1 varchar2(1)
sid@V10> exec :n1 := 1; :v1 := '1';

PL/SQL procedure successfully completed.

sid@V10> select /*+ gather_plan_statistics*/
  2  	     max(n1)
  3  from
  4  	     t
  5  where
  6  	     v1 = :n1;

   MAX(N1)
----------
    999001

sid@V10> select * from table(dbms_xplan.display_cursor
  2  (null,null,'iostats last peeked_binds'));

---------------------------------------------------------------
| Id  | Operation          | Name | E-Rows | A-Rows | Buffers |
---------------------------------------------------------------
|   1 |  SORT AGGREGATE    |      |      1 |      1 |     163K|
|*  2 |   TABLE ACCESS FULL| T    |    997 |   1000 |     163K|
---------------------------------------------------------------

Peeked Binds (identified by position):
--------------------------------------

   1 - (NUMBER): 1

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - filter(TO_NUMBER("V1")=:N1)


23 rows selected.

sid@V10> select /*+ gather_plan_statistics*/
  2  	     max(n1)
  3  from
  4  	     t
  5  where
  6  	     v1 = :v1;

   MAX(N1)
----------
    999001

sid@V10> select * from table(dbms_xplan.display_cursor
  2  (null,null,'iostats last peeked_binds'));

-------------------------------------------------------------------------
| Id  | Operation                    | Name | E-Rows | A-Rows | Buffers |
-------------------------------------------------------------------------
|   1 |  SORT AGGREGATE              |      |      1 |      1 |    1005 |
|   2 |   TABLE ACCESS BY INDEX ROWID| T    |    997 |   1000 |    1005 |
|*  3 |    INDEX RANGE SCAN          | T_V1 |    999 |   1000 |       5 |
-------------------------------------------------------------------------

Peeked Binds (identified by position):
--------------------------------------

   1 - (VARCHAR2(30), CSID=873): '1'

Predicate Information (identified by operation id):
---------------------------------------------------

   3 - access("V1"=:V1)

Date –> Timestamp

分别用date和timestamp的变量查询2012年第一天最大的n1. 由于绑定变量类型不同, 对于sql 1r12rhar4jt6u产生两个执行计划. Child number=1是使用date类型的执行计划, 优化器正确的估算cardinality=1000, 选择索引扫描路径, 一次执行消耗973个逻辑读, 其中5个用于索引T_D1的扫描, 其他968个逻辑读花在TABLE ACCESS BY INDEX ROWID的操作上. Child number=0是使用timestamp的执行计划, 经过转换之后cardinality是2495, 与1000看起来没有相差太多, 但是优化器却选择了全表扫描, 消耗了163K的逻辑读. 在dbms_xplan.display_curs的输出中, 从Predicate Information中可以看到, 通过函数internal_function, D1列被转成timestamp与:B1和:B2比较, 因此无法使用索引T_D1, 只能全表扫描, Peeked Binds部分对于两个Timestamp的变量没有任何输出. 这种情况, 无法用Index Hint强制使用T_D1, 或者建立函数索引internal_function(D1), 只能修改代码使用Date类型, 或者把日期改存为Timestamp.

sid@V10> declare
  2  v_d1 date      := to_date('2012-01-01','yyyy-mm-dd');
  3  v_d2 date      := to_date('2012-01-02','yyyy-mm-dd');
  4  v_t1 timestamp := to_timestamp('2012-01-01','yyyy-mm-dd');
  5  v_t2 timestamp := to_timestamp('2012-01-02','yyyy-mm-dd');
  6  n number;
  7  begin
  8
  9  select /*+ gather_plan_statistics*/
 10      max(n1) into n
 11  from
 12      t
 13  where
 14      d1 between v_t1 and v_t2;
 15
 16  select /*+ gather_plan_statistics*/
 17          max(n1) into n
 18  from
 19      t
 20  where
 21      d1 between v_d1 and v_d2;
 22
 23  end;
 24  /

PL/SQL procedure successfully completed.

sid@V10> select * from table(dbms_xplan.display_cursor
   2   ('1r12rhar4jt6u',null,'iostats last peeked_binds'));

PLAN_TABLE_OUTPUT
---------------------------------------------------------------
SQL_ID  1r12rhar4jt6u, child number 0
-------------------------------------
SELECT /*+ gather_plan_statistics*/ MAX(N1) FROM T WHERE D1 BETWEEN :B2 AND :B1

Plan hash value: 1010173228

----------------------------------------------------------------
| Id  | Operation           | Name | E-Rows | A-Rows | Buffers |
----------------------------------------------------------------
|   1 |  SORT AGGREGATE     |      |      1 |      1 |     163K|
|*  2 |   FILTER            |      |        |    968 |     163K|
|*  3 |    TABLE ACCESS FULL| T    |   2495 |    968 |     163K|
----------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - filter(:B2<=:B1)
   3 - filter((INTERNAL_FUNCTION("D1")>=:B2 AND INTERNAL_FUNCTION("D1")<=:B1))

SQL_ID  1r12rhar4jt6u, child number 1
-------------------------------------
SELECT /*+ gather_plan_statistics*/ MAX(N1) FROM T WHERE D1 BETWEEN :B2 AND :B1

Plan hash value: 648301450

--------------------------------------------------------------------------
| Id  | Operation                     | Name | E-Rows | A-Rows | Buffers |
--------------------------------------------------------------------------
|   1 |  SORT AGGREGATE               |      |      1 |      1 |     973 |
|*  2 |   FILTER                      |      |        |    968 |     973 |
|   3 |    TABLE ACCESS BY INDEX ROWID| T    |   1000 |    968 |     973 |
|*  4 |     INDEX RANGE SCAN          | T_D1 |   1002 |    968 |       5 |
--------------------------------------------------------------------------

Peeked Binds (identified by position):
--------------------------------------

   1 - (DATE): 01/01/12 00:00:00
   2 - (DATE): 01/02/12 00:00:00

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - filter(:B2<=:B1)
   4 - access("D1">=:B2 AND "D1"<=:B1)


47 rows selected.

Char –> Varchar2

V2列是’BIG’的数据有900K. 使用char(3)比较时, Cardinality只有1, 严重偏离实际的数据量, 选择索引扫描路径, 逻辑读是165K. 使用Varchar2(3)优化器算出正确的Cardinality 899K, 选择全部扫描, 逻辑读是163K. 这个例子虽然逻辑读相差不大, 实际中这种成本估算错误, 很容易导致次优的执行计划.

sid@V10> variable c2 char(3);
sid@V10> variable v2 varchar2(3);
sid@V10> exec :c2 := 'BIG'; :v2 := 'BIG';

PL/SQL procedure successfully completed.

sid@V10> select /*+ gather_plan_statistics*/
  2  	     max(n1)
  3  from
  4  	     t
  5  where
  6  	     v2 = :c2;

   MAX(N1)
----------
    999999

sid@V10> select * from table(dbms_xplan.display_cursor
  2  (null,null,'iostats last peeked_binds'));

-------------------------------------------------------------------------
| Id  | Operation                    | Name | E-Rows | A-Rows | Buffers |
-------------------------------------------------------------------------
|   1 |  SORT AGGREGATE              |      |      1 |      1 |     165K|
|   2 |   TABLE ACCESS BY INDEX ROWID| T    |      1 |    900K|     165K|
|*  3 |    INDEX RANGE SCAN          | T_V2 |      1 |    900K|    1886 |
-------------------------------------------------------------------------

Peeked Binds (identified by position):
--------------------------------------

   1 - (CHAR(30), CSID=873): 'BIG'

Predicate Information (identified by operation id):
---------------------------------------------------

   3 - access("V2"=:C2)


24 rows selected.

sid@V10> select /*+ gather_plan_statistics*/
  2  	     max(n1)
  3  from
  4  	     t
  5  where
  6  	     v2 = :v2;

   MAX(N1)
----------
    999999

sid@V10> select * from table(dbms_xplan.display_cursor
  2  (null,null,'iostats last peeked_binds'));


---------------------------------------------------------------
| Id  | Operation          | Name | E-Rows | A-Rows | Buffers |
---------------------------------------------------------------
|   1 |  SORT AGGREGATE    |      |      1 |      1 |     163K|
|*  2 |   TABLE ACCESS FULL| T    |    899K|    900K|     163K|
---------------------------------------------------------------

Peeked Binds (identified by position):
--------------------------------------

   1 - (VARCHAR2(30), CSID=873): 'BIG'

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - filter("V2"=:V2)

23 rows selected.

观察执行计划时, 如果Cardinality有明显的错误, 可以查看Predicate Information, 确定有没有internal_function/to_number这种转换函数, 或者Peeked Binds中有没有意想不到的数据类型, 判断有没有存在类型转换.

随着版本演进, 优化器越来越智能, 比如第三种情况, 使用CHAR(3)的字符串’BIG’和V2比较, 在版本10.2.0.5和11.2.0.2中, 没有导致错误的Cardinality. 不过为了避免类型转换带来的麻烦, 开发或者数据库设计时, 数据类型的选择需要谨慎对待.

2010-09-22 札达县托林寺

发表评论

电子邮件地址不会被公开。 必填项已用*标注

您可以使用这些HTML标签和属性: <a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code> <del datetime=""> <em> <i> <q cite=""> <strike> <strong>