分类目录归档:未分类

reading this week

Dumps
http://www.juliandyke.com/Diagnostics/Dumps/HEAPDUMP.html

Advanced Oracle Troubleshooting Session – PGA/UGA memory fragmentation
http://oracle-randolf.blogspot.com/

不均衡分区和绑定变量窥视导致的查询计划错误
http://www.hellodba.com/Cases/Unbalance_part_hist.html

dbms_xplan(3)
http://jonathanlewis.wordpress.com/2008/03/06/dbms_xplan3/

Treedump – 2
http://jonathanlewis.wordpress.com/2010/03/07/treedump-2/

Index Efficiency 3
http://jonathanlewis.wordpress.com/2010/03/03/index-efficiency-3/

treedump
http://jonathanlewis.wordpress.com/2009/08/17/treedump/

Index Explosion
http://jonathanlewis.wordpress.com/2009/07/28/index-explosion/

PDF Doc: Oracle B-Tree Index Internals: Rebuilding The Truth

Log File Sync – What is Wrong with this Quote?
http://hoopercharles.wordpress.com/2010/06/16/log-file-sync-what-is-wrong-with-this-quote/

Gary Millsap: Why You Should Focus on LIOs Instead of PIOs

Tim Gorman: THE SEARCH FOR INTELLIGENT LIFE IN THE COST-BASED OPTIMIZER

Migration of a Database to ASM

记录一下Migration之前碰到的问题:

1. 昨天用oracleasm安装asm_home的版本是10.2.0.1, db instance认不到asm diskgroup, 升级asm_home到10.2.0.5.0, 和ORACLE_HOME一样的版本
2. 在asm diskgroup上创建tablespace出现以下错误,

create tablespace sidtest datafile '+DATA/asmtest_01.dbf' size 10m
*
ERROR at line 1:
ORA-01119: error in creating database file '+DATA/asmtest_01.dbf'
ORA-15012: ASM file 'asmtest_01.dbf' does not exist
ORA-17502: ksfdcre:5 Failed to create file +DATA/asmtest_01.dbf
ORA-15081: failed to submit an I/O operation to a disk

3. 怀疑oracle用户对/dev/raw/中的裸设备没有读写权限,

[oracleasm@CS-RHEL raw]$ ls -l
total 0
crw------- 1 oracleasm asmdba 162, 1 Jun  2 17:05 raw1
crw------- 1 oracleasm asmdba 162, 2 Jun  2 17:04 raw2
crw------- 1 oracleasm asmdba 162, 3 Jun  2 17:04 raw3
crw------- 1 oracleasm asmdba 162, 4 Jun  2 17:04 raw4

[oracleasm@CS-RHEL raw]$ chmod a+rw *

[oracleasm@CS-RHEL raw]$ ls -l
total 0
crw-rw-rw- 1 oracleasm asmdba 162, 1 Jun  2 17:24 raw1
crw-rw-rw- 1 oracleasm asmdba 162, 2 Jun  2 17:19 raw2
crw-rw-rw- 1 oracleasm asmdba 162, 3 Jun  2 17:19 raw3
crw-rw-rw- 1 oracleasm asmdba 162, 4 Jun  2 17:19 raw4

4. 重启asm instance, create tablespace依然失败,这次是600错误

idle> create tablespace sidtest datafile '+DATA/asmtest_01.dbf' size 10m;

ERROR at line 1:
ORA-00600: internal error code, arguments: [kfmsFindGrp00], [Unknown error in NM], [], [], [], [],
[], []

5. 重启database instance, tablespace 创建成功. asm diskgroup终于可用了.

Migrate 用到的脚本:

重启instance

SHUTDOWN IMMEDIATE
startup mount

backup datafile copy到ASM Diskgroup

CONFIGURE DEVICE TYPE DISK PARALLELISM 4;
BACKUP AS COPY DATABASE FORMAT '+DATA';

backup spfile 到ASM Diskgroup

run {
   BACKUP AS BACKUPSET SPFILE;
   RESTORE SPFILE TO "+DATA/spfile";
}

新建pfile指向+DATA上面的spfile

/tmp/pfile.ora
SPFILE=+DATA/spfile

重启instance, 设置新的controlfile位置

SHUTDOWN IMMEDIATE
startup nomount pfile='/tmp/pfile.ora'
alter system set control_file='+DATA/ct1.f' scope=spfile sid='*';

重启instance, restore controlfile到+DATA, switch database 到diskgroug上, 直接open database, 不用recover database;

SHUTDOWN IMMEDIATE
startup nomount pfile='/tmp/pfile.ora'
restore control file from '/home/u02/app/oracle/product/11.1.0/db_1/oradata/SIDGD/control01.ctl';
alter database mount;
switch database to copy;
alter database open;

最后, 将pfile搬到$ORACLE_HOME/dbs, 这样Oracle每次启动都可以识别到+DATA/spfile

[oracle@CS-RHEL dbs]$ mv /tmp/pfile.ora initasmsid.ora

Ref:
10R1: http://download.oracle.com/docs/cd/B13789_01/server.101/b10734/rcmasm.htm
10R2: http://download.oracle.com/docs/cd/B19306_01/backup.102/b14191/rcmasm.htm#BRADV12121 (利用increment backup, 减少停机时间)

PS: 最后记得把online redo log 和temp tablespace Migrate到ASK上面

declare
   cursor rlc is
      select group# grp, thread# thr, bytes/1024 bytes_k, 'NO' srl
        from v$log
      union
      select group# grp, thread# thr, bytes/1024 bytes_k, 'YES' srl
        from v$standby_log
      order by 1;
   stmt     varchar2(2048);
   swtstmt  varchar2(1024) := 'alter system switch logfile';
   ckpstmt  varchar2(1024) := 'alter system checkpoint global';
begin
   for rlcRec in rlc loop
      if (rlcRec.srl = 'YES') then
         stmt := 'alter database add standby logfile thread ' ||
                 rlcRec.thr || ' ''+DATA'' size ' || 
                 rlcRec.bytes_k || 'K';
         execute immediate stmt;
         stmt := 'alter database drop standby logfile group ' || rlcRec.grp;
         execute immediate stmt;
      else
         stmt := 'alter database add logfile thread ' ||
                 rlcRec.thr || ' ''+DATA'' size ' ||  
                 rlcRec.bytes_k || 'K';
         execute immediate stmt;
         begin
            stmt := 'alter database drop logfile group ' || rlcRec.grp;
            dbms_output.put_line(stmt);
            execute immediate stmt;
         exception
            when others then
               execute immediate swtstmt;
               execute immediate ckpstmt;
               execute immediate stmt;
         end;
      end if;
   end loop;
end;

*
ERROR at line 1:
ORA-00350: log 3 of instance asmsid (thread 1) needs to be archived
ORA-00312: online log 3 thread 1:
'/home/u02/app/oracle/product/11.1.0/db_1/oradata/asmsid/redo03.log'
ORA-06512: at line 34
ORA-01623: log 3 is current log for instance asmsid (thread 1) - cannot drop
ORA-00312: online log 3 thread 1:
'/home/u02/app/oracle/product/11.1.0/db_1/oradata/asmsid/redo03.log'


sys@ASMSID> alter system archive log sequence 24 to '/home/rman/asmsid/';

System altered.
sys@ASMSID> alter database drop logfile group 3;
Database altered.

idle> @log

    GROUP#    THREAD#  SEQUENCE#   FIRST_CHANGE# STATUS           ARC      BYTES
---------- ---------- ---------- --------------- ---------------- --- ----------
MEMBER
------------------------------------------------------------
         1          1         25          337049 CURRENT          NO    52428800
+DATA/asmsid/onlinelog/group_1.264.720657717

         2          1          0               0 UNUSED           YES   52428800
+DATA/asmsid/onlinelog/group_2.260.720657723

         4          1          0               0 UNUSED           YES   52428800
+DATA/asmsid/onlinelog/group_4.256.720657711


RMAN> run {
2> set newname for tempfile 1 to '+DATA';
3> switch tempfile all;
4> }

Install ASM single instance in its own home

ref: http://www.oracle.com/technology/obe/demos/10gr2/asm_install_diff_home.viewlet/asm_install_diff_home_viewlet_swf.html
http://www.oracle.com/technology/obe/demos/admin/demos.html

注意的几点:
1. oracleasm必须对oraInventory目录有写权限
2. oracleasm必须对安装文件runInstaller有执行权限

安装结果:

[oracleasm@CS-RHEL ~]$ id
uid=523(oracleasm) gid=501(oinstall) groups=501(oinstall),522(asmdba)
[oracleasm@CS-RHEL ~]$ ps -ef | grep ASM
523      11155     1  0 23:29 ?        00:00:00 asm_pmon_+ASM
523      11157     1  0 23:29 ?        00:00:00 asm_psp0_+ASM
523      11159     1  0 23:29 ?        00:00:00 asm_mman_+ASM
523      11161     1  0 23:29 ?        00:00:00 asm_dbw0_+ASM
523      11163     1  0 23:29 ?        00:00:00 asm_lgwr_+ASM
523      11165     1  0 23:29 ?        00:00:00 asm_ckpt_+ASM
523      11167     1  0 23:29 ?        00:00:00 asm_smon_+ASM
523      11169     1  0 23:29 ?        00:00:00 asm_rbal_+ASM
523      11171     1  0 23:29 ?        00:00:00 asm_gmon_+ASM
523      11463 11462  0 23:33 ?        00:00:00 oracle+ASM (DESCRIPTION=(LOCAL=YES)(ADDRESS=(PROTOCOL=beq)))
523      11571 11534  0 23:34 pts/1    00:00:00 grep ASM
[oracleasm@CS-RHEL ~]$ env | grep SID 
ORACLE_SID=+ASM
[oracleasm@CS-RHEL ~]$ echo $ORACLE_HOME
/home/u02/app/oracle/product/10.2.0/asm_1
[oracleasm@CS-RHEL ~]$ sqlplus / as sysdba

SQL*Plus: Release 10.2.0.1.0 - Production on Tue Jun 1 23:34:41 2010

Copyright (c) 1982, 2005, Oracle.  All rights reserved.


Connected to:
Oracle Database 10g Enterprise Edition Release 10.2.0.1.0 - Production
With the Partitioning, OLAP and Data Mining options

SQL> select name from v$asm_diskgroup;

NAME
------------------------------
DATA

SQL> select name from v$asm_disk;

NAME
------------------------------
DATA_0003
DATA_0002
DATA_0001
DATA_0000

Reading this week outside greader

LGWR process priorities on the Solaris platform

Increasing LGWR priority would be justified if:
1) High user waits for “log file sync”
*AND*
2) LGWR itself is NOT waiting significant time for “log file parallel write”
*AND*
3) OS statistic “OS Wait-cpu (latency) time” for LGWR’s session is a
significant amount of logwriter’s time (you’ve got to sample this over an
interval, subtract CPU and wait times from the interval and hopefully then you
can compare the remainder to the statistics value. And you need
timed_os_statistics=true or statistics_level=all for that)

..if those conditions aren’t satisfied, then you won’t gain from increasing
LGWRs priority even if your CPUs are 100% utilized.
http://www.freelists.org/post/oracle-l/LGWR-process-priorities-on-the-Solaris-platform,2

Increasing Priority of lgwr Process using _high_priority_processes

To Tanel’s and Bob Sneed’s point, RT cpu priority is kernel preemptive. LGWR can block the interrupts being serviced by executing in RT mode in the CPU. LGWR needs those interrupts to be serviced and those interrupts quite probably could be I/O completion interrupts for LGWR to complete commit/log file sync processing. This issue of waiter blocking the interrupts server leads to an incorrect priority anomaly, analogous to a waiter of a resource evicting the holder of a resource off the CPU.
On the other hand, in RAC, LMS processes will wait for LGWR to complete ‘log file sync’ if the block to be transferred is considered “busy”. But, LMS is also running in RT mode. So if the LGWR process is not running in RT mode, then the LMS process can evict LGWR from the CPU. This also leads to incorrect priority anomaly.
If there are enough CPUs, in RAC, I would prefer, to run both LGWR and LMS* processes in RT mode (and VKTM in 11g). Then fence the interrupts to a different processor set so that LGWR/LMS* can not block those interrupts. In this way we should be able to keep interrupts/LMS/LGWR priorities straight. [At the very least, run LGWR in FX 60 in Solaris so that LGWR does not suffer from TS class scheduling latencies.]
It is important to also realize that, default number of LMS processes are quite excessive. For example, one of my client site had 28 LMS processes and all 28 of them running in RT mode! You can imagine how much CPU usage will be used by those LMS processes. In my opinion, in most cases, 3 or 5 LMS processes per instance is good enough to service GC traffic. So, if the number of LMS processes are excessive, then it is probably worthwhile to reduce that and then also increase LGWR priority

Manly Men Only Use Solid State Disk For Redo Logging. LGWR I/O is Simple, But Not LGWR Processing
Without elevated scheduling priority for LGWR, it can often be the case that the processes LGWR wakes up will take the CPU from LGWR. Some platforms (Legacy Unix) support the notion of making processes non-preemptable as well. On those platforms you can make LGWR non-preemptable. Without preemption protection, LGWR can lose CPU before it has exhausted its fair time slice.
Once LGWR loses his CPU it may be quite some time until he gets it back. For instance, if LGWR is preempted in the middle of trying to perform a redo buffer flush, there may be several time slices of execution for other processes before LGWR gets back on CPU.
http://kevinclosson.wordpress.com/2007/07/21/manly-men-only-use-solid-state-disk-for-redo-logging-lgwr-io-is-simple-but-not-lgwr-processing/

Improving Performance by Using a Cartesian Join 2
http://hoopercharles.wordpress.com/2010/05/18/improving-performance-by-using-a-cartesian-join-2/

Oracle ASSM三级位图块结构
http://www.eygle.com/archives/2007/07/oracle_assm_level3_bmb.html

悬赏:寻找ASSM三级位图块
http://www.ixdba.com/html/y2007/m06/127-assm-level3-block.html

relationship between statistics in time model(10G)

除了如下包含关系,同一level的statistics也有部分包含也被包含的关系.比如DB CPU包含了harse的CPU, sql execute elapsed time 包含了部分 DB CPU

1) background elapsed time
      2) background cpu time
1) DB time
    2) DB CPU
    2) connection management call elapsed time
    2) sequence load elapsed time
    2) sql execute elapsed time
    2) parse time elapsed
          3) hard parse elapsed time
                4) hard parse (sharing criteria) elapsed time
                    5) hard parse (bind mismatch) elapsed time
          3) failed parse elapsed time
                4) failed parse (out of shared memory) elapsed time
    2) PL/SQL execution elapsed time
    2) inbound PL/SQL rpc elapsed time
    2) PL/SQL compilation elapsed time
    2) Java execution elapsed time

ref: http://download.oracle.com/docs/cd/B19306_01/server.102/b14237/dynviews_2087.htm#G2030340

reading of this two week

Oracle automatic segment space management: ASSM internal structures
http://www.dba-oracle.com/art_builder_assm.htm

Not KEEPing
http://jonathanlewis.wordpress.com/2010/03/20/not-keeping/

Row Directory
http://jonathanlewis.wordpress.com/2009/05/21/row-directory/

Index too big
http://jonathanlewis.wordpress.com/2010/03/25/index-too-big/

heap block compress
http://jonathanlewis.wordpress.com/2010/03/30/heap-block-compress/

Failed Login
http://jonathanlewis.wordpress.com/2010/04/05/failed-login/

Daylight Saving Time Causes Performance Issues!
http://prodlife.wordpress.com/2010/04/02/daylight-saving-time-causes-performance-issues/

Block cleanout – fast or delayed.
http://www.jlcomp.demon.co.uk/cleanout.html

Snapshot too old and delayed block cleanout
http://www.ixora.com.au/q+a/0103/24224451.htm

Parse Calls
http://jonathanlewis.wordpress.com/2007/07/03/parse-calls/

Nutshell – 1
http://jonathanlewis.wordpress.com/2009/10/14/nutshell-1/

blogroll of last week

Bind Variable Peeking – execution plan inefficiency
http://blog.tanelpoder.com/2010/02/02/bind-variable-peeking-execution-plan-inefficiency/

Session Snapper
http://tech.e2sn.com/oracle-scripts-and-tools/session-snapper

Sometimes things are easy (Part 1): How to fix wrapped execution plan text?
http://blog.tanelpoder.com/2010/01/18/sometimes-things-are-easy-part-1-how-to-fix-wrapped-execution-plan-text/

Scripts for showing execution plans via plain SQL and also in Oracle 9i
http://blog.tanelpoder.com/2009/05/26/scripts-for-showing-execution-plans-via-plain-sql-and-also-in-oracle-9i/

SQL_ID is just a fancy representation of hash value
http://blog.tanelpoder.com/2009/02/22/sql_id-is-just-a-fancy-representation-of-hash-value/

Julian Dyke
Library Cache Internals

unlock and gather index stats

in 10.2, Oracle Database now automatically collects statistics during index creation and rebuild

COMPUTE STATISTICS In earlier releases, you could use this clause to start or stop the collection of statistics on an index. This clause has been deprecated. Oracle Database now automatically collects statistics during index creation and rebuild. This clause is supported for backward compatibility and will not cause errors.

but this is not true when the table statistics is lock, the index statistics can’t be gather until the table staticstics is unlocked

sys@dbinst1>  select index_name, index_type, last_analyzed from user_indexes where index_name like 'CR_SHMTRESULT_IDX04';

INDEX_NAME                     INDEX_TYPE                  LAST_ANALYZED
------------------------------ --------------------------- -------------------
CR_SHMTRESULT_IDX04            NORMAL                      2009-05-17 12:34:53

app_user1@dbinst1> exec dbms_stats.gather_index_stats(user,'CR_SHMTRESULT_IDX04');
BEGIN dbms_stats.gather_index_stats(user,'CR_SHMTRESULT_IDX04'); END;

*
ERROR at line 1:
ORA-20005: object statistics are locked (stattype = ALL)
ORA-06512: at "SYS.DBMS_STATS", line 10640
ORA-06512: at "SYS.DBMS_STATS", line 10664
ORA-06512: at line 1

 1* select table_name, STATTYPE_LOCKED from all_ind_statistics where index_name like 'CR_SHMTRESULT_IDX04'
app_user1@dbinst1> /

TABLE_NAME                     STATT
------------------------------ -----
CR_INTERMEDIATE_SHMTRESULT     ALL

app_user1@dbinst1>  select table_name, STATTYPE_LOCKED from all_tab_statistics where table_name like 'CR_INTERMEDIATE_SHMTRESULT';

TABLE_NAME                     STATT
------------------------------ -----
CR_INTERMEDIATE_SHMTRESULT     ALL

after unlock the table’s statistics, the index stats can be regather

app_user1@dbinst1> exec dbms_stats.unlock_table_stats(user,'CR_INTERMEDIATE_SHMTRESULT');

PL/SQL procedure successfully completed.

app_user1@dbinst1>  select table_name, STATTYPE_LOCKED from all_ind_statistics where index_name like 'CR_SHMTRESULT_IDX04'
  2  /

TABLE_NAME                     STATT
------------------------------ -----
CR_INTERMEDIATE_SHMTRESULT

app_user1@dbinst1> select table_name, STATTYPE_LOCKED from all_tab_statistics where table_name like 'CR_INTERMEDIATE_SHMTRESULT';

TABLE_NAME                     STATT
------------------------------ -----
CR_INTERMEDIATE_SHMTRESULT

app_user1@dbinst1> exec dbms_stats.gather_index_stats(user,'CR_SHMTRESULT_IDX04');

PL/SQL procedure successfully completed.

app_user1@dbinst1> select index_name, index_type, last_analyzed from user_indexes where index_name like 'CR_SHMTRESULT_IDX04';

INDEX_NAME                     INDEX_TYPE                  LAST_ANALYZED
------------------------------ --------------------------- -------------------
CR_SHMTRESULT_IDX04            NORMAL                      2010-03-24 16:17:33

finally, don’t forget to lock the table stats again

app_user1@dbinst1> exec dbms_stats.lock_table_stats(user,'CR_INTERMEDIATE_SHMTRESULT');

PL/SQL procedure successfully completed.

app_user1@dbinst1> select table_name, STATTYPE_LOCKED from all_ind_statistics where index_name like 'CR_SHMTRESULT_IDX04'
  2  /

TABLE_NAME                     STATT
------------------------------ -----
CR_INTERMEDIATE_SHMTRESULT     ALL

app_user1@dbinst1> select table_name, STATTYPE_LOCKED from all_tab_statistics where table_name like 'CR_INTERMEDIATE_SHMTRESULT';

TABLE_NAME                     STATT
------------------------------ -----
CR_INTERMEDIATE_SHMTRESULT     ALL

如何查看Tibco EMS Queue里面的MSG被那台机器吃掉

可以看到队列CC.BBC.PROC_REQ.0001.JOB_REQ.PREP.CUSSTDAK997.IN.QUE的msg都被来自bwappserver11的应用吃掉了

tcp://appsever:7262> show consumers queue=CC.BBC.PROC_REQ.0001.JOB_REQ.PREP.CUSSTDAK997.IN.QUE
                                                                                Msgs Sent
    Id   Conn User   T Queue                                                SAS Sent Size   Uptime
700145 360644 b2bapp Q CC.BBC.PROC_REQ.0001.JOB_REQ.PREP.CUSSTDAK997.IN.QUE +T-    0    0 30:32:25
700146 360644 b2bapp Q CC.BBC.PROC_REQ.0001.JOB_REQ.PREP.CUSSTDAK997.IN.QUE +T-    0    0 30:32:25
700147 360644 b2bapp Q CC.BBC.PROC_REQ.0001.JOB_REQ.PREP.CUSSTDAK997.IN.QUE +T-    0    0 30:32:25
700148 360644 b2bapp Q CC.BBC.PROC_REQ.0001.JOB_REQ.PREP.CUSSTDAK997.IN.QUE +T-    0    0 30:32:25
700149 360644 b2bapp Q CC.BBC.PROC_REQ.0001.JOB_REQ.PREP.CUSSTDAK997.IN.QUE +T-    0    0 30:32:25

tcp://appsever:7262> show connections type=Q full
L  Version   ID     FSXT  S  Host         IP address     User   ClientID
                                Sess Prod Cons TmpT TmpQ     Uptime

.
.
.
J  5.1.0 V10 360644 ---Q  +  bwappserver11 146.222.75.148 b2bapp BW-Main-Connection-queue-BBC-APP-ACK-BBC-APP-ACK

                                  38   13   25    0    0   31:36:38
.
.
.

拖沓懒散

我想不出什么合适的词语来形容这种失望的心情.
头灯的盖子怎么会不见了,怎么会找不到了,刚刚花了一个小时的时间从房间找到客厅,又从客厅找回房间,就是找不到那个头灯的盖子.倒是发现Kymair小音箱的电池盖子了,终于把放在地板上一个多月的冲锋裤拿到阳台的水桶的准备明天去洗洗了, 稍微整理了桌上一毛两毛五毛一块的硬币和纸币.
每一次我把东西随便仍随便塞的时候,似乎总是假惺惺的安慰,在我需要它的时候它会自动出现的.
但结果总是像今晚这样让人抓狂, 像第一次上船底顶时不知怎么回事一只筷子就不见了,最后不得不向队友借双筷子来吃饭.

经常提醒自己,对自己的东西用心一点.