
Saturday

ORA-01722: Invalid Number - Weird Problem


We have an ADF application which hooks into Portal and SOA, running on WebLogic 10.3.6 in a Fusion Middleware environment.

A few weeks ago, after pushing a new release of the code to production, users encountered an ORA-01722 error in some screens of the application. We captured the SQL and could run it via SQL*Plus and SQL Developer with no errors.

We opened an SR with Oracle; they offered some suggestions and in the end recommended a patch to fix a shared cursor bug. We haven't applied it yet because we are on the latest PSU for 11.1.0.7 and the patch isn't available for it, so we've issued a patch request with Oracle.

The weird problem is that when my co-worker starts the WebLogic Managed Server, the users do not encounter the ORA-01722 error. The only difference is that I am SSHing in from a Mac and he is on Windows. We SSH into the same generic account and execute the same steps. We've also confirmed that I can restart the server a dozen times and hit the error every time, while my co-worker has never had the problem. We've verified this a few times.

Today we had some more maintenance and I tried restarting the server again. Users encountered the same ORA-01722 error. I restarted the server once more, this time from my Windows computer using SecureCRT, and the error disappeared.

The following is a diff of the environment between the Mac and Windows SSH sessions. There isn't a huge difference.


< TERM=xterm
---
> TERM=xterm-256color
5c5
< SSH_CLIENT=X.X.X.X 57952 22
---
> SSH_CLIENT=X.X.X.X 65516 22
7c7
< SSH_TTY=/dev/pts/1
---
> SSH_TTY=/dev/pts/4
10c10
< LS_COLORS=no=00:fi=00:di=00;34:ln=00;36:pi=40;33:so=00;35:bd=40;33;01:cd=40;33;01:or=01;05;37;41:mi=01;05;37;41:ex=00;32:*.cmd=00;32:*.exe=00;32:*.com=00;32:*.btm=00;32:*.bat=00;32:*.sh=00;32:*.csh=00;32:*.tar=00;31:*.tgz=00;31:*.arj=00;31:*.taz=00;31:*.lzh=00;31:*.zip=00;31:*.z=00;31:*.Z=00;31:*.gz=00;31:*.bz2=00;31:*.bz=00;31:*.tz=00;31:*.rpm=00;31:*.cpio=00;31:*.jpg=00;35:*.gif=00;35:*.bmp=00;35:*.xbm=00;35:*.xpm=00;35:*.png=00;35:*.tif=00;35:
---
> LS_COLORS=
16c16
< LANG=en_US.UTF-8
---
> LANG=en_CA.UTF-8
23c23
< SSH_CONNECTION=X.X.X.X 57952 X.X.X.X 22
---
> SSH_CONNECTION=X.X.X.X 65516 X.X.X.X 22


The only one that catches my eye is the LANG environment variable, but I honestly don't see that as having any effect on the application. If there were an environmental difference I would expect to see other types of issues; ORA-01722 is pretty specific.
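
If the LANG difference were feeding through to the sessions the managed server opens, it should show up in the session NLS settings. A minimal check, assuming you can run an ad-hoc query over the application's data source (NLS_NUMERIC_CHARACTERS in particular is a classic trigger for ORA-01722 during implicit conversions):

-- Compare these values between a "good" and a "bad" restart of the managed server
SELECT parameter, value
  FROM nls_session_parameters
 WHERE parameter IN ('NLS_LANGUAGE', 'NLS_TERRITORY', 'NLS_NUMERIC_CHARACTERS');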

Also, we can only reproduce this on our production server. If I restart DEV/TEST from my Mac there are no issues. During a future maintenance window I may try changing the individual environment settings after I SSH in from my Mac, just to see if the error recurs.

Regardless, we'll still proceed with the patch request just in case it happens again.  Weird Problem!

Tuesday

Lucky or not?

 

Flashback technology has been around for quite a while. However, today was the first day I actually had to use it in a real-life scenario. Does that mean I am lucky or not?

A few minutes ago a developer came by and asked if Oracle kept versions of PL/SQL code in the database. I replied no and asked why. It turns out he was working on some code, compiled it, and it didn't work properly. He didn't have a copy of the original package.

I wasn’t sure if it would work but I thought I would give flashback query a try.  Here is the SQL I used:

select text from all_source
  as of timestamp to_timestamp('29-MAY-2012 15:50:00', 'DD-MON-YYYY HH24:MI:SS')
 where type = 'PACKAGE BODY' and name = 'MYNAME';

I saved the output and sent it over to the developer.   He was very happy.   Always nice to end a day on a high note!

Monday

11.2.0.2 Database Online Mode patching

I've read on a few blogs, such as Pythian's, about the new online mode patching for 11.2.0.2. However, up until this point I had not seen any applicable patches. Currently I'm in the process of putting together a patch analysis for the Grid Control Application Management Pack, and as part of the pre-reqs for an R12 environment I have to apply a number of database patches.

Two of the patches, 10160615 and 12400751, can be applied in Online Mode. It's too bad two other patches can't be. Regardless, it's nice to see that some patches can now be applied while the system is up.

For more information on Online Mode (Hot Patching) take a look at Metalink note:
RDBMS Online Patching Aka Hot Patching [ID 761111.1]

Friday

AUDIT_TRAIL = DB, Portal and Grid Control


If you're not familiar with Oracle's auditing features then a good place to start is Oracle's documentation. I personally think Oracle's documentation, especially for the database products, is great.

http://docs.oracle.com/cd/B28359_01/network.111/b28531/auditing.htm#autoId1

The documentation linked to above describes what auditing is, why you would need it, etc. There's no need for me to repeat it here.

Our environment is pretty small, with a limited number of people who have access to the databases. Normally I have the audit trail disabled; however, for this particular database we did need to increase logging to investigate some issues. After we were finished, we used NOAUDIT to disable the extra logging we had enabled, but we left AUDIT_TRAIL set to DB.

Fast forward a few months: on a routine scan of database performance I noticed a query consuming a fair amount of resources.

SQL Details: gh9pd08vhptgr
SELECT TO_CHAR(current_timestamp AT TIME ZONE 'GMT', 'YYYY-MM-DD HH24:MI:SS TZD') AS curr_timestamp, COUNT(username) AS failed_count
FROM sys.dba_audit_session
WHERE returncode != 0 AND timestamp >= current_timestamp - TO_DSINTERVAL('0 0:30:00')



[Screenshot: Grid Control Top Activity chart showing the spike from the query above]

By looking at the screenshot above it seems pretty nasty, but in reality it was just a small spike on the Grid Control Top Activity chart. As I mentioned, our environment is small and our servers have more than enough horsepower, so it didn't have a significant impact on performance.

From Metalink note Slow Performance Of DBA_AUDIT_SESSION Query From EM [ID 829103.1]: “There is a known performance issue with DBA_AUDIT_SESSION table per non-published Bug 7633167”

However, with AUDIT_TRAIL=DB, LOGONs and LOGOFFs were still being recorded. Since this database hosts our Portal schema, that meant upwards of 200k entries going into sys.aud$ daily, for a grand total of 32 million rows!

The note provides two options for the query above: the first is to disable the Failed Login Count metric; the second is to purge the sys.aud$ table, as described in Note 73408.1 How to Truncate, Delete, or Purge Rows from the Audit Trail Table SYS.AUD$.

Since at the moment we don't require auditing to be enabled, I am simply going to truncate the sys.aud$ table and disable auditing by setting the database initialization parameter AUDIT_TRAIL=NONE, then restarting the database at the next maintenance window.
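
For reference, a minimal sketch of that cleanup (check Note 73408.1 before touching sys.aud$ in your own environment; AUDIT_TRAIL is a static parameter, so the change only takes effect after the restart):

-- Empty the audit trail (supported per Note 73408.1)
TRUNCATE TABLE sys.aud$;

-- Disable auditing; takes effect at the next restart
ALTER SYSTEM SET audit_trail = NONE SCOPE = SPFILE;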

If you do require auditing then you should set up a purging strategy. If you need to do this, some good blog articles to read are:

http://www.pythian.com/news/1106/oracle-11g-audit-enabled-by-default-but-what-about-purging/
http://damir-vadas.blogspot.com/2010/06/auditing-database.html

Wednesday

Fusion Middleware and 11g DB Password Expiry

As a few DBAs have noticed, the 11g database has password expiry enabled by default. This is not entirely a bad thing; I am in favor of the move. However, if you're not aware of this change, it can cause you some problems with your Fusion Middleware (FMW) 11g environment.

Developers contacted me with

ORA-28001: the password has expired.  

Originally I didn't even think of the repository accounts being an issue. I assumed it was a password policy in Oracle Internet Directory (OID) or the WebLogic accounts they created for deploying applications. After those were verified, the only thing left was the database.

A quick look at dba_users showed a couple of accounts already expired or in grace status:

USERNAME                       ACCOUNT_STATUS                   LOCK_DATE EXPIRY_DA
------------------------------ -------------------------------- --------- ---------
DCM                            EXPIRED                                    12-MAY-10
ORASSO_PS                      EXPIRED                                    10-MAY-10
DEV_PORTAL                     EXPIRED                                    10-MAY-10
ODSSM                          EXPIRED                                    10-MAY-10
ORASSO                         EXPIRED(GRACE)                             20-MAY-10



You can view the password policy of the database default profile by looking at dba_profiles:

SQL> select * from dba_profiles;


PROFILE                        RESOURCE_NAME                    RESOURCE LIMIT
------------------------------ -------------------------------- -------- ---------------------
DEFAULT                        FAILED_LOGIN_ATTEMPTS            PASSWORD 10
DEFAULT                        PASSWORD_LIFE_TIME               PASSWORD 180
DEFAULT                        PASSWORD_REUSE_TIME              PASSWORD UNLIMITED
DEFAULT                        PASSWORD_REUSE_MAX               PASSWORD UNLIMITED
DEFAULT                        PASSWORD_VERIFY_FUNCTION         PASSWORD NULL
DEFAULT                        PASSWORD_LOCK_TIME               PASSWORD 1
DEFAULT                        PASSWORD_GRACE_TIME              PASSWORD 7


I personally do not like to have password expiry set up for database-level application accounts. In most cases the passwords for these accounts can't be changed without downtime, so it's best to have a policy where, once a quarter (or whatever your corporate standard is), you manually change these passwords.

Since individual end users do not have their own database-level accounts, I modified the default profile. If this is not the case for your server, you may want to create a new profile for the application accounts so that you can have separate password policies, as sketched below.
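
A hypothetical example of that approach (the profile name APP_ACCOUNTS is a placeholder; pick limits that match your own standards):

-- Dedicated profile for application schemas; the DEFAULT profile keeps its stricter rules
CREATE PROFILE app_accounts LIMIT
  PASSWORD_LIFE_TIME    UNLIMITED
  FAILED_LOGIN_ATTEMPTS UNLIMITED;

-- Assign it to a repository account (DEV_PORTAL is one of the accounts listed above)
ALTER USER dev_portal PROFILE app_accounts;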

The command to alter the default profile is:

SQL> alter profile default limit PASSWORD_LIFE_TIME unlimited FAILED_LOGIN_ATTEMPTS unlimited;




The next task was to re-enable the expired accounts. To do this, the passwords for these accounts need to be changed manually, and I would recommend reusing the same password. One thing I still need to do is investigate password changes for FMW accounts and see if there are any dependencies. NOTE: if PASSWORD_REUSE_MAX is not set to UNLIMITED you may not be able to reuse the previous password.

Think back to the Fusion Middleware installation: you were prompted to create passwords for a number of repository accounts. If any of these accounts have expired, either issue

alter user <username> identified by <password>;


or log in as each user and you'll be prompted for a new password. As noted above, use the previous password.

You may notice, or find out the hard way, that you don't have the passwords for some of these accounts. If you take a look at the DBA_USERS query above you'll notice the ORASSO, ORASSO_PS, and DCM users. When these accounts are created they are assigned random passwords. Use ldapsearch, changing the OrclResourceName parameter for each account you need to find the password for:

[oracle@myserver ~]$ ldapsearch -b "orclReferenceName=<SID>.world,cn=IAS Infrastructure Databases,cn=IAS,cn=Products,cn=OracleContext" -D cn=orcladmin -h <OID Server> -p 3060 -q OrclResourceName=ORASSO

Please enter bind password:
OrclResourceName=ORASSO,orclReferenceName=MYDB.world,cn=IAS Infrastructure Databases,cn=IAS,cn=Products,cn=OracleContext
orclflexattribute1=true
orclresourcename=ORASSO
objectclass=orclResourceDescriptor
objectclass=top
orclpasswordattribute=AECR63ZP



The current password is identified by orclpasswordattribute. Now you can reset the password for these accounts as you did with the others above.
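
For example, using the ORASSO password returned above (re-setting an expired account to its existing password clears the expiry without breaking anything that already authenticates with it):

-- ORASSO was in EXPIRED(GRACE) status in the DBA_USERS output above
ALTER USER orasso IDENTIFIED BY AECR63ZP;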

Tuesday

It's just a simple change, right?

Putting a database in archivelog mode? Set the destination, the archive log format, shutdown immediate, startup mount, alter database archivelog, alter database open, alter system archive log current. Done.
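
In SQL*Plus terms, that "simple" sequence looks roughly like the sketch below; the destination and format values are placeholders rather than the exact ones used that night:

ALTER SYSTEM SET log_archive_dest_1 = 'LOCATION=/u02/archive/ORCL' SCOPE = SPFILE;
ALTER SYSTEM SET log_archive_format = 'ORCL_%t_%s_%r.dbf' SCOPE = SPFILE;
SHUTDOWN IMMEDIATE
STARTUP MOUNT
ALTER DATABASE ARCHIVELOG;
ALTER DATABASE OPEN;
ALTER SYSTEM ARCHIVE LOG CURRENT;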

Except, alter system archive log current failed:


ORA-19504: failed to create file "/u02/archive/ORCL/ORCL_1_1_686533687.dbf"
ORA-27040: file create error, unable to create file
Linux Error: 13: Permission denied



Huh, that's OK. This is an Oracle EBS environment and I had created the archive destination directory with the applmgr user, not oracle. No problem: su - root, chown oracle:dba ., done.

alter system archive log current again, success!

I can't remember what query I issued next (it's been a hazy night) but I was greeted with the following error:


Errors in file /u01/oracle/ORCL/db/tech_st/10.2.0/admin/ORCL_myserver/bdump/orcl_smon_10547.trc:
ORA-00376: file 5 cannot be read at this time
ORA-01110: data file 5: '/u01/oracle/ORCL/db/apps_st/data/system05.dbf'



WTF? Just some background about this environment: I had only cloned it the day before and I didn't have any backups. I was putting the database in archivelog mode so I could take backups.

This error typically means the datafile is offline, though I'm not sure why it was. So I brought it online:


alter database datafile 5 online;
recover datafile 5;



But:


ORA-00283: recovery session canceled due to errors
ORA-00264: no recovery required


Hrmm.. ok, so I try to open the database:

SQL> alter database open;
alter database open
*
ERROR at line 1:
ORA-03113: end-of-file on communication channel



WTF? Check the alert log:


Errors in file /u01/oracle/ORCL/db/tech_st/10.2.0/admin/ORCL_rc1657/bdump/uat2_lgwr_11557.trc:
ORA-00600: internal error code, arguments: [3712], [1], [1], [1388],
[3796296244], [1388], [3796296241], []



Lovely. At this point I have 3 options:

1. Work with Oracle Support.
2. Rebuild the environment since I don't have any backups. Developers lose 1 day.
3. Back it up, research the error and see if I can find a solution.

I worked with Oracle Support up until 6:30am, at which point I believe my analyst finished for the day and went home. This is a UAT environment, so the SR was a P2. I wasn't expecting 24x7 support, so this was just bad timing.

I searched Metalink and didn't find any solid hits, so I tried Google. Not many hits, but I received one exact match, in Chinese. Google graciously offered to translate the page for me, and about the only useful text on the page was:

"Who else has come across this internal bug? This error occurred after I used alter database clear logfile group and then restarted the database."

At this point I had an hour left before testers started and only 2 options remaining. I figured I was going to have to rebuild the database anyway, so I'd try clearing the log files. I cleared the inactive groups without issue. The only group remaining was the current one; if the clear logfile group command failed I'd have to proceed with the rebuild.
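
Roughly what that looked like, as a sketch (the group numbers here are illustrative, and clearing an unarchived log invalidates any backups taken before that point):

-- Identify the log groups and their status (works with the database mounted)
SELECT group#, status, archived FROM v$log;

-- Clear the inactive groups first; the current group only as a last resort
ALTER DATABASE CLEAR UNARCHIVED LOGFILE GROUP 2;
ALTER DATABASE CLEAR UNARCHIVED LOGFILE GROUP 3;
ALTER DATABASE CLEAR UNARCHIVED LOGFILE GROUP 1;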

From the alert log:

alter database clear unarchived logfile group 1
Tue May 12 08:14:34 2009
Thread 1 advanced to log sequence 6 (preparation to clear logfile)
WARNING! CLEARING REDO LOG WHICH HAS NOT BEEN ARCHIVED. BACKUPS TAKEN
BEFORE 05/12/2009 08:14:34 (CHANGE 1) CANNOT BE USED FOR RECOVERY.
Clearing online log 1 of thread 1 sequence number 5
Tue May 12 08:17:34 2009
Completed: alter database clear unarchived logfile group 1



Success!!!

Alter database open; Success!!!


That's my 6 hr simple change (most of that time was spent working with Oracle Support, taking a backup [only 5MB/s IO??!] of the environment before I played around with it, etc.). Now I need some sleep. You know you're tired when you almost use dish washing liquid for cream in your coffee. Luckily an attentive coworker prevented disaster! Normally an all-nighter doesn't leave me this drained, but the previous night I only had 3 hrs of sleep.

Thursday

WARNING: Oracle executable binary mismatch detected.

An interesting message appeared in my test environment alert log:

WARNING: Oracle executable binary mismatch detected.
Binary of new process does not match binary which started instance
issue alter system set "_disable_image_check" = true to disable these messages


The message appears to be pretty obvious: for some reason Oracle thinks the binary that started the instance has changed.
Note 369260.1 describes the error and mentions that there could be many reasons, but doesn't actually list them or refer to other notes which may contain more information. To suppress the message you can issue the alter system command as per the error message above.
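
For the record, that command is the following (underscore parameters are generally something to set only at Oracle Support's direction):

-- Only hides the warning; it does not address whatever caused the mismatch
ALTER SYSTEM SET "_disable_image_check" = TRUE;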

I had to shut down the environment to apply the patch, so I watched the alert log after startup to see if the error would reappear. I haven't made any changes to this environment, so I'm not sure what may have caused this message to appear. If it happens again I'll open an SR.

Have you seen this problem before? If so, did you find a root cause?

Flashback Drop – 2 Tables with the same name.

Flashback drop allows you to restore a table without having to perform a point-in-time recovery. In 10g the DROP DDL command was changed internally to rename the table and its associated objects (excluding foreign key constraints) rather than remove them. It's a feature that I personally have never had to use, although leaving it enabled did cause some performance issues for me. (See previous blog posts about the recycle bin feature.)

The basic syntax for restoring a dropped table is simply:

SQL> flashback table <tablename> to before drop;
Flashback complete.

Simple enough. But what if, since the table was dropped, another table was created with the same name? By default, flashback drop restores the most recent version of the table. In order to restore a previous copy you need to specify the recycle bin name. For example:

SQL> drop table EMP;

Table dropped.

SQL> create table EMP (name varchar2(50));

Table created.

SQL> insert into EMP values ('John Doe');

1 row created.

SQL> commit;

Commit complete.

SQL> select * from emp;

NAME
--------------------------------------------------------------------------------
John Doe

SQL> drop table emp;

Table dropped.

SQL> select object_name, original_name, type, droptime from user_recyclebin;

OBJECT_NAME
--------------------------------------------------------------------------------
ORIGINAL_NAME
--------------------------------------------------------------------------------
TYPE
---------------------------------------------------------------------------
DROPTIME
---------------------------------------------------------
BIN$Yr7gB0huGergRAADuvZ6hQ==$0
EMP
TABLE
2009-02-12:14:39:14

BIN$Yr7gB0hvGergRAADuvZ6hQ==$0
EMP
TABLE
2009-02-12:14:39:53


SQL> flashback table "BIN$Yr7gB0huGergRAADuvZ6hQ==$0" to before drop;

Flashback complete.

SQL> desc emp;
 Name                             Null?    Type
 -------------------------------- -------- ----------------------------
 EMPNO                            NOT NULL NUMBER(4)
 ENAME                                     VARCHAR2(10)
 JOB                                       VARCHAR2(9)
 MGR                                       NUMBER(4)
 HIREDATE                                  DATE
 SAL                                       NUMBER(7,2)
 COMM                                      NUMBER(7,2)
 DEPTNO                                    NUMBER(2)

Tuesday

Windows, Firewalls and Redirects

We are in the middle of a network migration and quickly discovered issues connecting to databases on Windows servers. The two networks normally can't talk to each other, but to aid in the migration we placed some of the servers behind a NAT-enabled router. You could ping the listener (tnsping), but when a connection was attempted the client (SQL*Plus, Toad, etc.) would hang.

One of our network admins noticed in a network trace that the listener would redirect the client to a specific port (which is normal), but it would tell the client to use the server's real IP address, not the NAT address. So the client would try to connect to an IP address that didn't 'exist'.

There are two possible solutions to this problem. The first is to modify the firewall and enable SQL*Net inspection, but it wasn't successful for us: we could connect to the database and execute small queries, but large ones would hang after a certain amount of data was returned. We disabled this feature and, while the network team was investigating, tried the second option.

The second option is to add the parameter:

USE_SHARED_SOCKET = TRUE

to the server's registry (regedit) under \\HKEY_LOCAL_MACHINE\SOFTWARE\ORACLE\HOME<#>. When this feature is enabled, it tells the listener to spawn a new thread on the listening port. As new connections arrive, they spawn threads as well, so in the end you have the listener and a number of connections all using port 1521. The disadvantage to this option is that bouncing the listener also disconnects all sessions.

This option worked and since this setup is temporary we’ll stick with it.

Thursday

ORA-01555 and Automatic Undo

In the past couple of years of supporting Oracle E-Business Suite I have only hit ORA-01555: snapshot too old twice: about a year ago and again today.

Before you start to troubleshoot you need to gather some information:
  • If you're unaware of the current settings for undo, you should check. One way is to use the show parameter command, but you can also check the database initialization parameter file or spfile.


    SQL> show parameter undo

    NAME                                 TYPE        VALUE
    ------------------------------------ ----------- -------------
    undo_management                      string      AUTO
    undo_retention                       integer     1200
    undo_suppress_errors                 boolean     FALSE
    undo_tablespace                      string      APPS_UNDOTS1


  • Determine the time the error occurred from the alert log, which will help you narrow down the window for a query we will use later. Your alert log should have an entry similar to:

    Sat Aug 16 23:05:41 2008
    ORA-01555 caused by SQL statement below (Query Duration=1506 sec, SCN: 0x0000.a8d6d835):
    Sat Aug 16 23:05:41 2008
    SELECT X FROM Y WHERE A=B

Now that you have the automatic undo settings and the time of the error, execute the following query (substitute the timestamp of the error in your alert log in place of the one below, with some buffer at either end):

SQL> select to_char(begin_time,'MM/DD/YYYY HH24:MI') begin_time, UNDOBLKS, UNXPSTEALCNT, EXPSTEALCNT ,
SSOLDERRCNT, NOSPACEERRCNT, MAXQUERYLEN
from v$undostat
where begin_time between to_date('08/16/2008 22:30:00','MM/DD/YYYY HH24:MI:SS')
and to_date('08/16/2008 23:30:00','MM/DD/YYYY HH24:MI:SS')
order by begin_time;


BEGIN_TIME       UNDOBLKS UNXPSTEALCNT EXPSTEALCNT SSOLDERRCNT NOSPACEERRCNT MAXQUERYLEN
---------------- -------- ------------ ----------- ----------- ------------- -----------
08/16/2008 22:32    10394            0           0           0             0         513
08/16/2008 22:42     5693            0           0           0             0         212
08/16/2008 22:52    10853            0           0           0             0         451
08/16/2008 23:02     8950            0           0           0             0        1653
08/16/2008 23:05    11284            0           0           1             0        1506

5 rows selected.


Based on this query we can determine a couple of things, such as the number of transactions which required undo during the time period, how many undo blocks they required, and whether there were any (successful) attempts to steal undo space from other transactions.

The query also shows which intervals recorded ORA-01555 errors (SSOLDERRCNT) and whether or not there was enough space in the UNDO tablespace (NOSPACEERRCNT). If you look at the row above with a timestamp of 23:05, this is the interval which produced my ORA-01555 error. We know this because the SSOLDERRCNT column for this row has the value 1 and the timestamp matches the one in the alert log. Since the MAXQUERYLEN for this interval (1506 seconds) is greater than the undo_retention (1200), one solution would be to increase undo_retention.*

*There are other ways to try to avoid ORA-01555 errors, such as transaction tuning, etc. More detail can be found in Note 10630.1.

For this particular case I will set it to 2000. By increasing undo_retention, undo is kept for a longer period of time, so you should monitor the undo tablespace to make sure you have enough room. Oracle 9i+ has advisors which can help you determine the impact of raising undo_retention. The following screenshot is taken from Oracle 9i Enterprise Manager, but you can also find better graphs in 10g Database Control or Grid Control:




[Screenshot: Undo Advisor graph from Oracle 9i Enterprise Manager]

*Note: this graph is not from the same database as above. It's just for informational purposes.

We can determine from this graph that if we were to double undo retention from 30 to 60 minutes we would need at most 70MB of space. This estimate is based on undo activity since the database was initially started.
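
The change itself is a single dynamic parameter update; a sketch, with the value 2000 taken from the MAXQUERYLEN observed above and APPS_UNDOTS1 from the show parameter output:

-- undo_retention is dynamic, so no restart is required
ALTER SYSTEM SET undo_retention = 2000;

-- Watch the undo tablespace size afterwards to make sure it can keep up
SELECT tablespace_name, sum(bytes)/1024/1024 AS size_mb
  FROM dba_data_files
 WHERE tablespace_name = 'APPS_UNDOTS1'
 GROUP BY tablespace_name;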

If the UNDO tablespace was too small, you would see a value in the NOSPACEERRCNT column. Take a look at the following row with a timestamp of 23:22:


BEGIN_TIME       UNDOBLKS UNXPSTEALCNT EXPSTEALCNT SSOLDERRCNT NOSPACEERRCNT MAXQUERYLEN
---------------- -------- ------------ ----------- ----------- ------------- -----------
08/16/2008 23:22     1034           10           0           1             1        1428

This row has values for UNXPSTEALCNT and NOSPACEERRCNT. This means a transaction was not able to obtain space in the UNDO tablespace and attempted to steal 10 unexpired blocks (UNXPSTEALCNT) from other transactions. In this case we need to add space to the UNDO tablespace because it was not large enough to meet the undo_retention requirement.

Wednesday

Recycle bin and Fragmentation

Yesterday I posted about the recycle bin and how, if you don't purge it, it may affect the performance of queries touching dba_free_space. We have had fragmentation and growth issues in certain tablespaces for a while now. We tried a couple of things to resolve them but they didn't work. I've been pushing to migrate our 11i environment to OATM (Oracle Applications Tablespace Model) for a while, thinking it would resolve our fragmentation issues by having a smaller number of tablespaces with uniform extent sizes.

Two of the tablespaces with fragmentation problems are GLD and GLX, the data and index tablespaces for the GL module. As I noticed yesterday, most of the objects in the recycle bin were owned by GL, and shortly after I posted the article I wondered if the recycle bin was the source of our problems here as well.

Here is a query against dba_free_space which shows the available free chunks of space and how many of them there are:


SQL> Select bytes/1024/1024, count(*)
2 From dba_free_space
3 Where tablespace_name = 'GLX'
4 group by bytes/1024/1024
5 /

BYTES/1024/1024 COUNT(*)
--------------- ----------
.0390625 25351
55.5859375 1
.1953125 9
.078125 10072



Lots of very small unusable chunks. So what happens to the chunks of free space if I purge the recycle bin?
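
For reference, the purge itself is a one-liner; note it removes every dropped object for every user, so PURGE TABLESPACE GLD (or GLX) is the narrower option if anything else might still need to be restored:

-- Run as a DBA; empties the recycle bin database-wide
PURGE DBA_RECYCLEBIN;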


1 Select bytes/1024/1024, count(*)
2 From dba_free_space
3 Where tablespace_name = 'GLX'
4 group by bytes/1024/1024
5* order by 1 DESC
SQL> /

BYTES/1024/1024 COUNT(*)
--------------- ----------
202.929688 1
138.984375 1
122.578125 1
107.109375 1
104.726563 1
103.554688 1
90.4296875 1
81.484375 1
[SNIP]


After the purge, the largest count of same-sized fragments is 6. Quite a significant difference. So the question is, why isn't Oracle reusing the space of objects in the recycle bin? The answer is very simple. We have a custom alarm script which monitors next extent sizes and runs twice a day. If the alarm triggers, a page is sent to the DBA on call, who typically performs the following steps:

1. Check how much free space is left in the tablespace.
2. Check the next_extent size of the object and make sure it's not an unreasonably large request. (Some objects still have pct_increase > 0. Grr!)
3. If the request in step 2 is unreasonable, resize the next_extent; otherwise add more space to the tablespace.

The recycle bin is a "non-intrusive" feature and space will be reclaimed if:*
  • A user creates a new table or adds data that causes their quota to be exceeded.

  • The tablespace needs to extend its file size to accommodate create/insert operations


*Note: 265253.1

So, when Oracle comes along and needs to grow an object, we have already added a large chunk of space. Logically, Oracle will fulfill its request from this new chunk before reclaiming space from the recycle bin. If I'm mistaken in this conclusion, please correct me.

I wonder, if we were using OATM with uniform extent sizes, would I have noticed this sooner? If I had 25,000 free extents of a uniform size (which would be the same as the request size) showing in dba_free_space, I'm sure I would have investigated why they weren't being reused.

While reading up on the recycle bin and flashback drop to prepare my last post, I don't recall seeing any gotchas like this. So heads up: if you're using GL (or have an application which uses a lot of interim tables), this is a potential gotcha.

HotSpot Virtual Machine Error : 11

I've been working with Oracle for quite a while now and have more than my share of installs under my belt. Today a co-worker emailed me with the following error produced by the 10.2.0.1 runInstaller.

All installer requirements met.

Preparing to launch Oracle Universal Installer from /tmp/OraInstall2007-10-23_11-51-47AM. Please wait ...$ Starting Installer in advanced mode ...
Oracle Universal Installer, Version 10.2.0.1.0 Production
Copyright (C) 1999, 2005, Oracle. All rights reserved.


Unexpected Signal : 11 occurred at PC=0xFECD9110
Function=[Unknown. Nearest: JVM_MonitorWait+0x1F24]
Library=/tmp/OraInstall2007-10-23_11-51-47AM/jre/1.4.2/lib/sparc/client/libjvm.so

Current Java thread:
at oracle.sysman.oii.oiip.osd.unix.OiipuUnixOps.chgrp(Native Method)

.
.
.

#
# HotSpot Virtual Machine Error : 11
# Error ID : 4F530E43505002EF 01
# Please report this error at
# http://java.sun.com/cgi-bin/bugreport.cgi
#
# Java VM: Java HotSpot(TM) Client VM (1.4.2_08-b03 mixed mode)
#
# An error report file has been saved as hs_err_pid27925.log.
# Please refer to the file for further information.


I'm still amazed that after hundreds of installs you can still hit a new error. In this case the oraInst.loc file, which points to the central inventory location, was not set up correctly. There was an old location which they commented out, but they forgot to specify a new one.

Ex. Sample oraInst.loc (on Unix servers this file is normally stored in /var/opt/oracle or /etc):

#Oracle Installer Location File Location
#Wed Apr 30 14:51:42 EDT 2007
inventory_loc=/home/oracle/oraInventory
inst_group=dba


What is the Central Inventory? The Central Inventory stores information for all Oracle products installed on the server. Inside the inventory location there is an xml file (inventory.xml) which lists all of the ORACLE_HOMEs.

Ex. <Central Inventory Path>/ContentsXML/inventory.xml (In this example, using the path from the oraInst.loc file above you would look in /home/oracle/oraInventory/ContentsXML/inventory.xml)

<?xml version="1.0" standalone="yes" ?>
<!-- Copyright (c) 2001 Oracle Corporation. All rights Reserved -->
<!-- Do not modify the contents of this file by hand. -->
<INVENTORY>
<VERSION_INFO>
<SAVED_WITH>2.2.0.19.0</SAVED_WITH>
<MINIMUM_VER>2.1.0.6.0</MINIMUM_VER>
</VERSION_INFO>
<HOME_LIST>
<HOME NAME="APPSDB_ORA" LOC="/app/oradb/920_64bit" TYPE="O" IDX="1"/>
<HOME NAME="APPSDB_ORA1" LOC="/app/oradb/1020_64bit" TYPE="O" IDX="2"/>
<HOME NAME="APPSIAS_ORA" LOC="/apps/appsora/oraora/iAS" TYPE="O" IDX="3"/>
</HOME_LIST>


As you can see, this server has both 9i and 10g database software and an application server. Inside each of these ORACLE_HOMEs there is a Local Inventory. The local inventory stores the list of components, patchsets and interim patches installed in that particular ORACLE_HOME. The file in this case is named comps.xml

ex. <Local Inventory Location>/ContentsXML/comps.xml (This file is rather large so just a small clip. This is from the 920_64bit ORACLE_HOME.)

<?xml version="1.0" standalone="yes" ?>
<!-- Copyright (c) 2001 Oracle Corporation. All rights Reserved -->
<!-- Do not modify the contents of this file by hand. -->
<PRD_LIST>
<TL_LIST>
<COMP NAME="oracle.server" VER="9.2.0.1.0" BUILD_NUMBER="0" REP_VER="0.0.0.0.0" RELEASE="Production"
      INV_LOC="Components/oracle.server/9.2.0.1.0/1/" LANGS="en" XML_INV_LOC="Components21/oracle.server/9.2.0.1.0/"
      ACT_INST_VER="2.2.0.12.0" DEINST_VER="2.2.0.4.0" INSTALL_TIME="2004.Dec.16 14:43:30 EST"
      INST_LOC="/mnt/app/visdb/920_64bit/oracle.server">
<EXT_NAME>Oracle9i Database</EXT_NAME>
<DESC></DESC>
<DESCID>COMPONENT_DESC</DESCID>
<DEP_GRP_LIST>
<DEP_GRP NAME="Optional" TYPE="O"/>
<DEP_GRP NAME="Required" TYPE="R"/>
</DEP_GRP_LIST>
<DEP_LIST>
<DEP NAME="oracle.rdbms" VER="9.2.0.1.0" DEP_GRP_NAME="Optional" HOME_IDX="1"/>

By default Oracle looks for the oraInst.loc file in /var/opt/oracle or /etc, depending on your platform. You can change the location if you wish; you'll just need to remember to specify the -invPtrLoc parameter with opatch and the runInstaller.

ex. opatch lsinventory -invPtrLoc <path to your>/oraInst.loc
runInstaller -invPtrLoc <path to your>/oraInst.loc


This relates back to an earlier post I made which shows you how to recreate your central inventory in an EBS environment.

Tuesday

TIP: Tracing a session via a logon trigger

A while back I found a tip which has come in handy quite a few times. Just recently I was troubleshooting an ORA-03124 error which occurred during a run of a BI tool. The BI tool spawns many sessions while gathering and processing data, so it's not practical to sit there, grab each session's information and turn on tracing manually.

From Oracle Magazine Tip for Week of June 26, 2006:
CREATE OR REPLACE TRIGGER ON_LOGON_SCOTT
AFTER LOGON ON DATABASE
WHEN ( USER = 'SCOTT' )
BEGIN
  execute immediate 'alter session set events ''10046 trace name context forever, level 12''';
EXCEPTION
  WHEN OTHERS THEN
    NULL;
END;
/


Of course, don't forget to disable it after!
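
Something like the following once the traces have been collected (disabling keeps the trigger around for next time; drop it if you prefer):

-- Stop tracing new SCOTT sessions
ALTER TRIGGER on_logon_scott DISABLE;

-- Or remove it entirely
-- DROP TRIGGER on_logon_scott;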

AWR Operation failed

I know most are talking about 11g but we finally upgraded our production EBS database to 10g a few weeks ago. A few months ago, while a fellow DBA was on vacation, I used 10g's diagnostics features to troubleshoot a hanging issue in their data warehouse DB. The historical view which allows you to see what was happening in a database during a window of time is priceless and easily helped me resolve the issue. How often do users call you as they are experiencing performance problems? Never...! I'll be in a weekly meeting and users will say that they experienced issues 3 days ago! So I was looking forward to this ability.

Much to my surprise though, when I clicked on the performance tab in Database Control, the data was a couple of days old. Ten snapshots after the upgrade to 10g finished, it suddenly stopped working. I checked the alert log, trace files, etc., but the only error I could find was when I tried to change the AWR settings; i.e., changing the snapshot interval would consistently produce the following error:


ORA-13516: AWR Operation failed: only a subset of SQL can be issued ORA-06512: at
"SYS.DBMS_WORKLOAD_REPOSITORY", line 10 ORA-06512: at
"SYS.DBMS_WORKLOAD_REPOSITORY", line 33 ORA-06512: at line 1


A search of Metalink didn't result in any good hits, so I opened a TAR. An RDA and a few updates later, they asked me to execute the following query to see if AWR was in restricted mode. In our case, it was not:

SQL> select nam.ksppinm name, val.KSPPSTVL, nam.ksppdesc description
2 from x$ksppi nam, x$ksppsv val
3 where nam.indx = val.indx AND
4 nam.ksppinm = '_awr_restrict_mode'
5 order By 1
6 /

NAME
--------------------------------------------------------------------------------
KSPPSTVL
--------------------------------------------------------------------------------
DESCRIPTION
--------------------------------------------------------------------------------
_awr_restrict_mode
FALSE
AWR Restrict Mode


The only other suggestion was to restart the database. This past weekend we had to apply some one-off patches, so we took advantage of the maintenance window and restarted the database. Since then it has been working properly. I'm not a big fan of this kind of solution; I'd rather know the root cause and whether it's preventable. But in this case it seems Oracle Support is not sure either.