Wednesday

Oracle VM Server – Sharing /OVS via NFS

In order to enable High Availability for a server pool, you need shared storage: clustered OCFS2 on a SAN, iSCSI storage, or NFS on a NAS. Let’s say you don’t have access to that type of infrastructure, or simply don’t need some of the HA features (live migration, automatic restart of failed VMs). There is a way to take advantage of some of the other cluster features without setting this up.

Oracle VM Server comes packaged with an NFS server, so you can share out the /OVS directory and mount it on other servers. This gives you the ability to cold start VMs on different servers in your server pool, which eases the headache of manually moving VMs around if a server is maxed out on resources or disk space.

In our case, this works well for our DEV environment since we create/destroy/park a lot of VMs. We have some older servers which don’t have a lot of storage, and what they do have is dreadfully slow. We have a new server with plenty of blazingly fast disk, and even over NFS it’s faster than the local disk on the older servers.

Scenario: For the steps below I’m assuming you have at least two independent Oracle VM servers, meaning they both reside in their own server pools and have been added to VM Manager. The goal is to export the filesystem from server A, mount it on B and add B to the same server pool as A.

1. Disable the Firewall. On Server A, disable the iptables firewall or add a rule that allows NFS traffic from Server B. Since this is a dev environment I disabled iptables.
[root@vmserver_a ~]# service iptables stop
Flushing firewall rules: [ OK ]
Setting chains to policy ACCEPT: filter [ OK ]
Unloading iptables modules: [ OK ]

To keep iptables from starting on the next reboot, use chkconfig. It shows which runlevels a service is enabled in and lets you turn it off.

[root@vmserver_a ~]# chkconfig --list  iptables
iptables 0:off 1:off 2:on 3:on 4:on 5:on 6:off
[root@vmserver_a ~]# chkconfig iptables off
[root@vmserver_a ~]# chkconfig --list iptables
iptables 0:off 1:off 2:off 3:off 4:off 5:off 6:off
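
If you would rather add a rule than switch the firewall off, here is a rough sketch, not a tested configuration. It only prints the iptables commands so you can review them before running anything. The address 192.0.2.20 is a placeholder for Server B, and port 892 assumes you have pinned mountd to a fixed port with MOUNTD_PORT in /etc/sysconfig/nfs (by default mountd picks its port dynamically). Ports 111 and 2049 are the portmapper and NFS itself.

```shell
#!/bin/sh
# Print (do not run) iptables rules that admit NFSv3 traffic from one client.
# Assumptions: the client address passed in is Server B's IP, and mountd
# has been pinned to port 892 via MOUNTD_PORT in /etc/sysconfig/nfs.
nfs_rules() {
    client=$1
    for proto in tcp udp; do
        echo "iptables -I INPUT -s $client -p $proto -m multiport --dports 111,2049,892 -j ACCEPT"
    done
}

# 192.0.2.20 is a placeholder address, not taken from the setup above.
nfs_rules 192.0.2.20
```

Run the printed commands as root on Server A and follow up with "service iptables save" so they survive a reboot. If your clients rely on NFS locking, lockd and statd would need fixed ports and matching rules as well.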

2. Delete the server pool for Server B. Log in to VM Manager, click on the Server Pools tab, select Server Pool B and click on the Delete button. You will see a screen similar to the following:
clip_image002

Leave both options unselected and click on Yes.

3. Export /OVS on Server A. Add the following to the /etc/exports file on Server A.

/OVS vmserver_b.domain.name(rw,sync,no_root_squash)
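
If you script this step, it helps to make the edit repeatable so the same line doesn't pile up in /etc/exports on every run. The helper below is a minimal sketch; add_export is a name I made up, and the demo writes to a temporary file rather than the real /etc/exports.

```shell
#!/bin/sh
# add_export FILE LINE: append LINE to FILE unless an identical line is
# already present (-x matches whole lines, -F treats LINE as a fixed string).
add_export() {
    file=$1
    line=$2
    grep -qxF -- "$line" "$file" 2>/dev/null || echo "$line" >> "$file"
}

# Demo against a scratch file; calling it twice still leaves one entry.
scratch=$(mktemp)
add_export "$scratch" '/OVS vmserver_b.domain.name(rw,sync,no_root_squash)'
add_export "$scratch" '/OVS vmserver_b.domain.name(rw,sync,no_root_squash)'
cat "$scratch"
rm -f "$scratch"
```

On Server A you would point it at /etc/exports instead. If NFS is already running, "exportfs -ra" re-reads the file without a restart.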


4. Start NFS on Server A. Execute the following on server A.

[root@vmserver_a OVS]# service nfs start
Starting NFS services: [ OK ]
Starting NFS quotas: [ OK ]
Starting NFS daemon: [ OK ]
Starting NFS mountd: [ OK ]

Set NFS to start automatically on reboot:

[root@vmserver_a OVS]#  chkconfig nfs on

5. Mount the NFS export from Server A on Server B
a. First stop the Oracle VM Server services on Server B.  Otherwise /OVS will be busy and you won’t be able to unmount it.

[root@vmserver_b init.d]# service ovs-agent stop
OVSAgentServer going down...
OVSAgentServer stopped.

[root@vmserver_b init.d]# service ovsrepositories stop
Stopping OVS Storage Repository Mounter...
OVS Storage Repository Mounter Shutdown: [ OK ]


b. Unmount /OVS. (The local filesystem gets remounted as /OVS_OLD in the next step.)

[root@vmserver_b init.d]# umount /OVS


c. Edit /etc/fstab to mount /OVS from vmserver_a. I also mounted the local /OVS filesystem under /OVS_OLD in case there were VMs I wanted to transfer (create the /OVS_OLD mount point first if it doesn’t exist).

/etc/fstab:
/dev/sda1 /OVS_OLD ocfs2 defaults 1 0
vmserver_a:/OVS /OVS nfs rw,bg,hard,nointr,tcp,vers=3,timeo=300,rsize=32768,wsize=32768,actimeo=0

[root@vmserver_b init.d]# mount /OVS_OLD
[root@vmserver_b init.d]# mount /OVS
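
Before restarting the OVS services, it's worth confirming that /OVS really is the NFS mount and not the old local filesystem. Here is a small sketch that reads the mount table; fs_type is a hypothetical helper name, and it assumes a Linux-style /proc/mounts.

```shell
#!/bin/sh
# fs_type MOUNTPOINT [TABLE]: print the filesystem type recorded for
# MOUNTPOINT in TABLE (default /proc/mounts). If several entries match,
# the last one wins, since that is the mount currently on top.
fs_type() {
    awk -v mp="$1" '$2 == mp { t = $3 } END { if (t) print t; else exit 1 }' \
        "${2:-/proc/mounts}"
}

if t=$(fs_type /OVS 2>/dev/null); then
    echo "/OVS is mounted as $t"
else
    echo "/OVS is not mounted"
fi
```

On Server B after step c, I would expect this to report nfs for /OVS and ocfs2 for /OVS_OLD.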


d. Restart the OVS services on Server B

[root@vmserver_b init.d]# service ovs-agent start
OVSAgentServer starting...
OVSLogServer starting...
OVSAgentServer starting...
OVSMonitorServer starting...
OVSPolicyServer starting...
OVSAgentServer started.

[root@vmserver_b init.d]# service ovsrepositories start
Starting OVS Storage Repository Mounter...
OVS Storage Repository Mounter Startup: [ OK ]

6. Add Server B to Server A’s server pool. Log in to Oracle VM Manager, select the Servers tab and click on the Add button. You will see a screen similar to the one below:
clip_image004

Enter the Server Host/IP for Server B. If it’s different from the name you want to give it, enter the Server Name as well. Type in the Server Agent Password for Server B and check Virtual Machine Server for Server Type. Click on the Add button when complete.

On the Server Pools tab, the entry for Server Pool A should now show that it has 2 servers in its pool.

Note: I am not an NFS expert, so if you have any recommendations on better settings or how to improve performance, feel free to leave a comment. I searched for recommended NFS settings and a few sites referenced Kevin Closson’s blog. Compared with the settings I used initially, these offered quite an improvement. There are other ways to improve NFS performance, such as using multiple NICs, but I haven’t looked into that yet.
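
If you want to compare mount options yourself, a crude sequential write test is an easy starting point. This is only a sketch, not the method I used for the comparison above: write_test is a made-up helper, it assumes a dd that supports conv=fsync (GNU dd does), and it measures a single large write, which is not the same as a real VM workload.

```shell
#!/bin/sh
# write_test DIR [MB]: write MB mebibytes of zeros into DIR, flush them to
# storage (conv=fsync) so caching doesn't flatter the result, then remove
# the scratch file. dd reports the elapsed time and throughput on stderr.
write_test() {
    dir=$1
    mb=${2:-1024}
    f="$dir/.write_test.$$"
    dd if=/dev/zero of="$f" bs=1M count="$mb" conv=fsync
    rc=$?
    rm -f "$f"
    return $rc
}

# Small demo run against the temp directory.
write_test "${TMPDIR:-/tmp}" 16
```

Running it once against /OVS and once against /OVS_OLD on Server B gives a quick NFS versus local number. Rerun it after each change to the fstab options.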

11g Fusion Install issues, don’t interrupt the install…!

Here are a couple of tips I found out the hard way while installing Fusion Middleware 11g.

1. SSO - At one particular stage I was in the process of installing single sign-on 10g following these instructions. After following the step listed at Section 10.2 step 5 I needed to perform some maintenance on the server. So I shut down the environment, performed the work and restarted the servers.

Before I could continue my install I needed to start up the services such as the database, WebLogic server, OID, etc. However, when I went to start OID it would not come up properly. Within the OID logs I found:

[host: 0002] [pid: 8145] [tid: 0] Guardian: WARNING: Connected to incorrect OID base schema version, (version=10.1.4.0.1).

If I had been paying more attention to the installation steps I would have noticed that step 5 changed the directory version number in the repository to 10.1.4.0.1. After you install OID 10g, you re-run the inspre11.pl script with the -op3 flag to reset the value back to 11g.

Unfortunately OID needs to be running before you execute inspre11.pl, so I had to update the repository manually with the following SQL:

update ods.ds_attrstore set attrval = 'OID 11.1.1.1.0' where entryid =1 and attrname = 'orcldirectoryversion';
commit;

After that OID 11g started up fine and I could rerun the inspre11.pl script with the -op2 flag. Then I continued with the SSO install.

Another tip: review the following note before installing OID 10g:

Subject: Oracle Identity Management 10g (10.1.4.0.1) Release Notes Addendum

Doc ID: 465847.1

It lists some pre-reqs which are not in the install guide and some known issues. The main one for me is that libdb.so.2 was missing. The note describes how to resolve that issue.

2. Portal – The Portal install itself went without any issues; however, at the configuration wizard stage it would hang while creating the WebLogic domain. Sometimes I would see java.lang.OutOfMemoryError: PermGen space messages in the install log as well. I found the following notes:

Subject: FMW 11g “IDM” or “Portal/Forms/Reports/Discoverer” Configuration Wizards on 64bit Platforms Hang at 0% ‘Creating Domain’

Doc ID: 865462.1

Subject: FMW 11g ‘Portal/Forms/Reports/Discoverer’ Config Wizard Fails – “Error creating ASInstance”.. Unable to Connect to ‘Admin Server’ ..PermGen

I was using the 32bit version of WebLogic, which ships with the Sun and JRockit JDKs, on top of OEL 5.3 64bit. The notes say that you must use 64bit FMW 11g with a 64bit JDK on a 64bit OS. I downloaded the latest JRockit because it apparently doesn’t suffer from PermGen issues, but I still hit the same problem (although with no PermGen messages in the log file). It wasn’t until I downloaded the latest Sun JDK, 1.6.0_16, that the configuration wizard was able to create the domain.
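
A quick way to catch a 32bit/64bit mismatch before launching the wizard is to look at what the JVM and the kernel report. The check below is a rough sketch. is_64bit_jvm is a hypothetical helper, and the pattern is an assumption on my part: Sun HotSpot prints "64-Bit" in its version banner, while other JVMs may only show the architecture (x86_64 or amd64) in their build string.

```shell
#!/bin/sh
# is_64bit_jvm: read "java -version" output on stdin and succeed if it
# looks like a 64bit JVM. The patterns are a heuristic, not an official test.
is_64bit_jvm() {
    grep -Eq '64-Bit|x86_64|amd64'
}

echo "Kernel architecture: $(uname -m)"
# java -version writes its banner to stderr, hence the 2>&1.
if java -version 2>&1 | is_64bit_jvm; then
    echo "JVM looks 64bit"
else
    echo "JVM looks 32bit (or java is not on the PATH)"
fi
```

Make sure the java on your PATH is the same one the installer picks up, otherwise the check tells you nothing useful.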

3. SOA – After installing Portal, the SOA install seemed to complete without a hiccup… until I tried to log in to SOA. At one point I noticed errors when trying to start the SOA domain with the startWebLogic.sh script:

javax.security.auth.login.FailedLoginException: [Security:090304]Authentication Failed: User weblogic javax.security.auth.login.LoginException: [Security:090301]Password Not Supplied

However, I wasn’t being prompted for a password. I could hardcode the password within the startWebLogic.sh script, but startup would still fail, this time with a different error:

<Sep 1, 2009 10:08:33 AM EDT> <Critical> <WebLogicServer> <BEA-000386> <Server subsystem failed. Reason: weblogic.security.SecurityInitializationException: Authentication for user weblogic denied
weblogic.security.SecurityInitializationException: Authentication for user weblogic denied

WebLogic checks for the existence of a boot.properties file upon startup. If it exists, it reads the username and password from this file and doesn’t prompt you. I deleted this file and after that I was prompted for the password. I was hoping that the stored password was incorrect and causing my problems, but no such luck.
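
Once the login problem is sorted out, you will probably want boot.properties back so the server starts unattended. The sketch below recreates it under servers/SERVER_NAME/security inside the domain directory. write_boot_properties is a made-up helper, the demo uses a temporary directory and a placeholder password, and the file is written in plain text on the understanding that WebLogic encrypts the values the next time the server boots.

```shell
#!/bin/sh
# write_boot_properties DOMAIN_HOME SERVER USER PASSWORD
# Creates DOMAIN_HOME/servers/SERVER/security/boot.properties, readable
# only by the owner (umask 077 applies inside the subshell only).
write_boot_properties() {
    dir="$1/servers/$2/security"
    mkdir -p "$dir" || return 1
    ( umask 077
      printf 'username=%s\npassword=%s\n' "$3" "$4" > "$dir/boot.properties" )
}

# Demo in a scratch directory; the password here is a placeholder.
demo=$(mktemp -d)
write_boot_properties "$demo" AdminServer weblogic 'example-password'
cat "$demo/servers/AdminServer/security/boot.properties"
rm -rf "$demo"
```

For a real domain, pass your own domain directory and server name, and keep the password out of your shell history.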

I talked to Oracle Support and they believed that something was corrupted and that the next step would be to recreate the admin account. Since I was using the latest version, they hadn’t tested the steps yet and said they’d get back to me in a couple of hours. I searched Google and found a few hits with the same problem. A few people tried re-installing and somehow everything worked fine after that. I took a backup of my domain configuration and decided to try and recreate it.

This time during the install, I selected the optional configuration options for the Administration server and Managed servers. (http://download.oracle.com/docs/cd/E12839_01/doc.1111/e13925/config_screens.htm#CJAIIADH) I noticed that the ports it proposed had already been taken by the Portal install. I chose new ports, finished the install and everything started fine.

I’m guessing my authentication errors occurred because my Portal domain was up and SOA was trying to authenticate against it. When installing Fusion you let the configuration wizards create the WebLogic domains; you don’t create them with the WebLogic assistants. I guess the wizards aren’t smart enough to detect that ports are already in use, which is kind of surprising. This is my first FMW 11g install, so it is possible I messed something up the first time I tried to install SOA.
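
Since the wizard apparently doesn't check, you can look for port clashes yourself before accepting its defaults. Here is a minimal sketch that scans "netstat -tln" output for a listener on a given port. port_listed is a hypothetical helper, and 7001 is just the usual WebLogic admin port used as an example; I didn't record the actual ports from my install.

```shell
#!/bin/sh
# port_listed PORT: read "netstat -tln" style output on stdin and succeed
# if any local address (4th column) ends in :PORT.
port_listed() {
    awk -v p="$1" '$4 ~ (":" p "$") { found = 1 } END { exit !found }'
}

# 7001 is an example port only.
if netstat -tln 2>/dev/null | port_listed 7001; then
    echo "Port 7001 already has a listener, pick another one"
else
    echo "No listener found on port 7001"
fi
```

Run it for each port the wizard proposes while the other domains are up, otherwise a clash won't show.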

4. Repository Creation Utility – Another problem I hit was trying to run RCU for SSO on a 64bit OEL 5 environment. I couldn’t get it to work. Since it doesn’t need to run very often, I put it on a 32bit Red Hat 4.7 environment and ran it from there without issues.

So after 4 SRs and a fair bit of reading I finally have a Fusion Middleware 11g environment. I was surprised to find it’s much easier to install E-Business Suite than FMW!