Archive for November, 2006

OCFS version 1 odd behavior

Saturday, November 25th, 2006

Last week I was involved in troubleshooting a 2-node RAC database and getting it to work for a customer (Linux, RHEL3, Oracle 10.2.0.2).

Among other things, I found corruption on an OCFS filesystem. To try to fix that, I wanted to run fsck. But before that I wanted a backup (always preserve the way back!). The client’s backup suite is Tivoli Storage Manager.

It turned out that Tivoli Storage Manager (TSM) could only do the backup at about 5 megabytes per second. This immediately reminded me of the O_DIRECT behavior of OCFS, but night had fallen, the client wanted to work the next day and not much time was left; meaning: tough luck, that is what we have.

To see whether the filesystem was corrupt from the database’s perspective, I decided to run db verify (dbv). During the db verify (on the other node of the RAC database cluster), I decided to tar a database file and see what my speed was. That appeared to be around 50 megabytes per second. That’s odd… that’s ten times faster than the TSM backup. A little later, I saw the performance of my tar session drop to the same 5 megabytes per second as TSM. A little later still, I saw that the db verify had finished…

That made me think: could it be that db verify influences the IO done by another session on another file? It seems unlikely, but because db verify is non-intrusive, I decided to run it again. The outcome struck me: I got the “old” 50 megabytes per second performance back. ???

A little later I decided to find out which database files had already been backed up by TSM and run a db verify on that node too. A tenfold increase in IO performance means a lot less waiting! The same happened there: TSM backup performance increased to around 50 megabytes per second!!

The next day I thought about how this could be. Because of the lack of time, I didn’t fetch the code (OCFS is open source) and read through it. These are my findings:

- OCFS is optimised to deliver high speed just for the Oracle database, using O_DIRECT calls
- OCFS was never meant to be a “normal” filesystem
- The documentation clearly states that only O_DIRECT enabled tools should be used to do file maintenance. There are even O_DIRECT versions of tar, cp, md5sum, dd, etc. on the website
- The ability to do IO using normal/non-O_DIRECT tools seems to be more of a “workaround” than normal behavior
- If some O_DIRECT tool opens a file on the OCFS filesystem, it locks THAT file, but seems to give other, non-O_DIRECT processes better optimised IO
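
For anyone who wants to repeat this kind of comparison, here is a minimal Python sketch that reads a file sequentially and reports the throughput, either through the normal buffered path or with O_DIRECT. It is not one of the OCFS tools and not what tar or TSM do internally; the function name `read_throughput`, the 1 MB block size and the `direct` switch are my own assumptions for illustration. O_DIRECT needs an aligned buffer, which is why the sketch reads into an anonymous mmap.

```python
import mmap
import os
import time


def read_throughput(path, block_size=1024 * 1024, direct=False):
    """Read `path` sequentially; return (bytes_read, megabytes_per_second).

    Hypothetical helper for illustration, not part of any OCFS tooling.
    """
    flags = os.O_RDONLY
    if direct:
        # os.O_DIRECT only exists on platforms that support it (e.g. Linux),
        # and not every filesystem accepts it.
        flags |= os.O_DIRECT
    # An anonymous mmap is page-aligned, which O_DIRECT reads require.
    buf = mmap.mmap(-1, block_size)
    fd = os.open(path, flags)
    total = 0
    start = time.perf_counter()
    try:
        while True:
            n = os.readv(fd, [buf])  # fills buf, returns the byte count
            if n == 0:               # 0 bytes means end of file
                break
            total += n
    finally:
        os.close(fd)
        buf.close()
    elapsed = max(time.perf_counter() - start, 1e-9)
    return total, total / (1024 * 1024) / elapsed
```

Calling it once with `direct=False` and once with `direct=True` on the same datafile (while nothing else is touching it, and again while a db verify is running) should show whether the difference I saw is reproducible.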

Introducing glassbox: J2EE troubleshooter

Thursday, November 16th, 2006

Ever wondered what’s happening inside the application server? Ever had a fight with your developers about the performance of a J2EE application server (tomcat/jboss/oc4j/iAS/jrun/…)? Or had a fight with developers who blame the database while you think (or know) it is not causing the problems?

There is a solution for that: Glassbox.

Glassbox is an application which can be deployed in a J2EE (Java) application server. It uses AOP (aspect oriented programming) to monitor the behavior of the J2EE server, and outputs operations ordered by average response time (total response time / number of executions). Besides that (which in itself is a godsend for application server administrators), it highlights operations which exceed the threshold (default 1 second) and operations which throw an error.
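
To make that ordering concrete, here is a small Python sketch of the arithmetic. It is not Glassbox code; the names `OperationStats` and `rank_operations` are made up for illustration, and I am assuming slowest-first ordering and that the threshold is compared against the average.

```python
from dataclasses import dataclass


@dataclass
class OperationStats:
    """Hypothetical per-operation counters, for illustration only."""
    name: str
    total_response_time_s: float
    executions: int
    failures: int = 0

    @property
    def average_s(self):
        # average response time = total response time / number of executions
        if self.executions == 0:
            return 0.0
        return self.total_response_time_s / self.executions


def rank_operations(stats, threshold_s=1.0):
    """Return (name, average, is_slow, has_failed) tuples, slowest first."""
    ranked = sorted(stats, key=lambda s: s.average_s, reverse=True)
    return [
        (s.name, s.average_s, s.average_s > threshold_s, s.failures > 0)
        for s in ranked
    ]
```

An operation with 6 seconds of total response time over 2 executions averages 3 seconds, so it sorts above one with 2 seconds over 4 executions (0.5 seconds) and is flagged as exceeding the 1 second default.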

An operation can be selected (conveniently using a browser), so its details can be investigated, such as:
- stack trace of the code executed
- input URL
- diagnosis
- number of times it ran within the threshold, exceeded the threshold and failed

This is what the Glassbox application looks like:
Glassbox interface

Even more interesting, it can diagnose a slow database operation (as seen in the screen above), and display the SQL which is responsible:
Glassbox slow database diagnosis detail screen

The installation of Glassbox is very straightforward for jboss. The procedure for installing it in the OC4J in iAS version 10.1.3 requires some tweaking of XML files. Contact me if you are interested in the installation procedure.

You can find glassbox, including the free download (and forums for support) here