Introducing pgstatspack, a statistics package for postgresql

May 10th, 2008

When performance or behavior during the development or test phase is not what you’ve expected, the normal way to go is explain or explain analyze the statements involved.

But how about production environments? And how can you tell what has happened? The most common scenario is probably when the behavior is unacceptable, so instant action is required. That is not the way you, as a DBA, want to be notified how about the dissatisfaction of the performance.

Some time ago, glenn fawcett of sun microsystems encountered probably a situation in which he wanted to know what was going on on a database level. He also had experience with the oracle database, and with a statistics package called “statspack”. What glenn did, was emulate some of the functionality of statspack, and create a version for postgresql, heavily focussed on logical IO and physical IO.

Later, this package was lost, and corresponding page also. In fact, I managed to get the package using the internet time machine (http://www.archive.org/web/web.php). Glenn recently got notified of this fact, and created a post about the package on his blog.

Well, I got into the same situation as Glenn. I have a postgresql database, and want to know what’s going on. In fact, I want to know more than just logical and physical IO, I want to know everything that is available. I also want to know postgresql version, database size, parameters, etc. Also, I want to make use of it to do analysis on daily basis of the snapshots, so I can do capacity management and do predictions about growth and performance.

That is why I’ve enhanced pgstatspack, and made it available open source: http://pgfoundry.org/projects/pgstatspack

Enjoy!

Playing with ASM disks on linux, part 2

February 20th, 2008

In the previous article about ASM disks we investigated the difference between an ASM disk used “raw” (using the blockdevice nowadays ;-) , no more /dev/raw/*), and an ASM disk used via ASM-lib. The difference is an ASM-lib name at offset 0×28 of the device.

According to Alejandro Vargas at Oracle, ASMlib has to following advantages:

ASMLib takes care of
1) Device name labels
2) Device ownership and permissions management
3) Async I/O management
4) I/O optimization
5) Sanity checkups

Device name labels
When devices are used as ASM-raw disks, they are being given a name (which is diskgroupname underscore disknumber, unless specified explicitly by the creator), but that’s the ASM disk name visible for the ASM instance, and only if the disk is mounted.
There is a specific ASM-lib name (at offset 0×28 of the device), which is visible in the path field (which means a disk is called ‘ORCL:YOURNAME’ instead of ‘/dev/sdb1′) in v$asm_disk.

Because devices are found using this name, it would suggest that this solves the known issue with device naming changes (a disk device in linux could point to a different disk after reboot, mainly after hardware changes) in linux. Is this really an issue?

Let’s move the contents of the /dev/sdb1 partition to the /dev/sdd1 partition and see if we get into trouble:


SQL> select g.name, d.name, d.path from v$asm_diskgroup g, v$asm_disk d where d.group_number=g.group_number;

NAME NAME PATH
------------------------------ ------------------------------ ------------------------------
DATADG YABBA /dev/sdb1

I’ve set ownership of the /dev/sdd1 device to oracle, the asm_diskstring is set to ‘/dev/sd*’, so we should see the other disk:

SQL> select group_number, disk_number, mount_status, header_status, mode_status, state, path from v$asm_disk;

GROUP_NUMBER DISK_NUMBER MOUNT_S HEADER_STATU MODE_ST STATE PATH
------------ ----------- ------- ------------ ------- -------- ------------------------------
0 0 CLOSED CANDIDATE ONLINE NORMAL /dev/sdd1
1 0 CACHED MEMBER ONLINE NORMAL /dev/sdb1

Now shutdown the instance, copy the contents of /dev/sdb1 to /dev/sdd1, and set ownership of /dev/sdb1 back to root. (which will make the /dev/sdb1 device invisible to the ASM instance.)

SQL> shut
ASM diskgroups dismounted
ASM instance shutdown
SQL>

The ASM instance is down. Now do some actions as root:

# dd if=/dev/sdb1 of=/dev/sdd1 bs=1M
5114+1 records in
5114+1 records out
5362850304 bytes (5.4 GB) copied, 707.67 seconds, 7.6 MB/s
# chown root /dev/sdb1

And start the ASM instance again:

SQL> startup
ASM instance started

Total System Global Area 284565504 bytes
Fixed Size 1299428 bytes
Variable Size 258100252 bytes
ASM Cache 25165824 bytes
ASM diskgroups mounted

Above we see the diskgroups (diskgroup actually; DATADG) are mounted. This suggests the ASM instance has picked up the other disk! Let see:

SQL> select g.name, d.name, d.path from v$asm_diskgroup g, v$asm_disk d where d.group_number=g.group_number;

NAME NAME PATH
------------------------------ ------------------------------ ------------------------------
DATADG YABBA /dev/sdd1

Look! The ASM instance picked up the ASM disk from another device!

This means we are not depended upon an extra naming facility from ASMlib for remapping of devices; the ASM instance itself is perfectly capable of handling device changes.

Device ownership and permissions management
This is very true; it is convenient to have a service wich reads the disk/device headers and makes them accessible to the ASM instance. In fact, that is what I see as the big advantage of ASMlib

It is more work the setup the udev rules file, but it isn’t a very big problem to setup the devices in /etc/udev/rules.d/50-udev.rules on the other hand…

Async I/O management
I am not sure what this means.
Upon reading that, it suggests asynchronous IO will be a problem without ASMlib.
AFAIK, asynchronous IO is enabled by default in the oracle executable, at least since version 10g. Let’s see:

-Check if the asynchronous library is linked with the oracle executable:

ldd /oracle/db/11.1.0.6/bin/oracle
linux-gate.so.1 => (0x00d8d000)
libskgxp11.so => /oracle/db/11.1.0.6/lib/libskgxp11.so (0x00595000)
librt.so.1 => /lib/librt.so.1 (0x00d47000)
libnnz11.so => /oracle/db/11.1.0.6/lib/libnnz11.so (0x00138000)
libclsra11.so => /oracle/db/11.1.0.6/lib/libclsra11.so (0x00aa2000)
libdbcfg11.so => /oracle/db/11.1.0.6/lib/libdbcfg11.so (0x0068e000)
libhasgen11.so => /oracle/db/11.1.0.6/lib/libhasgen11.so (0x009f6000)
libskgxn2.so => /oracle/db/11.1.0.6/lib/libskgxn2.so (0x00634000)
libocr11.so => /oracle/db/11.1.0.6/lib/libocr11.so (0x002e2000)
libocrb11.so => /oracle/db/11.1.0.6/lib/libocrb11.so (0x00c85000)
libocrutl11.so => /oracle/db/11.1.0.6/lib/libocrutl11.so (0x00110000)
libaio.so.1 => /usr/lib/libaio.so.1 (0×00119000)
libdl.so.2 => /lib/libdl.so.2 (0×0011b000)
libm.so.6 => /lib/libm.so.6 (0×00358000)
libpthread.so.0 => /lib/libpthread.so.0 (0×0011f000)
libnsl.so.1 => /lib/libnsl.so.1 (0×0037f000)
libc.so.6 => /lib/libc.so.6 (0×00396000)
/lib/ld-linux.so.2 (0×00b25000)

Well, the asynchronous library is linked with the oracle executable, that is for sure.
It *could* ignore the asynchronous library and still do synchronous calls.

Let’s see how the ASM client, my yoda database, is doing IO against ASM:
The checkpoint process issues a read and a write to the controlfile every 3 seconds:
(editted output of strace -p )

...
18:41:48 io_submit(1114112, 1, {{0x7bb110, 0, 1, 0, 18}}) = 1
18:41:48 io_getevents(1114112, 1, 128, {{0x7bb110, 0x7bb110, 16384, 0}}, {600, 0}) = 1
...
18:41:48 io_submit(1114112, 1, {{0x7bb110, 0, 0, 0, 18}}) = 1
18:41:48 io_getevents(1114112, 1, 128, {{0x7bb110, 0x7bb110, 16384, 0}}, {600, 0}) = 1
...
18:41:51 io_submit(1114112, 1, {{0x7bb110, 0, 1, 0, 18}}) = 1
18:41:51 io_getevents(1114112, 1, 128, {{0x7bb110, 0x7bb110, 16384, 0}}, {600, 0}) = 1
...
18:41:51 io_submit(1114112, 1, {{0x7bb110, 0, 0, 0, 18}}) = 1
18:41:51 io_getevents(1114112, 1, 128, {{0x7bb110, 0x7bb110, 16384, 0}}, {600, 0}) = 1
...

Well, these sure look like asynchronous calls…
The database is using asynchronous IO (synchronous IO would issue pread64 and/or pwrite64 calls on linux).

Now let’s see how the IO is done when ASM-lib is in use:
(editted output of strace -p ):

...
16:23:53 read(22, "MSA\0\2\0\7\0P\0\0\0\0\0\0\0\20\2075\20\0\0\0\0@\276K\0\0\0\0\0"..., 80) = 80
16:23:53 read(22, "MSA\0\2\0\7\0P\0\0\0\0\0\0\0\20\2075\20\0\0\0\0\0\0\0\0\0\0\0\0"..., 80) = 80
...
16:23:56 read(22, "MSA\0\2\0\7\0P\0\0\0\0\0\0\0\20\2075\20\0\0\0\0@\276K\0\0\0\0\0"..., 80) = 80
16:23:56 read(22, "MSA\0\2\0\7\0P\0\0\0\0\0\0\0\20\2075\20\0\0\0\0\0\0\0\0\0\0\0\0"..., 80) = 80
...
16:23:59 read(22, "MSA\0\2\0\7\0P\0\0\0\0\0\0\0\20\2075\20\0\0\0\0@\276K\0\0\0\0\0"..., 80) = 80
16:23:59 read(22, "MSA\0\2\0\7\0P\0\0\0\0\0\0\0\20\2075\20\0\0\0\0\0\0\0\0\0\0\0\0"..., 80) = 80
...

Kind of surprising; The read() call is a synchronous call, not an asynchronous call.

Let’s see what the filedescriptor number 22 is pointing to:


$ ls -ls /proc/13306/fd/22
0 lrwx------ 1 oracle oinstall 64 Feb 20 20:22 22 -> /dev/oracleasm/iid/0000000000000009

That’s a meta-device the ASM-lib kernel module creates for ASM-lib disks!

When using ASM-lib, the processes do not talk against the raw devices directly, they talk to the ASM-lib meta-device, and the ASM-lib kernel module takes care of the “real” IO against the devices.

This means ASM-lib IO is NOT raw IO: the ASM-lib kernel module is a layer between user processes and disk devices.

I haven’t had time to investigate the inner working of the ASM-lib kernel module (it’s open source!). Probably the optimisation is buffering some stuff, although not too much of the stuff; otherwise it wouldn’t work with clustering/RAC.

I/O Optimization
I guess this is the same point as Asynch IO management; using asynchronous IO the IO is optimised. Probably the user IO is meant by “Asynch IO Management” and the kernel IO is meant by “I/O Optimization”. I can’t judge on this part, because I do not know what optimisations the ASM-lib kernel module does at this time.

Sanity Checkups
One sanity check I can think of is ownership management of the ASM devices, so the ASM and database instances can access them. I really can not think of other sanity checkups. The consistency of the information in the ASM device header is checked by the ASM instance, not by ASM-lib. That would be a very obvious sanity check, but isn’t done by ASM-lib.

Playing with ASM disks on linux, part 1

February 13th, 2008

ASM (Oracle’s Automatic Storage Management) uses disk devices in Linux.
These devices are typically native SCSI devices (/dev/sd*), powerpath (/dev/emcpower*), etc.

The devices are used as raw devices (formerly using the /dev/raw/raw* meta-devices, nowadays directly on the devices itself), or as ASMlib devices.

What makes a device an ASMlib device, instead of a raw device?

Let’s create a diskgroup which uses a raw device on /dev/sdb1:


# chown oracle /dev/sdb1
# su - oracle
$ export ORACLE_HOME=/oracle/db/11.1.0.6
$ export ORACLE_SID=+ASM
$ /oracle/db/11.1.0.6/bin/sqlplus / as sysasm

SQL*Plus: Release 11.1.0.6.0 - Production on Tue Feb 12 15:22:52 2008

Copyright (c) 1982, 2007, Oracle. All rights reserved.

Connected to:
Oracle Database 11g Enterprise Edition Release 11.1.0.6.0 - Production
With the Partitioning, OLAP, Data Mining and Real Application Testing options

SQL> alter system set asm_diskstring = '/dev/sd*' scope=both;

System altered.

SQL> col path format a30
SQL> select mount_status, header_status, state, path from v$asm_disk;
MOUNT_S HEADER_STATU STATE PATH
------- ------------ -------- ------------------------------
CLOSED CANDIDATE NORMAL /dev/sdb1

SQL> SQL> create diskgroup rawtest external redundancy disk '/dev/sdb1';

Diskgroup created.

State of the disk is altered in ASM:


SQL> select mount_status, header_status, state, path from v$asm_disk;

MOUNT_S HEADER_STATU STATE PATH
------- ------------ -------- ------------------------------
CACHED MEMBER NORMAL /dev/sdb1

Because this disk now belongs to a diskgroup, and has gotten a name “RAWTEST_0000″:


select g.name, d.name from v$asm_diskgroup g, v$asm_disk d where d.group_number=g.group_number;

NAME NAME
------------------------------ ------------------------------
RAWTEST RAWTEST_0000

How does this visualise at the operating system level?


$ od -a -x -A x /dev/sdb1 | less

000000 soh stx soh soh nul nul nul nul nul nul nul nul u # stx D
8201 0101 0000 0000 0000 8000 23f5 4482
000010 nul nul nul nul nul nul nul nul nul nul nul nul nul nul nul nul
0000 0000 0000 0000 0000 0000 0000 0000
000020 O R C L D I S K nul nul nul nul nul nul nul nul
524f 4c43 4944 4b53 0000 0000 0000 0000
000030 nul nul nul nul nul nul nul nul nul nul nul nul nul nul nul nul
0000 0000 0000 0000 0000 0000 0000 0000
000040 nul nul dle nl nul nul soh etx R A W T E S T _
0000 0a10 0000 0301 4152 5457 5345 5f54
000050 0 0 0 0 nul nul nul nul nul nul nul nul nul nul nul nul
3030 3030 0000 0000 0000 0000 0000 0000
000060 nul nul nul nul nul nul nul nul R A W T E S T nul
0000 0000 0000 0000 4152 5457 5345 0054
000070 nul nul nul nul nul nul nul nul nul nul nul nul nul nul nul nul
0000 0000 0000 0000 0000 0000 0000 0000
000080 nul nul nul nul nul nul nul nul R A W T E S T _
0000 0000 0000 0000 4152 5457 5345 5f54
000090 0 0 0 0 nul nul nul nul nul nul nul nul nul nul nul nul
3030 3030 0000 0000 0000 0000 0000 0000
0000a0 nul nul nul nul nul nul nul nul nul nul nul nul nul nul nul nul
0000 0000 0000 0000 0000 0000 0000 0000
*

Well, what do we see?
- ORCLDISK at offset 0×20
This means it’s an ASM disk.
- RAWTEST_0000 at offset 0×48
This is the disk
- RAWTEST at offset 0×68
This is the diskgroup name.
- RAWTEST_0000 at offset 0×88
This is the failgroup name (not used with external redundancy; nowhere to fail to)

Now let’s drop the diskgroup:


SQL> drop diskgroup rawtest;

Diskgroup dropped.

Now the diskgroup is dropped, but the disk still exists:

SQL> select group_number, mount_status, header_status, mode_status, state, name from v$asm_disk;

GROUP_NUMBER MOUNT_S HEADER_STATU MODE_ST STATE
------------ ------- ------------ ------- --------
NAME
------------------------------
0 CLOSED FORMER ONLINE NORMAL

It’s header status is ‘FORMER’. This obviously means it used to be part of a diskgroup. But that’s not all:

When looking at the disk again using the ‘od’ utility (”octal dump”) it shows the diskheader is not updated at all.
This is probably how the alter diskgroup x undrop disk y features is implemented: the dictionary of the ASM instance has recorded the dropping of the disk, but the disk itself is untouched.

What difference makes an ASMlib disk?

First, make the disk disappear in ASM:

# chown root /dev/sdb1

Check in the ASM instance:

SQL> select * from v$asm_disk;

no rows selected

Now mark the disk as ASMlib:


# /etc/init.d/oracleasm createdisk SDB1 /dev/sdb1
Marking disk "/dev/sdb1" as an ASM disk: asmtool: Device "/dev/sdb1" is already labeled for ASM disk ""
[FAILED]

As I’ve written above, the diskheader is touched. As we see, the oracleasm utility (actually the asmtool command) checks the header and sees a valid ASM disk. But the oracleasm utility has some undocumented commands which let us do that:


# /etc/init.d/oracleasm force-renamedisk /dev/sdb1 SDB1
Renaming disk "/dev/sdb1" to "SDB1": [ OK ]

That should do the trick. Lets see if it did:

# /etc/init.d/oracleasm listdisks
SDB1

Let’s see what difference an ASMlib disk makes using the ‘od’ utility again:


od -a -x -A x /dev/sdb1 | less

000000 soh stx soh soh nul nul nul nul nul nul nul nul % C Z >
8201 0101 0000 0000 0000 8000 c3a5 be5a
000010 nul nul nul nul nul nul nul nul nul nul nul nul nul nul nul nul
0000 0000 0000 0000 0000 0000 0000 0000
000020 O R C L D I S K S D B 1 nul nul nul nul
524f 4c43 4944 4b53 4453 3142 0000 0000
000030 nul nul nul nul nul nul nul nul nul nul nul nul nul nul nul nul
0000 0000 0000 0000 0000 0000 0000 0000
000040 nul nul dle nl nul nul soh eot R A W T E S T _
0000 0a10 0000 0401 4152 5457 5345 5f54
000050 0 0 0 0 nul nul nul nul nul nul nul nul nul nul nul nul
3030 3030 0000 0000 0000 0000 0000 0000
000060 nul nul nul nul nul nul nul nul R A W T E S T nul
0000 0000 0000 0000 4152 5457 5345 0054
000070 nul nul nul nul nul nul nul nul nul nul nul nul nul nul nul nul
0000 0000 0000 0000 0000 0000 0000 0000
000080 nul nul nul nul nul nul nul nul R A W T E S T _
0000 0000 0000 0000 4152 5457 5345 5f54
000090 0 0 0 0 nul nul nul nul nul nul nul nul nul nul nul nul
3030 3030 0000 0000 0000 0000 0000 0000
0000a0 nul nul nul nul nul nul nul nul nul nul nul nul nul nul nul nul
0000 0000 0000 0000 0000 0000 0000 0000
*

Well, an ASMlib disk looks (not very surprisingly) quite much like a normal ASM disk.
We got the usual stuff at 0×20 (ORCLDISK; the ASM disk marker), 0×48 (RAWTEST_0000, the diskname), 0×68 (RAWTEST, the diskgroup). 0×88 (RAWTEST_000, the failgroup) Only visual difference is the ASMlib disk name at 0×28: SDB1!

How does that look like in the ASM instance:

SQL> alter system set asm_diskstring = '' scope=both;
SQL> col path format a30
SQL> select mount_status, header_status, state, path from v$asm_disk;
MOUNT_S HEADER_STATU STATE PATH
------- ------------ -------- ------------------------------
CLOSED PROVISIONED NORMAL
ORCL:SDB1

That’s it for this part. Next part we’ll investigate somewhat more of the inner working of both ASMlib and ASM.

Oracle and raw devices on linux

January 9th, 2008

Since RedHat Enterprise Linux version 5, raw devices are deprecated. According to this online manpage of raw(8). It states raw devices will exist for the life of RedHat Enterprise Linux 5.

Traditionally, raw (character) devices are used in cluster setups to be able to talk directly to disk devices (meaning no local buffering/caching takes place, which could corrupt cluster state advertising). The manpage also states:

If your application performs device access using this interface, Red Hat encourages you to modify your application to open the block device with the O_DIRECT flag.

This is exactly what is possible since database version 10.1.2.0.2 according to metalink note 357492.1; “Linux 2.6 Kernel Deprecation Of Raw Devices”. This means the blockdevice can be used, instead of /dev/raw/rawn. It’s also possible for the clusterware to use the blockdevice instead of raw. I’ve only tested it with the clusterware version 11.

I’ve updated the RAC on VMware article to use the blockdevices.