Discussion:
Best practices for IDS on virtual machine?
R. Spitz
2013-04-11 20:05:29 UTC
Hi Informixers,

I'm the system and database admin for our departmental IDS server. Currently we are running IDS 11.50 on SUSE Linux Enterprise Server (SLES) 10 (32-bit) on a 5-year-old Xeon machine. Overall data and transaction volume is rather modest; the total amount of data is about 3 GB. Performance has never been an issue with the current hardware, but now it's time to plan for a replacement server.

Company policy dictates that dedicated server hardware for departmental servers is to be avoided, so it will have to be a virtual machine under VMware ESX. The OS will be SLES 11 (64-bit), and we will probably move to the latest IDS 12.10 version.

I've always used raw devices on partitions in a RAID1 configuration, but what is the best way to go in a virtual environment? I'm aware that IO performance might be rather poor compared to physical hardware, but I have no choice. The ESX admins promised me storage space on a RAID10 LUN; how should I configure this?

Raw devices seem to be a thing of the past. Is ext2 with DIRECT_IO (if available in IDS Workgroup Edition) still the recommended file system type? I'm unsure about the number of chunks and dbspaces in the virtual environment: does it make sense to use separate chunks and dbspaces for the rootdbs, data, and logical logs, or should I put everything into one large rootdbs, since I cannot influence the physical storage anyway?

I'd appreciate any pointers to "best practices" in the virtual environment.

Regards, Richard
Nick Lello
2013-04-11 20:14:11 UTC
ext2 with DIRECT_IO worked well for me under VMware ESXi. However, get as much memory as possible assigned to the VM, and make your buffers larger than normal.

Expect slower-than-usual checkpoints and tune accordingly.

Try to persuade the VMware admins to give your VM higher-than-normal disk priority; every little helps.
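
For concreteness, here is a minimal sketch of the kind of onconfig tuning meant above. The values are illustrative placeholders, not recommendations; size the buffer pool to the RAM your VM actually gets:

# In $INFORMIXDIR/etc/$ONCONFIG (illustrative values only)
BUFFERPOOL size=4k,buffers=500000,lrus=16,lru_min_dirty=60,lru_max_dirty=70
RTO_SERVER_RESTART 60   # pace checkpoint flushing toward a 60-second recovery target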
--
Nick Lello | Web Architect
o +44 (0) 843.330.9374 | o 503.284.7581 ext. 418 | m +44 (0) 750.890.3456
| Skype: nicholas.lello
Email: nick.lello at rentrak.com
RENTRAK | www.rentrak.com | NASDAQ: RENT
Art Kagel
2013-04-11 20:20:50 UTC
VM best practices for databases:


- IBM's hypervisor has the lowest overhead and highest IO rates.
- If you must use VMware, make sure you are using the vSphere hypervisor; its IO is about double what you get from the older VMware hypervisors or from running without one.
- Yes, RAID10.
- As much as possible, make sure you do not have competition for the physical disks underneath your LUNs, ESPECIALLY not Windows filesystems, and NEVER email stores. The access patterns are just too different from database access patterns.
- EXT2 with DIRECT_IO, yes.
- Yes to separate dbspaces for high-access data like the rootdbs, logical logs, physical logs, and high-access tables and indexes. Informix assigns IO resources per dbspace and per chunk, depending on specifics. More dbspaces and more chunks can lead to more total IO throughput. Keeping logical logs on different LUNs from the physical log and from high-IO data dbspaces will take advantage of more physical resources like IO channels and spindles; a sketch of such a layout follows below.
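
As a sketch of that layout (the paths, dbspace names, and sizes below are hypothetical; adjust them to your LUNs):

# Separate dbspaces for logical logs, the physical log, and data (sizes in KB)
onspaces -c -d llogdbs  -p /ifxchunks/llog_chunk1 -o 0 -s 2097152
onspaces -c -d plogdbs  -p /ifxchunks/plog_chunk1 -o 0 -s 1048576
onspaces -c -d datadbs1 -p /ifxchunks/data_chunk1 -o 0 -s 4194304
# Move the physical log and add logical logs in their own dbspaces
onparams -p -s 1000000 -d plogdbs -y
onparams -a -d llogdbs -s 100000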


Art

Art S. Kagel
Advanced DataTools (www.advancedatatools.com)
Blog: http://informix-myview.blogspot.com/

Disclaimer: Please keep in mind that my own opinions are my own opinions
and do not reflect on my employer, Advanced DataTools, the IIUG, nor any
other organization with which I am associated either explicitly,
implicitly, or by inference. Neither do those opinions reflect those of
other individuals affiliated with any entity with which I am affiliated nor
those of the entities themselves.
R. Spitz
2013-04-12 16:14:51 UTC
Thanks for the quick replies. After a quick look at what the IDS 12.10 admin guide says, I'm thoroughly confused:

"Important: While you must use raw disk devices on UNIX to achieve better performance, recent advances in I/O caching for cooked writes can provide similar if not better performance. [...] If optimum performance is unimportant, you can configure the database server to store data in cooked files. Cooked files are easier to set up than raw disk devices."

This is a little contradictory in itself. Do raw devices still deliver the best IO performance or not? On the other hand, I am not even sure that I would be able to use raw devices in the virtual environment.

If I go the route of an ext2 filesystem with cooked files, what is the best strategy: use one big ext2-formatted partition and create a file for each chunk, or use several partitions with just one chunk file in each partition?

Regards, Richard
Cesar Inacio Martins
2013-04-13 00:19:19 UTC
Hi Richard,

Check out this practical test of the effect of the cooked-file cache (performance focus):

| tablename = my_table
| size allocated = 3168 MB
| size used = 2974 MB
| rows = 33,120,000

I reconfigured my test instance with only 40 MB of 4K-page buffers (where the data is allocated):
| $ onstat -
| IBM Informix Dynamic Server Version 11.70.FC6 -- On-Line -- Up 00:05:43 -- 688536 Kbytes
| $ onstat -b | grep buff
| 0 modified, 10000 total, 16384 hash buckets, 2048 buffer size
| 0 modified, 10000 total, 16384 hash buckets, 4096 buffer size
| 0 modified, 10000 total, 16384 hash buckets, 8192 buffer size


The machine: openSUSE 12.2 with 4 GB of memory, running over VMware ESXi with 1 SATA disk (this is a desktop I use for tests). The chunks are on an ext2 partition, but I'm not using DIRECT_IO, which makes all the difference here, because with it they would behave as RAW:
| $ mount | grep ifx
| /dev/mapper/vgifxdados-lvifxdisco1 on /ifxdados type ext2 (rw,noatime)
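
For reference, direct I/O on cooked chunks is toggled with the DIRECT_IO onconfig parameter (the engine must be restarted for it to take effect):

# In $INFORMIXDIR/etc/$ONCONFIG
DIRECT_IO 1   # 0 = buffered cooked-file I/O (as in this test), 1 = direct I/O on cooked chunks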

Here I clear the Linux cache:
| jdivm06:~ # echo 3 > /proc/sys/vm/drop_caches

This output is from the "dstat -tmd 5" utility (an improved sar):
| ----system---- ------memory-usage----- -dsk/total-
| time | used buff cach free| read writ
| 12-04 20:29:36| 429M 28.6M 2384M 1022M| 211k 75k
| 12-04 20:29:41| 429M 28.6M 2384M 1022M| 0 0
| 12-04 20:29:46| 429M 28.6M 2384M 1022M| 0 0
| 12-04 20:29:51| 215M 264k 298M 3350M| 16k 0 <<<< drop caches run here
| 12-04 20:29:54| 214M 264k 298M 3351M|5461B 0

I run this select forcing a full scan. Remember: at least 2.9 GB will be read from disk, and the engine has only 40 MB of buffers. (I don't know why the output of my "time" command ran together on one line; anyway, you can see it took about 1 minute to run.)
| $ echo "select {+full (my_table)} count(1) from my_table" | time dbaccess testedb
| Database selected.
| (count)
| 33120929
| 1 row(s) retrieved.
| Database closed.
| 0.00 user 0.00 system 1:02.78 elapsed 0%CPU (0 avgtext+0avgdata 11248maxresident)k
| 0inputs+0outputs (0major+781minor)pagefaults 0swaps

dstat output; check the memory usage and disk reads:
| ----system---- ------memory-usage----- -dsk/total-
| time | used buff cach free| read writ
| 12-04 20:34:46| 214M 232k 320M 3329M| 958k 0
| 12-04 20:34:51| 213M 232k 320M 3330M| 0 0
| 12-04 20:34:56| 217M 336k 455M 3192M| 21M 0 <<<<< start the select here
| 12-04 20:35:01| 219M 636k 755M 2888M| 60M 0
| 12-04 20:35:06| 217M 852k 970M 2675M| 43M 0
| 12-04 20:35:11| 217M 1032k 1151M 2494M| 36M 0
| 12-04 20:35:16| 217M 1120k 1238M 2407M| 17M 0
| 12-04 20:35:21| 218M 1160k 1284M 2360M|9162k 401k
| 12-04 20:35:26| 217M 1440k 1556M 2088M| 54M 345k
| 12-04 20:35:31| 217M 1684k 1801M 1843M| 49M 0
| 12-04 20:35:36| 218M 1928k 2044M 1600M| 49M 0
| 12-04 20:35:41| 218M 2264k 2379M 1264M| 67M 0
| 12-04 20:35:46| 219M 2472k 2587M 1054M| 42M 0
| 12-04 20:35:51| 219M 2720k 2839M 803M| 50M 0
| 12-04 20:35:56| 220M 3108k 3224M 416M| 77M 0 <<<<<< finished here
| 12-04 20:36:01| 219M 3200k 3319M 322M| 19M 0
| 12-04 20:36:06| 220M 3200k 3319M 322M| 0 0
| 12-04 20:36:11| 220M 3200k 3319M 322M| 0 211k
| 12-04 20:36:16| 220M 3200k 3319M 322M| 0 0


You must agree we are not using much of the engine's buffer pool here, right? So if you went with RAW devices or DIRECT_IO and ran the same statement again, it would take about the same time. With a cooked file without DIRECT_IO, it takes only 9 seconds...

| $ echo "select {+full (my_table)} count(1) from my_table" | time dbaccess testedb
| Database selected.
| (count)
| 33120929
| 1 row(s) retrieved.
| Database closed.
|
| 0.00user 0.00system 0:09.23elapsed 0%CPU (0avgtext+0avgdata 11248maxresident)k
| 0inputs+0outputs (0major+781minor)pagefaults 0swaps

dstat output:
| ----system---- ------memory-usage----- -dsk/total-
| time | used buff cach free| read writ
| 12-04 20:37:16| 220M 3200k 3319M 321M| 0 0
| 12-04 20:37:21| 220M 3200k 3319M 322M| 0 0
| 12-04 20:37:26| 220M 3200k 3319M 322M| 0 0
| 12-04 20:37:31| 220M 3200k 3322M 318M| 729k 0 <<< select run here
| 12-04 20:37:36| 219M 3200k 3329M 312M|1326k 0
| 12-04 20:37:41| 219M 3200k 3329M 312M| 0 0
| 12-04 20:37:46| 219M 3200k 3329M 312M| 0 0

This occurs because the engine reads the cooked file, which is in the Linux cache...

Running the test for the third and last time... now clearing the cache
again...

| jdivm06:~ # echo 3 > /proc/sys/vm/drop_caches

| ----system---- ------memory-usage----- -dsk/total-
| time | used buff cach free| read writ
| 12-04 20:37:56| 219M 3200k 3329M 312M| 0 0
| 12-04 20:38:01| 219M 3200k 3329M 312M| 0 0
| 12-04 20:38:06| 220M 196k 2356M 1287M| 0 0 <<<< drop caches here....
| 12-04 20:38:11| 217M 196k 339M 3307M| 74k 200k


We're back to 1 minute...

| $ echo "select {+full (my_table)} count(1) from my_table" | time dbaccess testedb
| Database selected.
| (count)
| 33120929
| 1 row(s) retrieved.
| Database closed.
| 0.00user 0.00system 0:56.56elapsed 0%CPU (0avgtext+0avgdata 11232maxresident)k
| 9080inputs+0outputs (11major+769minor)pagefaults 0swaps

And everything was read again from disk:
| ----system---- ------memory-usage----- -dsk/total-
| time | used buff cach free| read writ
| 12-04 20:38:16| 217M 196k 339M 3307M| 0 0
| 12-04 20:38:21| 217M 196k 339M 3307M| 0 0
| 12-04 20:38:26| 217M 196k 339M 3307M| 10k 0
| 12-04 20:38:31| 217M 260k 420M 3226M| 16M 0 <<< select start here
| 12-04 20:38:36| 216M 472k 635M 3012M| 43M 0
| 12-04 20:38:41| 218M 868k 1028M 2617M| 79M 0
| 12-04 20:38:46| 217M 1232k 1389M 2255M| 72M 0
| 12-04 20:38:51| 218M 1584k 1739M 1905M| 70M 0
| 12-04 20:38:56| 218M 1784k 1943M 1701M| 41M 254k
| 12-04 20:39:01| 218M 1860k 2019M 1625M| 15M 0
| 12-04 20:39:06| 218M 1980k 2139M 1504M| 24M 0
| 12-04 20:39:11| 218M 2184k 2341M 1302M| 40M 0
| 12-04 20:39:16| 219M 2556k 2714M 928M| 75M 0
| 12-04 20:39:21| 220M 2932k 3088M 552M| 75M 0
| 12-04 20:39:26| 219M 3136k 3323M 318M| 47M 0 <<< finish here
| 12-04 20:39:31| 220M 3136k 3323M 317M| 0 0
| 12-04 20:39:36| 220M 3136k 3323M 318M| 0 0


But don't forget: if you use the OS cache and a power-off or OS crash occurs, you can lose more information and have a greater chance of ending up with a corrupt database.

Regards
Cesar
Art Kagel
2013-04-14 01:22:05 UTC
Cesar, your timings for read-reread using the OS cache are credible, and I will say something about that at the end. However, note that whether you use RAW, COOKED without DIRECT_IO, or COOKED with DIRECT_IO, your data is safe! When you disable DIRECT_IO and are using COOKED chunks, Informix opens all chunk files, except temp dbspaces, with the O_SYNC flag enabled, which causes any writes to those files to be immediately flushed to disk before the write is acknowledged as complete. Informix in turn does not complete any transaction on an unbuffered-log database unless the logical log buffer write containing the COMMIT record has been acknowledged by the OS, which means that your data is safe! Now, if you are using BUFFERED logging, then there is a small risk that transactions accumulating in the log buffers may not be written to disk until the buffer fills. That opens a window of risk, but the risk is the same for RAW and for COOKED with DIRECT_IO if you use buffered logging.
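
If you want to verify this yourself on Linux, one way is to inspect the flags with which oninit opened a chunk; a sketch (the grep pattern and fd number are examples for a hypothetical chunk path):

pid=$(pgrep -o oninit)
ls -l /proc/$pid/fd | grep chunk      # find the chunk's file descriptor number
grep flags /proc/$pid/fdinfo/7        # flags are octal; see your fcntl.h for the O_SYNC/O_DIRECT bits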

Now to your timings. Yes, reading and rereading the same data from the buffer cache will be faster with COOKED files than with RAW or DIRECT_IO, which bypass the buffer cache. However, you did not present any timings for writing! That is where RAW and DIRECT_IO improve your performance. With COOKED files and no DIRECT_IO, Informix is writing from its buffers to the OS buffers, and the O_SYNC flag is forcing the OS to wake a sync thread to write the newly dirtied cache pages to disk. Informix has to wait for the extra memory copy and then for the OS IO scheduler to wake an IO thread and perform the write. With RAW devices, and with DIRECT_IO enabled for COOKED files, the copy from Informix memory to OS memory is skipped, and an Informix AIO VP performs the write directly, or a KAIO thread schedules the asynchronous IO in the OS kernel's asynchronous IO service and returns to work without having to wait. This is at least 20-25% faster than an O_SYNC write, which is completely synchronous and so must be performed by Informix's AIO VPs.

Test it!
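
As a rough approximation of the write-path difference (not the engine's actual IO path, just plain GNU dd against the same ext2 mount from Cesar's post; block size and count are arbitrary):

dd if=/dev/zero of=/ifxdados/ddtest bs=2k count=100000 oflag=sync    # like COOKED without DIRECT_IO (O_SYNC)
dd if=/dev/zero of=/ifxdados/ddtest bs=2k count=100000 oflag=direct  # like DIRECT_IO or RAW (O_DIRECT)
dd if=/dev/zero of=/ifxdados/ddtest bs=2k count=100000               # fully buffered
rm /ifxdados/ddtest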

Art

Art S. Kagel
Advanced DataTools (www.advancedatatools.com)
Blog: http://informix-myview.blogspot.com/



Cesar Inacio Martins
2013-04-14 02:05:20 UTC
Hi Art,

Hmm... I would like to disagree with you... but before that, I will do the write test. From what I remember of tests done in the past (studying this behavior), I concluded that write I/O throughput will be bounded by the Linux kernel's dirty-buffer flush/sync parameters and by how much memory is available to hold those dirty buffers (versus the amount of data written by the engine during a checkpoint, for example).
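
For the record, these are the kernel knobs I mean (standard Linux vm sysctls; defaults vary by distro):

sysctl vm.dirty_background_ratio vm.dirty_ratio
sysctl vm.dirty_expire_centisecs vm.dirty_writeback_centisecs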
Like I said, I *would like* to, but I do not dare disagree with THE Art Kagel :)
I will run the test and post my results here...

Regards
Cesar
Fernando Nunes
2013-04-14 14:56:34 UTC
Best performance possible = RAW.
But RAW devices are not as easy to use as cooked files, especially if your environment is segregated, meaning you, the DBA, don't have access to root.

But if I'm not confusing posts, you mentioned 3 GB of data... Unless you use a very small amount of memory, there's no way you can get bad performance with that :)
Regards
--
Fernando Nunes
Portugal

http://informix-technology.blogspot.com
My email works... but I don't check it frequently...
Art Kagel
2013-04-14 01:23:44 UTC
Gee, that response looks awfully familiar!?!?!

Art

Art S. Kagel
Advanced DataTools (www.advancedatatools.com)
Blog: http://informix-myview.blogspot.com/



Art Kagel
2013-04-14 01:29:12 UTC
Sorry, Colin, I missed your question at the end. Best not to bottom post!

Yes, in part recommending EXT2 is to avoid the journaling overhead. The rest is the copy-on-write feature, which physically fragments your chunks over time. As to the others, see my presentation from last year's IIUG Conference, "Doing Storage Better", which includes a table of timings I ran (available in the members' pages on the IIUG web site, www.iiug.org). Also see a post to the forums that Eric made last month where he posted his own timings, which agree.

Yes, Informix logging eliminates the need for FS journaling. It's covered by over 20 years of experience proving that it works; why duplicate it with an inferior version?
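
For completeness, a minimal sketch of preparing such a chunk filesystem (the device and mount point are hypothetical):

mkfs.ext2 /dev/mapper/vgifx-lvchunks                     # ext2: no journal
mount -o noatime /dev/mapper/vgifx-lvchunks /ifxchunks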

Art

Art S. Kagel
Advanced DataTools (www.advancedatatools.com)
Blog: http://informix-myview.blogspot.com/



On Sat, Apr 13, 2013 at 4:45 AM, colin mcdermott <
Apologies for the previous mispost; Moderators, please delete.
I notice that you recommend ext2 with DIRECT_IO. Is this to avoid the journaling overhead of ext3? Is there a substantial performance boost in switching to ext2? Are JFS or ReiserFS better for this? I take it that the plog and llog should remove the need for the journal in ext3.