Execution Plan

Discussion:

Execution Plan

(too old to reply)

t***@gmail.com

2013-04-12 13:59:46 UTC

Hi All,

Given a statement that is prepared this way

select
p.*
from
person p
join address a on a.person_id = p.id
where
(0 = $1 or p.name = $2)
and (0 = $3 or a.city = $4)

where $1-$4 are host variables and there is a non-unique index on both p.name and c.city - what execution plan would be used?

Would conditional clause selection like this always do a table scan on both tables, because it doesn't know prior to execution which indices to use?

Or does table statistics aid in avoiding table scan and choose the indices that are present?

Will the plan creation be deferred until the first execution with values for the host variables?

Thoughts on this article?

http://use-the-index-luke.com/sql/where-clause/obfuscation/smart-logic

From the Performance Guide:
Page: 13-32
When a statement contains host variables, the database server replaces the host variables with placeholders when it stores the statement in the SQL statement cache. Therefore, the statement is optimized without the database server having access to the values of the host variables. In some cases, if the database server had access to the values of the host variables, the statement might be optimized differently, usually because the distributions stored for a column inform the optimizer exactly how many rows pass the filter.

Thanks,
Tom

Art Kagel

2013-04-12 14:16:54 UTC

Permalink

Tom:

I don't have any easy answer for you, but I do have a few questions:

1. What version of Informix are you using? Different versions will
optimize this join differently.
2. What host language are you using that uses $<num> as a host var that
is not replaced until OPEN time?
3. Move the filters into the ON clause of the JOIN otherwise the engine
will have to perform a sequential scan of one or both tables, write the
joined tuples into a temp table and perform the filtered fetches from the
temp table even if you supplied constant values instead of replaceable
parameters. ANSO SQL rules require that filters in the WHERE clause be
applied post-join requiring the temp table.
4. Why not just run the thing with SET EXPLAIN enabled or the EXPLAIN
optimizer directive in it (with or without the AVOID_EXECUTE option) and
look at the actual query plan rather than have us speculate?
5. Also try this as a UNION (OK three UNIONed SELECTs) instead of the OR
conditions.
6. Also try it with the address table in a non-correlated subquery (both
with the OR and with a UNION instead) like:
AND p.id in (select a.id from address as a where a.city = $4)

There are many ways to write a SELECT statement and get the same results.
If you have not tested them all, you may not be using the best one.

Art

Art S. Kagel
Advanced DataTools (www.advancedatatools.com)
Blog: http://informix-myview.blogspot.com/

Disclaimer: Please keep in mind that my own opinions are my own opinions
and do not reflect on my employer, Advanced DataTools, the IIUG, nor any
other organization with which I am associated either explicitly,
implicitly, or by inference. Neither do those opinions reflect those of
other individuals affiliated with any entity with which I am affiliated nor
those of the entities themselves.

Post by t***@gmail.com
Hi All,
Given a statement that is prepared this way
select
p.*
from
person p
join address a on a.person_id = p.id
where
(0 = $1 or p.name = $2)
and (0 = $3 or a.city = $4)
where $1-$4 are host variables and there is a non-unique index on both
p.name and c.city - what execution plan would be used?
Would conditional clause selection like this always do a table scan on
both tables, because it doesn't know prior to execution which indices to
use?
Or does table statistics aid in avoiding table scan and choose the indices
that are present?
Will the plan creation be deferred until the first execution with values
for the host variables?
Thoughts on this article?
http://use-the-index-luke.com/sql/where-clause/obfuscation/smart-logic
Page: 13-32
When a statement contains host variables, the database server replaces the
host variables with placeholders when it stores the statement in the SQL
statement cache. Therefore, the statement is optimized without the database
server having access to the values of the host variables. In some cases, if
the database server had access to the values of the host variables, the
statement might be optimized differently, usually because the distributions
stored for a column inform the optimizer exactly how many rows pass the
filter.
Thanks,
Tom
_______________________________________________
Informix-list mailing list
http://www.iiug.org/mailman/listinfo/informix-list

t***@gmail.com

2013-04-12 15:04:06 UTC

Permalink

Thanks much, Art

Will answer more of the questions in a bit - I am working through some developers in this posting.

We are on version 11.7.FC5

as an added FYI from one of the team :

I do not know if this is worth adding or not.

A common coding practice is to make use of short circuiting. If you know something will be false, you can use that to avoid executing the rest of the line - goDoAnExpensiveAndLongThing() :
if( 3!=3 && goDoAnExpensiveAndLongThing()) {
<skipped>

Even changing that into:
if( 4==4 || tryThisTheHardWay()) {
<executed and avoided the hard way>

That same approach would not have the same impact here:
where
(0 = $1 or p.name = $2)

Since SQL statements are not executed in a sequential manner, I am not sure if it is worth highlighting that difference. This might be absolutely obvious to everyone already.

Art Kagel

2013-04-12 15:31:30 UTC

Permalink

Wait, are those filter "zero equal $1" and "zero equal $3"? I thought that
you just had a column named letter O! OK, those will evaluate to false if
$1 and $3 are not also zero. That's not a short circuit, especially since
they are OR'd with the other filters. They will be evaluated every time.

Informix 11.70 supports multi-index scans so it's possible that multiple
indexes could be used to satisfy the multiple filters, but we'd need to
know the indexing structure. Again, the best thing to do is to try all of
the various versions of the select under SET EXPLAIN, time them and see
which runs fastest and has the most reasonable query plan (since the
specific values plugged into replaceable parameters can change the query
plan you always want a version of the select that will produce a reasonable
plan for any given value). Make sure the check the query plans to make
sure that your data distributions are up-to-date.

Note that prepared queries with replaceable parameters go through a final
optimization step after the parameter values are known at OPEN time (or the
first FETCH if the open is deferred).

Art

Art S. Kagel
Advanced DataTools (www.advancedatatools.com)
Blog: http://informix-myview.blogspot.com/

Disclaimer: Please keep in mind that my own opinions are my own opinions
and do not reflect on my employer, Advanced DataTools, the IIUG, nor any
other organization with which I am associated either explicitly,
implicitly, or by inference. Neither do those opinions reflect those of
other individuals affiliated with any entity with which I am affiliated nor
those of the entities themselves.

Post by t***@gmail.com
Thanks much, Art
Will answer more of the questions in a bit - I am working through some
developers in this posting.
We are on version 11.7.FC5
I do not know if this is worth adding or not.
A common coding practice is to make use of short circuiting. If you
know something will be false, you can use that to avoid executing the rest
if( 3!=3 && goDoAnExpensiveAndLongThing()) {
<skipped>
if( 4==4 || tryThisTheHardWay()) {
<executed and avoided the hard way>
where
(0 = $1 or p.name = $2)
Since SQL statements are not executed in a sequential manner, I am not
sure if it is worth highlighting that difference. This might be absolutely
obvious to everyone already.
_______________________________________________
Informix-list mailing list
http://www.iiug.org/mailman/listinfo/informix-list

t***@gmail.com

2013-04-12 18:16:35 UTC

Permalink

What host language are you using that uses $<num> as a host var that is not replaced until OPEN time?

- "Java"

Why not just run the thing with SET EXPLAIN enabled or the EXPLAIN optimizer directive in it (with or without the AVOID_EXECUTE option) and look at the actual query plan rather than have us speculate?

- "Not sure how to do that for Informix. There is documentation on how to prepare statements without providing values for host variables for other databases (MySQL, Postgres etc.) but none for Informix. As per the following excerpt from the tuning guide:
Page: 13-32
When a statement contains host variables, the database server replaces the host variables with placeholders when it stores the statement in the SQL statement cache. Therefore, the statement is optimized without the database server having access to the values of the host variables. In some cases, if the database server had access to the values of the host variables, the statement might be optimized differently, usually because the distributions stored for a column inform the optimizer exactly how many rows pass the filter.

"We are trying to find out what the plan is, if different for when the host variables are not provided vs when they are."

There are many ways to write a SELECT statement and get the same results. If you have not tested them all, you may not be using the best one.

- "Points 3, 5 and 6 are all good, however this is more of a theoretical question given that particular type of query. We can use several approaches for specific cases and we have"

Art Kagel

2013-04-12 19:52:32 UTC

Permalink

As to getting the execution plan, in your Java app, or a small test app,
change the query as follows:

SELECT {+ EXPLAIN AVOID_EXECUTE} ... -- Produces a query plan and
estimated costs and estimated number of rows returned but no statistics
about an actual run.
SELECT {+ EXPLAIN} ... -- Produces a query
plan plus run time statistics to compare to the query plan's estimates.

The optimizer hint (from the opening to closing curly brace) preceeds the
first column in the projection clause of the query immediately after the
SELECT keyword. Then you just have to PREPARE the statement. The
execution plan will be written, by default, to a file named sqexplain.out
('l' intentionally left out) in the current directory if you are running it
on the server machine or in the user's home directory on the server machine
if you are running it remotely.

Just provide dummy values for the host variables. To compare the query's
plan with parameters versus with constant values you can run the query with
constants (and the EXPLAIN directive) and compare the sqexplain.out entries
for both.

Art

Art S. Kagel
Advanced DataTools (www.advancedatatools.com)
Blog: http://informix-myview.blogspot.com/

Disclaimer: Please keep in mind that my own opinions are my own opinions
and do not reflect on my employer, Advanced DataTools, the IIUG, nor any
other organization with which I am associated either explicitly,
implicitly, or by inference. Neither do those opinions reflect those of
other individuals affiliated with any entity with which I am affiliated nor
those of the entities themselves.

Post by t***@gmail.com
What host language are you using that uses $<num> as a host var that is
not replaced until OPEN time?
- "Java"
Why not just run the thing with SET EXPLAIN enabled or the EXPLAIN
optimizer directive in it (with or without the AVOID_EXECUTE option) and
look at the actual query plan rather than have us speculate?
- "Not sure how to do that for Informix. There is documentation on how to
prepare statements without providing values for host variables for other
databases (MySQL, Postgres etc.) but none for Informix. As per the
Page: 13-32
When a statement contains host variables, the database server replaces the
host variables with placeholders when it stores the statement in the SQL
statement cache. Therefore, the statement is optimized without the database
server having access to the values of the host variables. In some cases, if
the database server had access to the values of the host variables, the
statement might be optimized differently, usually because the distributions
stored for a column inform the optimizer exactly how many rows pass the
filter.
"We are trying to find out what the plan is, if different for when the
host variables are not provided vs when they are."
There are many ways to write a SELECT statement and get the same results.
If you have not tested them all, you may not be using the best one.
- "Points 3, 5 and 6 are all good, however this is more of a theoretical
question given that particular type of query. We can use several approaches
for specific cases and we have"
_______________________________________________
Informix-list mailing list
http://www.iiug.org/mailman/listinfo/informix-list

Fernando Nunes

2013-04-13 13:42:10 UTC

Permalink

Informix optimizer will take into account that TRUE and OR don't require
the execution of other OR members and that FALSE and AND means immediately
FALSE.
It will try to evaluate the less costly member first if that helps avoid
calculating the other(s).
We're dealing with a real optimizer :)
This can be shown in practice.

Regards

--
Fernando Nunes
Portugal

http://informix-technology.blogspot.com
My email works... but I don't check it frequently...