Informix Query Tuning

Discussion:

(too old to reply)

m***@gmail.com

2016-12-06 15:47:43 UTC

Hi

I have a table called lead, which have about 500 thousand records and we need the following query to get executed.

SELECT skip 300000 first 75 * FROM lead WHERE ((enrollment_period IS NULL) OR (enrollment_period IN ('FT2015','F16','SUM2016','FALL2016','FALL2017','SP17'))) ORDER BY created_on DESC

The table lead has id column as the primary key and thus have clustered index in that column. This query is taking about 12 - 13 mins. When I added a non-clustered index on created_on and enrollment_period columns, it came down to 4 - 5 mins. Then I changed the clustered index from id column to this index, execution time came down further to about 50 seconds now.

Is there any other optimization scope available for this query? Overall, is there any other change that can be done so that the query will execute faster?

Thanks in Advance,
Manohar

Jack Parker

2016-12-06 16:03:45 UTC

Permalink

I would suggest getting the explain plan

"SET EXPLAIN ON;" before running your query to see what the engine is doing.

We know nothing about the DDL for this table or your application. That
info would be helpful, you could probably fragment the table by
something or other to improve performance. But without know what is in
the table...

Why don't you have an index on enrollment_period by itself? That would
seem to be the simplest thing.

How often is this query run?

Regards,
Jack Parker

Post by m***@gmail.com
Hi
I have a table called lead, which have about 500 thousand records and we need the following query to get executed.
SELECT skip 300000 first 75 * FROM lead WHERE ((enrollment_period IS NULL) OR (enrollment_period IN ('FT2015','F16','SUM2016','FALL2016','FALL2017','SP17'))) ORDER BY created_on DESC
The table lead has id column as the primary key and thus have clustered index in that column. This query is taking about 12 - 13 mins. When I added a non-clustered index on created_on and enrollment_period columns, it came down to 4 - 5 mins. Then I changed the clustered index from id column to this index, execution time came down further to about 50 seconds now.
Is there any other optimization scope available for this query? Overall, is there any other change that can be done so that the query will execute faster?
Thanks in Advance,
Manohar
_______________________________________________
Informix-list mailing list
http://members.iiug.org/mailman/listinfo/informix-list

m***@gmail.com

2016-12-07 01:08:49 UTC

Permalink

@Jack, @Mike, I am posting the Explain Plan below:

QUERY: (OPTIMIZATION TIMESTAMP: 12-06-2016 18:07:07)
------
SELECT {+explain} skip 300000 first 75 * FROM lead WHERE ((enrollment_period IS NULL) OR
(enrollment_period IN ('FALL14','SPR15','SUM15','FALL15','SPR16','SPR14','FALL16','WIN15','WIN14','SUM14')))
ORDER BY created_on DESC

DIRECTIVES FOLLOWED:
EXPLAIN
DIRECTIVES NOT FOLLOWED:

Estimated Cost: 154427
Estimated # of Rows Returned: 300707

1) ngp.lead: INDEX PATH

(1) Index Name: jxadmin.test
Index Keys: created_on enrollment_period (Key-First) (Reverse) (Serial, fragments: ALL)
Index Key Filters: ((ngp.lead.enrollment_period IN ('FALL14' , 'SPR15' , 'SUM15' , 'FALL15' , 'SPR16' , 'SPR14' , 'FALL16' , 'WIN15' , 'WIN14' , 'SUM14' )OR ngp.lead.enrollment_period IS NULL ) )

Query statistics:
-----------------

The final cost of the plan is reduced because of the FIRST n specification in
the query.

Table map :
----------------------------
Internal name Table name
----------------------------
t1 lead

type table rows_prod est_rows rows_scan time est_cost
-------------------------------------------------------------------
scan t1 100075 300707 100075 00:00.42 154428

I tried creating the index on enrollment_period and created_on separately, but the query uses the index on created_on only. I am giving the table structure below:

create table 'ngp'.lead (
id INT8 not null,
lead_status_type VARCHAR(10) not null,
prefix VARCHAR(10),
first_name VARCHAR(100) not null,
middle_name VARCHAR(100),
last_name VARCHAR(100) not null,
suffix VARCHAR(10),
address_line1 VARCHAR(100),
address_line2 VARCHAR(100),
address_line3 VARCHAR(100),
city VARCHAR(100),
state VARCHAR(10),
country VARCHAR(10),
postal_code VARCHAR(100),
email_address VARCHAR(100),
home_phone VARCHAR(100),
cell_phone VARCHAR(100),
ssn VARCHAR(100),
enrollment_period VARCHAR(10),
lead_source VARCHAR(10),
lead_source_referree_id INT8,
lead_date DATE,
credential_type VARCHAR(10),
ceeb_code INT,
school_id INT8,
high_sch_grad_year VARCHAR(4),
intended_program1 VARCHAR(10),
intended_program2 VARCHAR(10),
gender_type VARCHAR(10),
birth_date DATE,
race_type VARCHAR(10),
occupation_type VARCHAR(10),
interest1_type VARCHAR(10),
interest2_type VARCHAR(10),
interest3_type VARCHAR(10),
interest4_type VARCHAR(10),
interest5_type VARCHAR(10),
owner_int_org_id INT8,
lead_response_type VARCHAR(10),
response_date DATETIME YEAR TO FRACTION(5),
candidate_id INT8,
created_by VARCHAR(100),
updated_by VARCHAR(100),
created_on DATETIME YEAR TO FRACTION(5),
updated_on DATETIME YEAR TO FRACTION(5),
version INT8 default 0 not null
)

Please let me know if you have any other suggestions for this.

Thanks
Manohar

Mike Walker

2016-12-06 16:10:16 UTC

Permalink

Without digging into the distributions, the index that would may work best
for this query is on (enrollment_period, created_on desc), but would need
more info.

I would suggest that you post the query plan.

Make sure that you have PDQ set.

Are there many records where enrollment_period is null? You could try
turning this into a union, but that would cause complications with the
skip/first constructs and may require the use of a temp table.

I would guess that you are using the "skip 300000 first 75" to page through
the results set. There may be better ways to do this that would involve a
scroll cursor or putting the key fields in a temp table.

Mike

-----Original Message-----
From: informix-list-***@iiug.org [mailto:informix-list-***@iiug.org]
On Behalf Of ***@gmail.com
Sent: Tuesday, December 06, 2016 8:48 AM
To: informix-***@iiug.org
Subject: Informix Query Tuning

Hi

I have a table called lead, which have about 500 thousand records and we
need the following query to get executed.

SELECT skip 300000 first 75 * FROM lead WHERE ((enrollment_period IS NULL)
OR (enrollment_period IN
('FT2015','F16','SUM2016','FALL2016','FALL2017','SP17'))) ORDER BY
created_on DESC

The table lead has id column as the primary key and thus have clustered
index in that column. This query is taking about 12 - 13 mins. When I added
a non-clustered index on created_on and enrollment_period columns, it came
down to 4 - 5 mins. Then I changed the clustered index from id column to
this index, execution time came down further to about 50 seconds now.

Is there any other optimization scope available for this query? Overall, is
there any other change that can be done so that the query will execute
faster?

Thanks in Advance,
Manohar

m***@gmail.com

2016-12-07 01:13:52 UTC

Permalink

Mike,

I am skip..first clause for pagination as the user may even scroll to skip the data upto 300000 records to get the next 75 records. There is only one table involved. Do you think creating temp table will help. If so, kindly explain how I can do this.

Please let me know how I can use PDQ and scroll cursor to optimize this query.

Regards
Manohar