Level 0 onbar suddenly slow...

Discussion:


Malc P

2014-10-02 16:11:09 UTC

IDS9.31HC5, HP-UX11i (PA-RISC) on an HP system container running under Itanium

We're getting our level 0 archives slowing down by a factor of 4 or so once the container has been running for a few days. We do an L0 every night, which usually takes about 2 hours, but after a couple of days it tends to go slow, taking 5 hours and maybe more.
What we've noticed is that the initial phase, where onbar collects the system info from sysmaster, usually takes a few minutes but starts taking over an hour. This is the SQL it's running:
Informix Dynamic Server Version 9.30.HC5 -- On-Line -- Up 4 days 03:17:27 -- 382864 Kbytes

Sess   SQL        Current     Iso  Lock       SQL   ISAM  F.E.
Id     Stmt type  Database    Lvl  Mode       ERR   ERR   Vers
12791  SELECT     sysmaster   CR   Wait 300   0     0     9.03

Current SQL statement :
select ( sum ( chksize - nfree ) ) from sysdbspaces , syschunks where
sysdbspaces . dbsnum = syschunks . dbsnum and name = ?

Last parsed SQL statement :
select ( sum ( chksize - nfree ) ) from sysdbspaces , syschunks where
sysdbspaces . dbsnum = syschunks . dbsnum and name = ?

We're up to 3.8 million reads on that session in the current L0.
I've checked whether it's arc_very_old_pages, and none of the usual indicators show up in onstat -g ath or onstat -g stk. We haven't got CCFLAGS=0x400000; should we?

We find that if we restart the container, it goes back to normal timings for a couple of days and then does it again. Any clues chaps?
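Because the slowdown tracks how long the engine has been up and clears on restart, one way to confirm the pattern is to log engine uptime next to each night's L0 duration. A minimal sketch, assuming the banner format shown in the onstat output above (`Up 4 days 03:17:27`) and assuming that uptimes under a day are printed without the days part; the function name `uptime_seconds` is hypothetical.

```shell
# Hypothetical helper: turn the "Up [N day(s)] HH:MM:SS" field of an onstat
# banner line into seconds of engine uptime.
uptime_seconds() {
    # $1 is one banner line, e.g. the header line printed by `onstat -`
    printf '%s\n' "$1" | awk '{
        for (i = 1; i <= NF; i++) {
            if ($i == "Up") {
                days = 0
                hms = $(i + 1)
                # Assumption: multi-day uptimes read "Up 4 days 03:17:27",
                # shorter ones read "Up 03:17:27"
                if ($(i + 2) ~ /^days?$/) {
                    days = $(i + 1)
                    hms = $(i + 3)
                }
                split(hms, t, ":")
                print days * 86400 + t[1] * 3600 + t[2] * 60 + t[3]
                exit
            }
        }
    }'
}
```

A nightly wrapper around `onbar -b -L 0` could record this value alongside the elapsed time of the backup, which would show whether the slow phase grows steadily with uptime or jumps after a particular point.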

Mark Scranton

2014-10-02 17:17:02 UTC



Malcom -

At a recent client, when archives became slow(er), it was a TSM stalling issue, not an onbar/engine issue. Archives during month-end processing are incredibly slow, but consistently so, and we've concluded that month-end load is simply causing the overhead. The "stalling" issue was a separate one that usually occurred when TSM was struggling with its storage pool config or TSM simply went out to lunch.

It sounds like you're onto something with respect to the sysmaster look-ups/reads, but could the storage manager be suspect too?

HTH -
Mark

Mark Scranton
The Mark Scranton Group
"All Informix ... all the time."
www.markscranton.com

Malc P

2014-10-02 23:26:36 UTC


Mark, thanks.
A bit more info: we have some cron jobs that run every hour or so to check how much space is free in the chunks, blobspaces and tempdbs so we can issue DBA alerts, and these also slow down drastically. They are all variations of the form:

dbaccess sysmaster <<EOF >/dev/null 2>&1
output to "${RESFILES}/dbspacefree.tmp" without headings
select name[1,30], sum(nfree) from sysdbspaces d, syschunks c
where d.dbsnum = c.dbsnum
and is_blobspace = 0
and is_sbspace = 0
group by 1
EOF
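The alerting step that these cron queries feed can be sketched as a small filter over the output file. This is a minimal sketch, assuming each non-blank line of the file is `<dbspace name> <free pages>` as written by `output to ... without headings`, and that dbspace names contain no whitespace; the function name `low_free_dbspaces` and the threshold are hypothetical and would be set by local alerting policy.

```shell
# Hypothetical alert helper: print each dbspace whose free page count is
# below a threshold, reading the file written by the cron query above.
low_free_dbspaces() {
    file=$1        # e.g. "${RESFILES}/dbspacefree.tmp"
    threshold=$2   # minimum acceptable free pages (illustrative policy value)
    # Skip blank lines; compare the second column numerically.
    awk -v min="$threshold" 'NF >= 2 && $2 + 0 < min + 0 { print $1, $2 }' "$file"
}
```

Keeping the threshold check outside the SQL means the sysmaster query itself stays as short as possible, which matters here given how slow those queries become.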

I did another instance restart this evening and it definitely cleared the problem once again.
It would seem sysmaster is misbehaving, but for the life of me I can't see why. As I recall, UPDATE STATISTICS is neither useful nor necessary on sysmaster, as it's basically just views into shared memory?

Mark Scranton

2014-10-06 15:38:33 UTC

Permalink


Agreed that UPDATE STATISTICS against sysmaster isn't useful. Yes - sysmaster is predominantly views into shared memory structures. There are (or at least used to be) some "real tables" (arc this and that), but not many that I recall.

Another possibility: I have a current client with over 230,000 partitions in a single engine. Many of my scripts (sanity-check scripts, etc.) take a very long time to run simply because of this. But it's not "every so often" - it's every time and should be expected. Scanning sysptnprof takes a very long time due to the number of partitions. I haven't found a clear connection to your situation, but it came to mind.

Thanks -
Mark Scranton