OT: Unix stutter
Bill Vermillion
fp at wjv.com
Sat Sep 11 12:51:32 PDT 2004
On Fri, Sep 10 23:29, while denying his reply is spam, John Esak
prattled on endlessly saying:
> > In that case it could be just swapping and flushing cache. There
> > have been instances where someone tunes something thinking it will
> > make it faster but it causes huge pauses. This will be allocating
> > large cache and then when it really has to be flushed the system
> > will spend the time doing that - and basically seem to be dead
> > during that time.
> This is the first time I've heard of this.
Jeff Lieberman and I had a discussion on this a few years back.
This was also a problem in many Linux systems. They made the cache
very large for fast performance, but when it had to be flushed it
would bog down. It has probably been fixed.
> You're right I never mentioned the system (this time). It is a
> SMP box with 2 3GHz CPUs and 1GB of RAM. We run about 50 users, but at
> night when this problem happens only about 7 or 8, and only 1 of
> them is ever doing anything at a time (meaning at no time are two
> or three people doing different things). Just one big filePro
> app running with 6 - 8 users... and the backup.
I don't have anyone running SMP, but ISTR some discussion of SMP
problems and that there was a patch, but I may be confused on this.
Perhaps JPR has more definite information.
> Do you think a system this large and fast could have this cache
> clearing freeze up ... because I _do_ have the cache (nbufs and
> so on) set as high as they possibly can be. Should I _lower_
> these, perhaps?
You can try that, but I'd first run sar [see below] to see
what is happening. If you see huge amounts of disk I/O or I/O
waits, the huge cache could be it. I'm not one for changing things
until I find out what is causing the problem; that's why I suggested
sar first.
Are the drives on a caching controller? If so, you could have the
OS cache flushing to the controller cache, which in turn flushes to
disk. I'm not saying this is the problem, just that it could be.
> > You have never mentioned that you have run sar or given any info
> > about that. Have you run it?
> > Bill
> No, I never have run sar. It is so intermittent. I'll set up
> sar to run, though during a midnight backup and see what it
> produces. It is just that Tune-Up has not really shown anything
> in the way of huge system hogs other than Edge itself which
> pretty much always wins. I don't know how to track spikes with
> Tune-Up... maybe I'll call and ask them.
If by 'sar is so intermittent' you mean the default setup that
runs hourly overnight and every 20 minutes during the day, that
is only going to give hourly and 20-minute averages and might not
point out a thing.
But you can run sar from the command line. When I have a
problem that I need to see, I set it up to take 10 or more
samples at intervals of 10 or more seconds. If you sample more
often than every 10 seconds, sar itself can skew the results.
From the command line just do this:
sar -o /tmp/sar 10 10
The first 10 is the number of seconds to wait between samples.
The second 10 is how many samples to take. The samples are saved
in binary form in the file /tmp/sar.
This is good for catching those things that happen for a minute or
so and then go away.
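Since the second number is just a sample count, covering a longer event such as a midnight backup means dividing the length of the window by the interval. A minimal sketch, assuming a Bourne-compatible /bin/sh; the function name sample_count is hypothetical, not part of sar. It uses expr rather than $(( )) because the original Bourne shell has no arithmetic expansion.

```shell
#!/bin/sh
# sample_count WINDOW INTERVAL  (hypothetical helper, not part of sar)
# Prints how many sar samples, INTERVAL seconds apart, are needed to
# cover WINDOW seconds. The division rounds up so the window is never
# cut short. expr is used because older Bourne shells lack $(( )).
sample_count() {
    expr \( "$1" + "$2" - 1 \) / "$2"
}
```

For example, for a backup assumed to take about 15 minutes (900 seconds), sampled every 10 seconds, this asks sar for 90 samples:

    sar -o /tmp/sar 10 `sample_count 900 10`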
You could make this into a small shell script so someone could run
it when things get slow, without having to call you. They could
then mail you the results.
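A minimal sketch of such a script, assuming a Bourne-compatible /bin/sh and a working mail command on the box. The function name capture_and_mail, the temporary file names and the default recipient root are all hypothetical. It turns the binary -o file into a text report with sar -A -f before mailing, because the raw data file is not readable as a mail message.

```shell
#!/bin/sh
# Hypothetical helper: capture sar samples during a slowdown and mail a
# readable report. Names, paths and the default recipient are assumptions.
#
# capture_and_mail [INTERVAL] [COUNT] [RECIPIENT]
capture_and_mail() {
    interval=${1:-10}          # seconds between samples
    count=${2:-10}             # number of samples
    recipient=${3:-root}       # who receives the report
    data=/tmp/sar.$$           # binary samples written by sar -o
    report=/tmp/sar.$$.txt     # text report produced by sar -A -f

    # Record the samples (anything sar also prints to the screen is
    # discarded), then convert the binary file into a text report.
    if sar -o "$data" "$interval" "$count" > /dev/null &&
       sar -A -f "$data" > "$report"
    then
        # Mail the text report, not the binary data file.
        mail -s "sar report from `uname -n`" "$recipient" < "$report"
        status=$?
    else
        status=1
    fi

    # Remove the temporary files whether or not the steps succeeded.
    rm -f "$data" "$report"
    return $status
}
```

To use it as a stand-alone script, add a call such as `capture_and_mail 10 10 admin@example.com` (a placeholder address) as the last line and install it where the on-site staff can run it.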
Then, to see what happened, just do this:
sar -A -f /tmp/sar
You can pipe that to a printer, less, or whatever.
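The thing to look for is huge amounts of disk I/O or I/O waits. As one way to pick those intervals out of a long report, here is a minimal awk filter over the CPU report from sar -u. The name flag_waits and the 20% default threshold are hypothetical. It assumes the wait column is headed %wio (System V style) or %iowait (Linux sysstat) and finds the column by its header rather than by position, since layouts differ between systems.

```shell
#!/bin/sh
# flag_waits [LIMIT]  (hypothetical helper)
# Reads `sar -u` text output on stdin and prints only the sample lines
# whose I/O-wait percentage is above LIMIT (default 20).
flag_waits() {
    awk -v limit="${1:-20}" '
        # Header line: note which column holds the I/O-wait figure.
        {
            for (i = 1; i <= NF; i++)
                if ($i == "%wio" || $i == "%iowait") { col = i; next }
        }
        # Sample line: print it if that column is numeric and over the limit.
        col && $col ~ /^[0-9.]+$/ && $col + 0 > limit + 0 { print }
    '
}
```

For example, `sar -u -f /tmp/sar | flag_waits 20` would list only the 10-second samples in which the CPUs spent more than 20% of their time waiting on I/O.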
--
Bill Vermillion - bv @ wjv . com