OT: Unix stutter
Bill Vermillion
fp at wjv.com
Fri Sep 10 09:45:35 PDT 2004
On Fri, Sep 10 11:43, John Esak moved his mouse, rebooted for the
change to take effect, and then said:
> This is going to seem like a strange addition to the lock-up
> problem listed below. But... I have the _exact_ same problem on
> a machine I service.
In none of the posts has anyone mentioned the hardware it is
running on. The OS has been mentioned, but as Mike posted, these
problems have been reported on SMP/dual-processor machines, and as
I recall SCO took several attempts before getting that one fixed.
From what I've seen, SMP is not an easy technology to get right.
> It simply stops completely dead for no reason... usually 5 or 6
> seconds then carries on as if nothing was ever wrong. This has
> been happening to me when I log in over a modem... or through a
> telnet connection. I have asked them if they notice it at their
> place... and they have all said "no". It can't be my machine.
> It happens from different call-in locations, both modem and
> telnet.
A question about the different call-in locations you are using:
is there a common provider on your side? If not, there will still
be a common provider on their side.
I've seen pauses of 3-5 seconds quite a lot lately on my own
connections. On one machine I log into daily this has occurred a
lot recently. Running mtr, I saw that Sprint out of Orlando tended
to drop packets and also had high round-trip times.
Watching it right now, I see one hop with a worst-case ping
of 272 ms, while the best on that same IP is 37 ms.
mtr (an open-source package) combines ping and traceroute. It
continually updates the screen with the packet loss and the last,
best, average, and worst round-trip times for EACH router along
the path.
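To show what those per-router columns mean, here is a minimal Python sketch that reduces one hop's probe results to mtr-style numbers. It is an illustration under stated assumptions, not mtr's actual code: the names `HopStats` and `summarize_hop` are hypothetical, a lost probe is represented as `None`, and "last" is taken to be the most recent reply that came back.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class HopStats:
    """Hypothetical per-hop summary in the style of an mtr line (times in ms)."""
    loss_pct: float
    last: Optional[float]
    best: Optional[float]
    avg: Optional[float]
    worst: Optional[float]


def summarize_hop(samples: Sequence[Optional[float]]) -> HopStats:
    """Summarize round-trip samples for one router; None marks a lost probe."""
    replies = [s for s in samples if s is not None]
    # Loss is the share of probes that never got a reply.
    loss_pct = 100.0 * (len(samples) - len(replies)) / len(samples) if samples else 0.0
    if not replies:
        # Nothing came back, so there are no times to report.
        return HopStats(loss_pct, None, None, None, None)
    return HopStats(
        loss_pct=loss_pct,
        last=replies[-1],                   # assumption: most recent reply
        best=min(replies),
        avg=sum(replies) / len(replies),
        worst=max(replies),
    )


if __name__ == "__main__":
    # Made-up samples for one hop: four replies and one lost probe.
    print(summarize_hop([40.0, 38.0, None, 250.0, 42.0]))
```

A hop whose worst time is several times its best, or whose loss is nonzero while the hops after it are clean, is the kind of pattern that points at a specific link rather than at either end machine.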
So if they don't see a problem, and if it can't be your machine,
that pretty much leaves the transport links between the machines.
> I think they are just inside filePro apps all day and when it
> happens to them, they just chalk it up to "the way the computer
> behaves". For my part, it is most disconcerting. I have given
> up trying to solve it. Has existed for about 3 years. Small
> system, small cpu, small memory, small disk, all sort of
> Pentium 100 level stuff. Not worth trying to figure out... it
> just is. :-)
In that case it could simply be swapping and cache flushing. There
have been instances where someone tunes something thinking it will
make it faster, but instead it causes huge pauses: a very large
cache gets allocated, and when it really has to be flushed the
system spends its time doing that and basically seems dead for the
duration.
> P.S. - Although fully loaded subsystems as Mark suggests might
> be it... because I have recently been experiencing the same
> thing on our server. At night when BackupEDGE runs, the system
> will _completely_ and _totally_ lock out every keystroke for
> just about 40 seconds... then it will carry on and complete all
> the keystrokes in a big rush... just as if the system Gods had
> pressed a big Control-S in the sky somewhere... and then 40
> seconds later pressed a big Control-Q. :-)
That could be on the link. It could also be a system under a heavy
load. I saw this while helping a friend with his system: I could
press keys, hit Enter, etc., and nothing happened at all. Then,
anywhere from 30 seconds to 3 minutes later, I'd get a string of #
prompts, one for every time I'd hit <ENTER>.
His system was badly overloaded and swapping like mad. It had far
too little memory for what he was doing, and there were about 1000
messages in the sendmail queue.
> I have had Microlite telnet in, run the same backup we do via
> cron at night and watch for about 40 minutes... wouldn't you
> figure it... absolutely no lockups... That night, several hours
> after this dismal attempt to show them the problem... SAME
> EXACT PROBLEM... and every day since then. I have not been able
> to diagnose it at all. Last night for example, I was talking
> with JPR in the FP Room. He asked me to do something... and
> I typed "pwd" to see where I was... my system froze for just
> about 40 seconds and then displayed the working directory.
> I didn't mention it to him, but I spent a few minutes again
> trying to figure out what causes this... it will happen about
> every 4 or 5 minutes... only during the backup though.
One question about the backup: could it be using software
compression in BackupEDGE? On a resource-limited system that could
account for part of it.
> Then all is perfect again. It would not be such a big
> problem... except that when it happens downstairs to the guys
> on the production line... they are in a filePro app entering
> numbers into a numeric field... the freeze up is so complete
> that the numbers just stay where they were typed... they don't
> right justify and cause the @when-leaving code I have to
> operate... Sooooo, the users usually think things are "stuck"
> and start pressing everything from BREAK to ESC to more numbers
> to ENTER.... essentially screwing everything up when the system
> returns to its senses and runs all those keystrokes. It is an
> extremely frustrating problem.
Since you know when the backups are occurring, why not set up sar
to sample at about 2-minute intervals during that window? If the
other problems occur during the day, try to narrow that time frame
down and have sar sample more often then. The typical hourly sar
reports won't show short-term problems. Just don't sample too
frequently, as that will skew the reports. When running it
manually, don't use an interval shorter than 15 seconds. When I do
this on a slow system I'll run 20 iterations at 15-second intervals.
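The sampling arithmetic above can be sketched in Python. This assumes the common System V style `sar [options] interval count` form with `-u` for CPU utilization (check your own sar man page for the report flags it supports); `sar_plan` and `MIN_INTERVAL_S` are hypothetical names used only for illustration. The helper refuses intervals below the 15-second floor suggested here and rounds the count up so the samples cover the whole window.

```python
import shlex

# Floor taken from the advice above: sampling faster than this may skew results.
MIN_INTERVAL_S = 15


def sar_plan(window_s: int, interval_s: int = MIN_INTERVAL_S) -> list[str]:
    """Return a hypothetical sar argv sampling CPU use (-u) across window_s seconds."""
    if interval_s < MIN_INTERVAL_S:
        raise ValueError(f"interval {interval_s}s is below {MIN_INTERVAL_S}s")
    if window_s <= 0:
        raise ValueError("window must be positive")
    count = -(-window_s // interval_s)  # ceiling division: cover the whole window
    return ["sar", "-u", str(interval_s), str(count)]


if __name__ == "__main__":
    # 20 samples at 15 s: a 5-minute window, long enough to catch a freeze
    # that recurs every 4 or 5 minutes.
    print(shlex.join(sar_plan(300)))
    # 2-minute samples across a hypothetical 1-hour backup window.
    print(shlex.join(sar_plan(3600, 120)))
```

The shorter interval matters because a 40-second freeze gets averaged away inside a long sample but stands out in a 15-second one.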
> I have even enlisted the aid of Olympus Tune-Up folks to telnet
> in and see if they could see anything wrong.... no luck. (Still
> worth the time/money though because they showed me lots of
> other cool stuff I could do to improve performance. :-)
> Meanwhile, this freeze-up during backup stays around until I
> hopefully get lucky and just happen to stumble on the reason.
You haven't mentioned running sar or posted any of its output.
Have you run it?
Bill
--
Bill Vermillion - bv @ wjv . com
More information about the Filepro-list mailing list