OT: Unix stutter

John Esak john at valar.com
Fri Sep 10 08:43:32 PDT 2004


This is going to seem like a strange addition to the lock-up problem listed
below. But... I have the _exact_ same problem on a machine I service. It
simply stops completely dead for no reason... usually 5 or 6 seconds then
carries on as if nothing was ever wrong. This has been happening to me when
I log in over a modem... or through a telnet connection. I have asked them
if they notice it at their place... and they have all said "no". It can't be
my machine. It happens from different call-in locations, both modem and
telnet.  I think they are just inside filePro apps all day and when it
happens to them, they just chalk it up to "the way the computer behaves".
For my part, it is most disconcerting. I have given up trying to solve it.
Has existed for about 3 years. Small system, small cpu, small memory, small
disk, all sort of Pentium 100 level stuff. Not worth trying to figure out...
it just is.  :-)

John

P.S. - Although fully loaded subsystems as Mark suggests might be it...
because I have recently been experiencing the same thing on our server. At
night when BackupEDGE runs, the system will _completely_ and _totally_ lock
out every keystroke for just about 40 seconds... then it will carry on and
complete all the keystrokes in a big rush... just as if the system Gods had
pressed a big Control-S in the sky somewhere... and then 40 seconds later
pressed a big Control-Q.  :-)

I have had Microlite telnet in, run the same backup we do via cron at night
and watch for about 40 minutes... wouldn't you figure it... absolutely no
lockups... That night, several hours after this dismal attempt to show them
the problem... SAME EXACT PROBLEM... and every day since then. I have not
been able to diagnose it at all. Last night for example, I was talking with
JPR in the FP Room. He asked me to do something... and I typed "pwd" to see
where I was... my system froze for just about 40 seconds and then displayed
the working directory.  I didn't mention it to him, but I spent a few
minutes again trying to figure out what causes this... it will happen about
every 4 or 5 minutes... only during the backup though. Then all is perfect
again. It would not be such a big problem... except that when it happens
downstairs to the guys on the production line... they are in a filePro app
entering numbers into a numeric field... the freeze up is so complete that
the numbers just stay where they were typed... they don't right justify and
cause the @when-leaving code I have to operate... Sooooo, the users usually
think things are "stuck" and start pressing everything from BREAK to ESC to
more numbers to ENTER.... essentially screwing everything up when the system
returns to its senses and runs all those keystrokes.  It is an extremely
frustrating problem.  I have even enlisted the aid of Olympus Tune-Up folks
to telnet in and see if they could see anything wrong.... no luck. (Still
worth the time/money though because they showed me lots of other cool stuff
I could do to improve performance.  :-)

Meanwhile, this freeze-up during backup stays around until I hopefully get
lucky and just happen to stumble on the reason.
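
One way to catch it in the act, going by Mark's question about what the load average says right after it recovers, would be a little watcher that sleeps one second at a time and complains whenever far more than a second went by. What follows is only a rough sketch under my own assumptions: Python is available on the box, the names find_stalls and watch are made up for illustration, and the 2-second threshold is a guess. If the whole machine is frozen the watcher freezes too, so it can only report after the fact.

```python
import os
import time


def find_stalls(timestamps, interval, threshold):
    """Return (start, lost_seconds) for each gap between samples that is
    longer than the expected interval by more than threshold."""
    stalls = []
    for prev, cur in zip(timestamps, timestamps[1:]):
        lost = (cur - prev) - interval
        if lost > threshold:
            stalls.append((prev, lost))
    return stalls


def watch(interval=1.0, threshold=2.0, iterations=None, log=print):
    """Sleep in a loop; when a wake-up arrives late, log the lost time and
    the load average seen just after recovery. Returns the stall count."""
    stalls = 0
    done = 0
    last = time.monotonic()
    while iterations is None or done < iterations:
        time.sleep(interval)
        now = time.monotonic()
        lost = (now - last) - interval
        if lost > threshold:
            stalls += 1
            try:
                load = "%.2f %.2f %.2f" % os.getloadavg()
            except (OSError, AttributeError):
                load = "unavailable"  # not every platform exposes it
            log("%s stalled about %.1fs, load average: %s"
                % (time.ctime(), lost, load))
        last = now
        done += 1
    return stalls
```

Left running in the background during the nightly backup with its output sent to a file, calling watch() should leave one line per freeze. The same spot in the loop is where a snapshot of the process table could be taken as well.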

John



> -----Original Message-----
> From: filepro-list-bounces at lists.celestial.com
> [mailto:filepro-list-bounces at lists.celestial.com]On Behalf Of Fairlight
> Sent: Friday, September 10, 2004 10:04 AM
> To: filepro-list at seaslug.org
> Subject: Re: OT: Unix stutter
>
>
> On Fri, Sep 10, 2004 at 09:10:01AM -0400, Leefp1 at aol.com, the prominent
> pundit, witicized:
>
> > Several times per day this Unix box "stops" for 5-10 seconds for no
> > apparent reason.  I use the word stop as opposed to "lock-up" because it
> > is different than what I have ever seen before.  When it stops, no key
> > strokes are accepted from the console or any terminal.  You can't even
> > "Alt-Fn" to another screen.  The system just stops.  While stopped any
> > key strokes are NOT stored, i.e. when it "starts" the key strokes entered
> > during the stop are NOT processed.  It starts again on its own with no
> > action taken by a user.  It is not a fatal error but, obviously, very
> > annoying in a busy office environment.
>
> Very.
>
> Sounds like the CPU is being sucked dry--literally.  The equivalent of a
> load average of about 70 on an old Unisys 7000/40.
>
> Immediately after it recovers, what does 'uptime' say about the
> load average?
>
> > I asked this question about a few months ago when it first started and
> > did not get much response.  A few weeks ago this machine had a hard drive
>
> Probably someone like JPR will say that's because this isn't the "correct"
> forum for it.  I think I just saved him the trouble, even though I don't
> feel that way personally.
>
> > failure and I thought perhaps when I replaced the hard drive the problem
> > might go away (not sure why I thought that... just hoping).  But it
> > didn't, and the users are beginning to doubt my "guruness" since I can't
> > fix this.
>
> Only way I can see disk doing this is if you got an entire filesystem in
> a state where every process is thrown to a disk wait state.  Say, if you
> were running RAID and something really massive happened to slow down the
> subsystem where all buffers were full and possibly if it needed to swap on
> the same hardware, that could potentially do it.
>
> > Recently I have been suspecting a memory or processor problem, at least
> > something hardware related... my hardware builder thinks maybe a cache
> > problem.  I'm stumped.
>
> I wouldn't jump to either of those conclusions, seeing as it recovers.
>
> It sounds like a bottleneck somewhere.  Caching could do that, but it would
> be prudent to look at all potential factors, including the load on the
> server.  I don't suppose this coincides with a recently added cron job or
> the like?
>
> I can't see caching just falling on its arse and then recovering in a few
> seconds.  Well, I could -maybe- if the system was overheating.
>
> > Am I not thinking about some Unix setting?  Has anyone ever experienced
> > such a problem?  Any ideas where to start looking for the problem?  TIA.
>
> The problem is not being able to look at the process table -when- it's
> happening.  If one could do that, it might narrow the field a little.  If
> it's a sudden spike though, uptime should show it.
>
> What brand and type CPU are you using?  Are you using multiple CPUs?
>
> I can think of some conditions that could cause it, dependent on the
> answers to those two questions.  Overheating with sudden throttle-down
> constricting the CPU slices on top of a heavy load could do it.  Spinlocks
> in SMP/MPX could do it too.
>
> This really wants someone more hardware-minded than myself, since you're
> not going to get much out of the OS proper, but we have no shortage of
> those here.  Hey "Dad"? (*pokes Bill Vermillion*) Whatcha think?
>
> mark->
> --
> Bring the web-enabling power of OneGate to -your- filePro
> applications today!
>
> Try the live filePro-based, OneGate-enabled demo at the following URL:
>                http://www2.onnik.com/~fairlite/flfssindex.html
> _______________________________________________
> Filepro-list mailing list
> Filepro-list at lists.celestial.com
> http://mailman.celestial.com/mailman/listinfo/filepro-list


