Hosting filePro applications

Mon Jan 23 16:25:24 PST 2006

From inside the gravity well of a singularity, Kenneth Brody shouted:
> Quoting Fairlight (Mon, 23 Jan 2006 12:15:27 -0500):
> [...]
> > d) CPU hogging.  If you leave fP on linux at the "select file" screen
> > (or any other of that kind), someone uses waitkey, etc., and you're
> > timesharing a system, that's potentially a nightmare wherein those
> > polling loops eat CPU in realtime.
> 
> None of the above-listed items eat CPU.

That actually depends on your environment.  I just did more testing to
track down the -specific- cause.

> [...]
> > There are (what I consider bugs, like that polling loop) issues in fP
> > that don't lend themselves towards a friendly multi-company shared
> 
> What polling loops?

I thought it was a polling loop because of the way it ate CPU; the
behaviour seemed consistent with one.  More rigorous testing shows it
isn't, and I can verify that now.  However, there is a race condition
issue.  Lesson to me: don't go on first (or second) impressions.

This isn't something isolated to one system.  I've seen this on several
servers, and can replicate it at will.

Having tested it thoroughly this afternoon, I can see why you probably
never encountered it "normally", Ken.  It doesn't usually happen from the
command line, even if you redirect STDOUT and STDERR.  However, try this
in linux from a bash shell:

rclerk 0>/dev/null 1>/dev/null 2>/dev/null

Watch your CPU time on the process fly.  Note that it -only- races when
STDIN is closed.  If you close -only- STDIN, it appears to race a little,
but not nearly as hard as it does with all three standard fds closed,
where you can watch the CPU time climb in realtime.
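To put numbers on "watch your CPU time fly", here is a minimal bash sketch, assuming Linux (it reads /proc/PID/stat). `cpu_ticks` and `cpu_for` are hypothetical helper names. Since nobody outside support can see filePro's source, the stand-in "victim" is a bash loop that simply ignores failed reads; that is only a guess at the kind of behaviour involved, not fP's actual code. One detail worth knowing: `0>/dev/null` doesn't strictly close fd 0, it reopens it write-only on /dev/null, so every read on it fails immediately with EBADF; `0<&-` closes it outright.

```shell
#!/usr/bin/env bash
# Sketch only: Linux-specific, relies on /proc/PID/stat.

# cpu_ticks PID: print utime+stime of PID in clock ticks
# (see `getconf CLK_TCK`, usually 100 per second).
cpu_ticks() {
    local stat
    stat=$(< "/proc/$1/stat") || return 1
    # Field 2 (comm) is wrapped in parentheses and may contain spaces,
    # so drop everything through the last ") " before splitting.
    stat=${stat##*") "}
    local -a f
    read -r -a f <<< "$stat"
    # f[0] is field 3 (state), so utime (field 14) is f[11], stime (15) is f[12].
    echo $(( f[11] + f[12] ))
}

# cpu_for SECONDS CMD [ARGS...]: start CMD with the same redirections as the
# rclerk test above, let it run for SECONDS, print its CPU ticks, kill it.
cpu_for() {
    local secs=$1; shift
    # stdout must not be the caller's pipe, or $(cpu_for ...) would hang.
    "$@" 0>/dev/null 1>/dev/null 2>/dev/null &
    local pid=$!
    sleep "$secs"
    local ticks
    ticks=$(cpu_ticks "$pid")
    kill "$pid" 2>/dev/null
    wait "$pid" 2>/dev/null
    echo "$ticks"
}
```

With filePro installed and its environment set up, `cpu_for 5 rclerk` would report how many ticks rclerk burned in five seconds; `cpu_for 5 sleep 60` gives an idle baseline to compare against, and `cpu_for 5 bash -c 'while :; do read -r _; done'` shows what a read loop that never gives up looks like.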

Under fPcgi or OneGate (or any home-brew solution), this is -exactly- what
happens when someone makes a mistake, like leaving off a database name or
an index selection, or generates an error that requires a keypress.  After
the connection has sat for several minutes (I believe it's 5) with no
traffic between server and browser, the CGI dies and all its fds are
closed, but the child is left running.  To this day, fP has never
gracefully gone away in a CGI environment when its parent dies; at least,
I've never seen it do so under any parent gateway and server.  I've never
tested it under Apache 2.x, so I can't say whether its new "kill any
process when its fds are closed" code handles this; I believe that only
kills the immediate child.
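One possible defence, sketched in bash under the assumption that GNU coreutils `timeout` is available (it is standard on Linux; whether SCO OSR5 has an equivalent isn't covered here): have the gateway start the filePro binary through a wrapper that caps its lifetime. `run_capped` is a hypothetical name and the 10-second grace period is an arbitrary choice. GNU `timeout` sends TERM to the command when the limit expires, follows up with KILL if it is still alive after the grace period, and passes on a TERM or HUP that it receives itself, so a server that only signals its immediate child would still reach the filePro process.

```shell
#!/usr/bin/env bash
# Sketch only: assumes GNU coreutils timeout(1).

# run_capped MAX_SECS CMD [ARGS...]: run CMD, but never for longer than
# MAX_SECS. On expiry timeout(1) sends TERM to CMD (and its process group),
# then KILL 10 seconds later if it is still running. Exit status is CMD's
# own, or 124 if the limit was hit.
run_capped() {
    local max_secs=$1; shift
    timeout --kill-after=10 "$max_secs" "$@"
}
```

In a gateway script the same idea would be a final line such as `exec timeout --kill-after=10 300 dreport` followed by the usual arguments; 300 seconds is only illustrative and has to stay longer than your slowest legitimate report.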

At any rate, this is only likely to be seen in CGI environments, because
nobody is likely to have cause to redirect or close STDIN on a terminal.
That would be kind of pointless.  :)

Combined with the DoS attack I discovered in fpcgi 2.0, which works unless
you use the new environment variables it supplies, this can drag a system
to its knees.  Imagine a 128-user license eaten alive: every seat soaked
up by virtual zombies, all racing like this, and nothing stops it short of
someone killing the processes by hand.  Not pretty.  And that attack can
be launched at next to zero CPU cost on the attacking side, using 13 lines
of perl and the URL-grabber of your choice (curl, lynx, wget, RawQuery,
etc.), as fast as you can make the requests and sever the connections.

It's bad enough that this happens when someone accidentally triggers an
error that requires a keystroke (say, a double-declared GLOBAL, "Press
Enter"), but when it can be invoked deliberately under fpcgi 1.0 (or 2.0
without the alternate security measures introduced in that version), that
adds insult to injury.  OneGate can't be -attacked- in this fashion at
all, but it is still subject to a programmer's local configuration error.
Unfortunately, most programmers I've worked with don't seem to realise
their processes are still running until the system becomes -very- slow,
or they exceed their user license and are forced to hunt for the problem.
I've seen them stacked 20 deep.  (And then they wonder why things print
and run s-l-o-w-l-y...)

The same thing happens with dreport as with rclerk above.  From
experience, I know that all 4 permutations of the programs (rclerk,
dclerk, rreport, dreport) do this, but rclerk and dreport are the two I
tested specifically today, so those I can swear to with certainty.
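For spotting the ones that got left behind before the box slows to a crawl, a small bash sketch that lists any of the four binaries running with no controlling terminal after being reparented to PID 1. `filter_orphans` and `list_orphans` are hypothetical names. The `ps -o` field names are POSIX ones; printing "?" for "no terminal" is what Linux procps does and may differ elsewhere, and the "parent is PID 1" test assumes the gateway really has died and no subreaper has adopted the process.

```shell
#!/usr/bin/env bash
# Sketch only: "?" as the no-terminal TTY value is what Linux procps prints.

# filter_orphans: read "pid ppid tty etime time comm" lines on stdin and print
# pid, elapsed time, CPU time and name for every rclerk/dclerk/rreport/dreport
# that has been reparented to PID 1 and has no controlling terminal.
filter_orphans() {
    awk '$2 == 1 && $3 == "?" && $6 ~ /^[dr](clerk|report)$/ { print $1, $4, $5, $6 }'
}

# list_orphans: feed the live process table through filter_orphans.
list_orphans() {
    ps -eo pid=,ppid=,tty=,etime=,time=,comm= | filter_orphans
}
```

Running `list_orphans` by hand when things get slow, or from cron every few minutes, shows how much CPU each stray process has eaten; the PIDs in the first column are what you'd hand to kill.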

Come to think of it, now that I know the -precise- trigger, I just
retested SCO OSR5 (Release = 5v6.0.0).  I never found it on SCO before
because I rarely do CGI there.  The 5.0.x runtime and development
versions, up through 5.0.14, exhibit this race condition on both SCO and
linux when STDIN is closed along with STDOUT and STDERR.  Since it happens
in 5.0.14, I have no real reason to believe it was fixed in the upcoming
5.6 unless told otherwise.  Someone in the beta program would have to
test that.

On SCO, you'll need bash or some other shell that lets you close STDIN
using that syntax; the stock Bourne shell doesn't honour the "0>"
redirection.  I used bash and, voila, *clerk and *report racing at my
leisure.

I'm sure you can see why I thought it was a polling loop.  For the last
decade, 99.9% of my fP experience has been -in- the CGI area, and I see
this all the time when people make simple (yet common) mistakes.  It never
occurred to me that it might not happen at the terminal.  It's not a
polling loop, I see that now, but there -is- a triggerable race condition
of some flavour present.

Now that I've traced the specific trigger for the bug, I'm CC'ing
support to put it through official channels and (hopefully) get it fixed.

Hope this helps.  

mark->