OT: Warning to Firefox users when developing web integration

Fairlight fairlite at fairlite.com
Sat Feb 3 03:28:11 PST 2007


I'm putting this out there to save -someone- (hopefully) the seven or so
hours of hell I wasted out of the last 24.  Sooner or later, someone's
bound to run into this when stress-testing something, and they'll wonder
why their program "isn't working properly".  A fair number of people here
do web development in general and/or with filePro in the mix, so I think
it's appropriate to warn them and save them the trouble.  If you don't want
the full context of how and why I found this, jump to the section marked
"ACTUAL PROBLEM" below.

As background, I've had a throttling system in place for years for some
of my CGI programs.  I was going to alpha test a new version of one
such package, and I decided to test the throttling mechanism first,
it being relatively critical.  What this system does is prevent more than
a configurable number of people from using programs under this code at
once.  For example, if you need to protect your filePro
license seats and only allow 5 of 16 to go to web use, you set it to
5.  Any concurrent requests then "hover" around (that's the technical
term!) waiting for session slots to free up, but expire after an (also
configurable) wait_attempts*wait_interval, which for me was 30 seconds.  In
testing, I was setting my job to simulate a long run...60 seconds...then
launching multiple requests at once.
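
For anyone curious, the general shape of such a throttle is simple.  Here
is a minimal sketch in PHP (chosen only to match the test script further
down); the names $max_sessions, $wait_attempts, $wait_interval, and the
/tmp/webthrottle lock directory are invented for illustration and are not
my actual implementation:

=====
<?php
// Rough sketch of a slot-based throttle.  The /tmp/webthrottle directory
// is assumed to exist; all names here are illustrative.
$max_sessions  = 5;    // e.g. reserve 5 of 16 filePro seats for web use
$wait_attempts = 10;
$wait_interval = 3;    // 10 attempts * 3 seconds = 30-second expiry

$lock = sprintf('/tmp/webthrottle/slot.%d', getmypid());
$got_slot = false;
for ($i = 0; $i < $wait_attempts; $i++) {
    // "Hover" until fewer than $max_sessions slot files exist.
    if (count(glob('/tmp/webthrottle/slot.*')) < $max_sessions) {
        touch($lock);              // claim a slot (sketch ignores races)
        $got_slot = true;
        break;
    }
    sleep($wait_interval);
}
if (!$got_slot) {
    echo "All session slots are busy; please try again later.\n";
    exit(1);
}

/* ... run the real (long) job here ... */

unlink($lock);                     // release the slot when done
?>
=====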

The problem was that requests made within a second of each other were not
timing out at the break point where they should have.  When I narrowed the
scope of testing, I found that if I had a limit of 1 session, fired two off
simultaneously, and -expected- the second one to quit after 30 seconds, it
actually waited 60 seconds plus change and then finished executing.

I thought my code was broken.  You have NO idea how many checkpoint
debugging statements, debug files, and requests I went through.

An hour or two in, I discovered that if I ran my program from the command
line (it has that debugging ability), the lock system worked as expected,
with one dying off after 30 seconds with the appropriate message.  At this
point, all evidence pointed to some obscure difference between the
command-line environment and the httpd environment.  I tested
all sorts of things, including what role NFS might have played (none, I
eventually found out).

After about three more hours of this, my wife and I were talking and I was
relating my befuddlement.  She asked if it happened in other browsers.
Leave it to an engineer...

I tested it under IE6, lynx, my own B2B transport RawQuery, and lo and
behold--it worked -properly- under all of them.  

It was then a matter of trying to find out how Firefox was talking
differently with the server than other browsers.  I dumped environment
variables from all attempts and checked for differences.  Nothing of
relevance jumped out at me.  I kept at this for quite a while before
calling it a night.

I just now, in the last hour, actually narrowed it down.  The salient
question I should have been asking was "when," not "how," Firefox was
talking to the server.

***************
ACTUAL PROBLEM:
***************

Firefox will defer successive requests to the same URI, if all GET and POST
data are identical (or there is no such data), until the previous requests
have completed.  This is true in both 1.5.x and 2.0.x.

Assume you have two tabs.  Assume you put in
"http://somesite.com/cgi-bin/myprog" for the URL in both of them, and the
task will take 15 seconds to complete and return.  If you submit these
back to back, either manually in close succession (any time before the
first has returned) or by hitting "Reload All Tabs", both look like
they're waiting on the host, but one will come back after 15 seconds,
and the other will come back 15 seconds after that.

At first I caught this under a CGI program, having it spit out a
timestamp as the very first thing it does.  Lo and behold, the start time
I was receiving for the second process followed the end time of the first
request.  I found it wasn't even making the second request until after the
first had completed.
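
In case it helps anyone reproduce this, the debugging trick is just to
emit a timestamp before anything else happens.  A hypothetical PHP version
(my actual test was in my own CGI program, but the idea is the same) would
look something like this:

=====
<?php
// Print a timestamp before doing anything else, simulate a long job,
// then print a second timestamp.  Comparing the start times from two
// "simultaneous" tabs shows whether the second request even began
// before the first one finished.
header('Content-Type: text/plain');
echo "start:  " . date('H:i:s') . "\n";
flush();           // push it out immediately (assumes no output buffering)

sleep(15);         // stand-in for the real work

echo "finish: " . date('H:i:s') . "\n";
?>
=====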

I figured maybe it was limited to cgi-bin files or .cgi and maybe .pl
suffixes, so I tried it against a file "index.php" which consisted of
nothing but the data between the "=====" lines:

=====
<?php
sleep(10);
?>
Done.
=====

I had a URL of "http://debughost.com/index.php".  That was it.  I fired two
off at once and got the exact same results.  They run sequentially, not
truly concurrently, although from the UI's behaviour it -looks- like they
start simultaneously when you fire them off.
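
If you want to rule the server out without dragging in a second browser,
one quick cross-check is to fire the same pair of requests from a
standalone script.  Here's a rough sketch using PHP's curl_multi functions
(assuming the curl extension is available, and reusing the debughost.com
URL above): if the pair finishes in roughly 10 seconds total, the server
is perfectly happy to run the two copies in parallel, and the sequential
behaviour really is on the browser's end.

=====
<?php
// Fire two identical requests at the same moment, outside any browser.
// A total of ~10 seconds means the server ran them in parallel;
// ~20 seconds would mean the server itself serialized them.
$url = 'http://debughost.com/index.php';   // the 10-second sleep script above

$mh = curl_multi_init();
$handles = array();
for ($i = 0; $i < 2; $i++) {
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_multi_add_handle($mh, $ch);
    $handles[] = $ch;
}

$start = time();
do {
    curl_multi_exec($mh, $running);   // drive both transfers
    curl_multi_select($mh);           // wait for activity instead of spinning
} while ($running > 0);
echo "both requests done in ", time() - $start, " seconds\n";

foreach ($handles as $ch) {
    curl_multi_remove_handle($mh, $ch);
    curl_close($ch);
}
curl_multi_close($mh);
?>
=====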

Now, here's the exception.  If you have GET or POST data, it changes the
behaviour.  If I submitted the following two requests:

http://debughost.com/index.php?a=1
http://debughost.com/index.php?b=2

...they would run and complete virtually simultaneously.  And I've run this
test enough times and in enough different ways (including with POST data
from different referring pages) to be confident in it.

So it looks like if your GET or POST data differ, you're all set, right?

No.

On a hunch, I just opened five tabs.  The following URLs were entered one
at a time in each successive tab:

http://debughost.com/index.php?a=1
http://debughost.com/index.php?a=2
http://debughost.com/index.php?a=3
http://debughost.com/index.php?a=4
http://debughost.com/index.php?b=qwerty

These were run against the same 10 second sleeping index.php shown above.
When I had all five tabs loaded, I hit "Reload All Tabs" and discovered it
was even worse than previously indicated.  I repeated the test three times,
and all results came back the same.

When I submitted all five, the first two requests completed after 10
seconds, back to back.  Then 10 seconds later requests three and four
completed.  And request five came in another 10 seconds after that.

##### TECHNICAL NOTICE #####
Firefox will only do one submission to a URI at a time if the GET and POST
data are either non-existent or identical.  If either is present and
differs, it will do up to two at a time, but no more than that.  Pending
requests look visually like they're waiting on the server, complete with
moving indicators, etc.  In reality, it's not making more than 1 or 2
requests (respective to the situations above) concurrently.  The duplicates
are handled semi-sequentially (I say "semi" because it -will- do two at
once if the GET/POST data differ).
##### TECHNICAL NOTICE #####

I can only assume that this is some intentional DoS-prevention built
into the client.  I can see how it could be abused if you loaded 100
tabs individually and then slammed a server with 100 requests at once
to the same resource when you hit "Reload All Tabs".  However, other
browsers don't do it.  And that's a case for lawyers, not programmers.
This performance inhibitor doesn't belong there.  And although I may choose
to complain to Mozilla's developers about it, I doubt it will change.

This may or may not have been easier to diagnose had I had access to the
web server's logs, but I honestly would not have thought to look at that
data until about the time I put the timestamp outputs at the beginning of
my program.

At this point, I strongly advise against stress-testing any web development
implementation's throttling, QoS, or similar mechanisms using Firefox
through 1.5.0.9 or 2.0.0.1.  Doing so may lead to dented or cracked
keyboards, increased use of harsh language, hypertension, loss of a
complete Friday evening, and a significantly higher time:benefit ratio.
You have been warned.

mark->
-- 
Fairlight->   ||| "Reality is merely an illusion,    | Fairlight Consulting
  __/\__      ||| albeit a very persistent one." --  |
 <__<>__>     ||| Albert Einstein                    | http://www.fairlite.com
    \/        |||                                    | info at fairlite.com

