OT: Warning to Firefox users when developing web integration

Sat Feb 3 11:40:48 PST 2007

At Sat, Feb 03, 2007 at 01:07:38PM -0500 or thereabouts, 
suspect Kenneth Brody was observed uttering:
> While I don't use Firefox, Opera has a configuration parameter "max
> connections to a server".  With this setting at the default of "8",
> if I were to submit 10 simultaneous requests to one server, it would
> send only 8 at the start.  Once one ended, it would send the 9th, and
> once another one ended, it would send the 10th.  (That is, it would
> not wait for all 8 to complete before sending the 9th.  It would
> merely wait for any of the 8 slots to open up.)

There -is- a parameter like this.  Go into about:config and look at
network.http.max-connections-per-server.  The default is 8.  

> Is it possible that Firefox has a similar parameter which is set to
> "2"?

Similar parameter, but it's a default of 8.

But that's per server.  Now, there's also another parameter that's set to
2 by default, and that's network.http.max-persistent-connections-per-server.

Again, that's per server, which is possibly why they come up in pairs when
the GET/POST data differs.

However, with the GET/POST data the same or absent (that is, when the URI
is identical), they only come up singly, one after another.

> What if you don't have everything take the same 10 seconds?  What if
> the parameter you passed was the sleep time, and you have them set
> to 5, 10, 15, 20?  Would it start the next session as soon as any of
> the existing sessions completed, rather than waiting for all to
> complete?

1) Doing so is at least a minor security issue.  I don't like to violate
the tenet of never letting client-side information affect -how- a program
runs, no matter how sanitised.

2) Assuming I had variable timing even internally, I know exactly what
would happen.  If you had them set to use those variable times but your
GET/POST data was absent or identical, assuming you reloaded all four tabs
at once, you'd get returns at:

Tab 1: would return in 5 seconds
Tab 2: would return in 15 seconds
Tab 3: would return in 30 seconds
Tab 4: would return in 50 seconds

Note the fact that, since the second waits for the first to complete under
these conditions, the time of any individual tab's run is cumulative.
You're scaling upwards by adding run_time + prior_run_times for each tab.
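
To make the arithmetic above concrete, here is a minimal Python sketch.  It
is not part of the actual test code; the function name is made up for
illustration, and it assumes all four tabs are reloaded at the same instant
and that each request waits for every earlier one to finish.

```python
from itertools import accumulate


def serialized_return_times(run_times):
    """Return the time at which each tab comes back when the requests
    are handled strictly one after another (hypothetical helper)."""
    # Each tab's return time is its own run time plus all prior run times.
    return list(accumulate(run_times))


if __name__ == "__main__":
    for tab, t in enumerate(serialized_return_times([5, 10, 15, 20]), start=1):
        print(f"Tab {tab}: would return in {t} seconds")
```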

3) If you passed them variably in fields, the GET/POST information would be
different, and they'd come back in pairs rather than singly.  My head is
spinning from the math overload of variable return times combined with
pairing of responses and figuring out what will or won't have fallen off
yet to make room for another to be part of a pair.  Suffice it to say it
only lessens the burden; it doesn't eliminate it.  Unless the times were
incredibly short or scaled very, very slightly, you'd still hit a situation
that should not technically (IMHO) exist.
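
For anyone who wants to skip the mental math, the paired case can be
approximated with a short simulation.  This is only a sketch in Python under
stated assumptions: every request is submitted at time zero, at most `slots`
run at once, and a waiting request starts the moment any slot frees up.
Whether Firefox really behaves that way for the pairs is an assumption here,
not something the tests above establish, and the function name is invented
for the example.

```python
import heapq


def return_times(run_times, slots):
    """Simulate requests submitted together, in order, with at most
    `slots` in flight; a waiter starts as soon as any slot frees up."""
    free_at = [0] * slots          # min-heap: when each slot next becomes free
    heapq.heapify(free_at)
    finished = []
    for run_time in run_times:
        start = heapq.heappop(free_at)   # earliest available slot
        end = start + run_time
        finished.append(end)
        heapq.heappush(free_at, end)     # slot is busy until this request ends
    return finished


if __name__ == "__main__":
    times = [5, 10, 15, 20]
    print("one at a time:", return_times(times, 1))
    print("in pairs:     ", return_times(times, 2))
    print("no throttling:", return_times(times, 8))
```

Under those assumptions the last tab still comes back later than its own
run time with two slots, which is the "lessens but doesn't eliminate" point.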

Additionally, the sleep time was put in place for regression testing.  It
was meant to simulate a run time that gave me long enough to play with the
throttling system and make sure that those in queue who had to wait until
the threshold would indeed still drop, and that if I completed (in this
case, killed a process or closed a connection) job 2 of 3 maximum while job
4 was waiting, job 4 would take job 2's slot.  I refer to it as a
slot-locking routine.  It was designed to meet someone else's fP needs for
a custom package some years ago, and I adopted it myself because it also
met my needs.  It has the benefit of putting the slot number in a specific
place in the semaphore file's name so it can be parsed off and used for
such things as dreport [...] -sr %n, where %n stands for the slot number.
This guarantees no record locking conflicts, as our common idiom has been
to stand on whatever record in one control table is denoted by %n and do
all work via lookup.  It results in zero record contention.
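
The general idea of a slot-locking routine can be sketched roughly as
follows.  This is an illustration in Python, not the real routine: the file
naming scheme (`job.slotN.lock`), the function names, and the maximum of 3
are all hypothetical, and the wait-until-threshold-then-drop behaviour is
left out.  It relies only on exclusive file creation to claim a slot and on
the slot number being recoverable from the semaphore file's name.

```python
import os
import re
import tempfile

MAX_SLOTS = 3  # hypothetical maximum number of concurrent jobs


def acquire_slot(lock_dir, max_slots=MAX_SLOTS):
    """Claim the lowest free slot; return its number, or None if all are held."""
    for slot in range(1, max_slots + 1):
        path = os.path.join(lock_dir, f"job.slot{slot}.lock")
        try:
            # O_EXCL makes creation fail if the semaphore file already exists.
            fd = os.open(path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
        except FileExistsError:
            continue
        os.write(fd, str(os.getpid()).encode())
        os.close(fd)
        return slot
    return None


def release_slot(lock_dir, slot):
    """Free a slot by removing its semaphore file."""
    os.remove(os.path.join(lock_dir, f"job.slot{slot}.lock"))


def slot_from_name(filename):
    """Parse the slot number back out of a semaphore file name."""
    match = re.fullmatch(r"job\.slot(\d+)\.lock", filename)
    return int(match.group(1)) if match else None


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as lock_dir:
        held = [acquire_slot(lock_dir) for _ in range(3)]  # jobs 1-3
        print("held slots:", held)
        print("job 4 gets:", acquire_slot(lock_dir))       # nothing free yet
        release_slot(lock_dir, 2)                          # job 2 finishes
        print("job 4 gets:", acquire_slot(lock_dir))       # takes job 2's slot
```

The parsed slot number is what would be handed on as %n in the sense
described above.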

But it's just a simulated run length.  The run lengths would be variable,
obviously, in any given situation.  The whole point is that Firefox's
behaviour vastly skewed my regression testing to the point that I thought I
had issues with a version of the code that's been out there for two months
shy of four years, and it threw me into a 7hr bug hunt to find -their-
bug.  The whole reason that segment of code is getting regression tested is
because I re-scoped my variables and changed the error handling flow for
the new design.  The locking code specifically is 99.9% the same as in my
last version.  I just needed to make sure my lexical scoping was
correct, my new error handling flow functioned correctly, and that I didn't
whack anything with changes along those minor lines.  It's a critical
enough feature that it begged testing under the new design, despite having
99.9% identical code.  One wrong scope on a variable and you can make three
screens of code misbehave.  Take my word for it.  You scope something too
tightly and use it in that tight scope, and it might pass syntax check but
you'll end up with an undef or an entirely unexpected value if the same
variable name exists at a higher scope where it's used again.  I knew this
not to be the case, but I always test.

I sincerely thank the Mozilla.org developers for blowing a good part of my
afternoon, my entire evening, and part of a morning for something that
should have been testable in 15min tops.   ...Bastards.  I wish they could
know the pain of not getting in a single second of gaming for a day!

:)

Seriously, at one point it was a -joy- to look at someone else's bug on an
unrelated project yesterday for a couple hours, just as a distraction from
the fun I wasn't having due to this "feature".

On the plus side, I'm now on 2.0.0.1 as of last night, and despite losing
my Brushed theme which is no longer compatible, I like a lot of the new
features--especially the tab memory on close/restart.  Being able to
-remove- search engines is also nice.  The plugin system got a pleasant
overhaul and more attention to detail.  Overall, the "feature" that bit me
is present in 1.5 and 2.0.  I still prefer 2.0 for the features, even after
only (12hrs - sleep_downtime).

As a footnote on the issue, I talked to a couple people in #firefox at
irc.mozilla.org before lunch (chatzilla is cool, if a bit unpolished) and
brought this up.  Apparently the 2-persistent-connection "rule" is in the
last paragraph of RFC 2616, section 8.1.4, which is of course the HTTP
1.1 spec.  They took the "SHOULD" a bit strictly, IMHO.  It's still not
correct, as it's doing some sort of limiting by URI rather than just by
server--otherwise I'd get pairs -every- time, and I have 6hrs of tests that
prove I get only singles unless GET/POST data are present and differ.  I
consider it a bug, and it's likely going to get bugzilla'd--the first time
I've bothered to do that since RH screwed up by omitting /etc/ptmp back in
RHL.  I'm sufficiently annoyed that I'll edit what I posted here and
report it later in the weekend or early in the week.

For now, I'm going to go watch Babylon 5 and try to stop coding/testing
until tomorrow.  I'm already way further through alpha than I thought I'd
be.  I wasn't even supposed to be -at- the alpha testing stage for a week
or so, but I couldn't sleep right and got in a lot of coding yesterday
morning.

Anyone reading who wants something added into OneGate 4.x should use the
forum at http://duran.fairlite.com/forum/ to ask for it -soon- if they're
seriously interested.  The door is closing rapidly on features for 4.0.0,
and will only close faster as I get through alpha and into beta.  Still
open to more suggestions, but not for much longer for the immediate next
version.

mark->
-- 
Try our new SPF-0 lotion, SunScream[tm].  Get it while it's hot!