good show

Wed Jul 28 20:55:32 PDT 2004

I know you'll find it hard to believe, but on Wed, Jul 28 19:24 , Jay R. Ashworth
actually admitted to saying:

> On Wed, Jul 28, 2004 at 07:08:00PM -0400, John Esak wrote:
> > > Which is an excellent reason for always having at least two
> > > MX records in your DNS.
> > >
> > > When the main server is down the secondary holds them until
> > > it comes back up and then delivers them.

> > Bill,

> > Can you explain this to me further? We have had _serious_
> > problems with my favorite "Rapidsite/Verio". All mail
> > going to nexusplastics.com and a lot going to valar.com is
> > being delayed for this reason. They say their systems are
> > overloaded and they are putting in a fix within the next
> > 4-6 weeks that will fix the problem. Meanwhile, it has
> > been 4 to 6 weeks already and we actually have large (like
> > million dollar clients) telling us that if we can't be more
> > responsive to our email they will start looking elsewhere!!
> > How can we be responsive to email that comes 6 to 8 to 12
> > hours after it has been sent, or worse not at all. I have
> > been in nearly constant communication with lots of the tech
> > support people at Rapidsite, but they are pretty much giving
> > me the same pat answers that don't say anything. They want
> > me to switch our problem accounts (which I would guess would be
> > all of them) to something called ViaVerio... some other product
> > they must offer. Why would I move to another product from
> > a company that is now providing me a product with failing
> > service? It's not very trust-inspiring. We are now nearly at
> > a litigation point... Should I just switch to Level 3 as you
> > have suggested in the past, or is there something that can be
> > done with secondary MX records? Why wouldn't they tell me
> > about such an option?

> > What are your suggestions?

> "Secondary MX" is the common name for a combination of techniques
> intended to reduce mail delivery failures.

> To have this really be reliable, you need (at least) two things, in
> each of two categories:

> You need for the master DNS zone for your domain to be served
> from at least 2 machines, and preferably 3 or 4, *on different
> backbones and uplink providers*. This way, mail will never
> bounce with "can't resolve domain", which is a soft bounce (the
> sending SMTP server will usually retry for up to 5 days).

It all depends upon where your machines are and just how reliable
you must be.  Five 9's is easily doable - six 9's [ approximately
30 seconds downtime per year ] gets to be a bit expensive.
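
For anyone who wants to check the "nines" arithmetic, here is a rough
sketch. The year length of 365.25 days and the function name are my own
assumptions for illustration, not anything standard.

```python
# Rough downtime budget for "N nines" of availability.
# Assumption: an average year of 365.25 days.
SECONDS_PER_YEAR = 365.25 * 24 * 3600


def downtime_seconds_per_year(nines: int) -> float:
    """Allowed downtime per year for an availability of N nines.

    Five nines means 99.999% uptime, i.e. an unavailability of 10**-5.
    """
    unavailability = 10 ** (-nines)
    return SECONDS_PER_YEAR * unavailability


if __name__ == "__main__":
    for n in (5, 6):
        secs = downtime_seconds_per_year(n)
        print(f"{n} nines: {secs:.1f} seconds/year ({secs / 60:.2f} minutes)")
```

Five 9's comes out a little over 5 minutes a year and six 9's a little
over 30 seconds.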

Five 9's is about 5 minutes per year.  I was averaging that for the
past 2 years until 2AM Monday morning when a Cisco 7120 decided to
get finicky.  We lost about 3 hours of connection time between 2AM
and 6AM, when I configured a machine to act as a router.  That's
about 4 hours total outage since March 2000.  Some of that lost
time was bringing the Cisco back up and then watching it fall over
again - while I was on the phone to tech support - in Australia.

Once I determined it was a sick puppy it took about 15 minutes
to configure a dual-NIC machine to be a router.

> You need at least one extra machine to actually *receive mail*
> for your domain. These machines must have public, static IP
> addresses, and properly administered SMTP mail systems.
> You configure them in your DNS zone as additional MX records,
> with higher numbers in their MX records (and therefore lower
> priority).

> If a sending system tries to get mail to you, and for some reason
> cannot contact your primary MX server, it will try your secondaries in
> descending priority (ascending numerical) order.  Hopefully, *one* of
> them will be accessible.  As usual, the optimal situation is to have
> your secondaries in physically separate locations, on different
> backbones, just like your DNS servers.
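
The ordering Jay describes can be sketched roughly like this. It is only
an illustration of how a sending system picks which MX to try first, not
any particular mailer's code. The host names are made up, and the
shuffle among equal-preference hosts is what I understand the SMTP
specs to ask for.

```python
import random
from itertools import groupby


def delivery_order(mx_records):
    """Return host names in the order a sender would try them.

    mx_records is a list of (preference, hostname) tuples.  Lower
    preference numbers are tried first; hosts sharing a preference
    are tried in random order to spread the load.
    """
    ordered = []
    by_preference = sorted(mx_records, key=lambda rec: rec[0])
    for _, group in groupby(by_preference, key=lambda rec: rec[0]):
        hosts = [host for _, host in group]
        random.shuffle(hosts)
        ordered.extend(hosts)
    return ordered


if __name__ == "__main__":
    # Hypothetical zone: one primary and two secondaries.
    records = [
        (20, "mx2.example.net"),
        (10, "mail.example.com"),
        (30, "mx3.example.org"),
    ]
    for host in delivery_order(records):
        print(host)
```

With those made-up records the primary (preference 10) is tried first,
and the secondaries only if it can't be reached.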

Optimal can be expensive. And it depends on your needs and
it depends on your backbone.  As I said, I now total about 4
hours of downtime in 4.5 years.  The backbone I'm connected to has
essentially no downtime.  There have been moments when they were
reconfiguring - and I had advance notice - and while they said
they expected the network to be unavailable for up to 5 minutes, I
never saw that much.  And the last notice like that was two years
ago.   There are more phone companies in that building than I can
count.  And the last time I was in the carrier side the line
of Lucent Ascend devices stretched for many many feet.  I made a
rough estimate of 30,000 dial-in connections at that time - and
they've probably added more since then.

I'm on a 40 Gb/s global backbone - Level 3. Fibre comes into the
building from three separate locations. The battery room has
almost as much square footage as my house. Those batteries will
keep everything running for 6 to 8 hours. And the ONLY reason those
are there is in case the diesel generator doesn't start. The diesel
turns on in seconds after any power failure. It has a 6000 gallon
tank and puts out 1,250,000 watts - Caterpillar unit.

If you are a huge company, then having secondaries and DNS in
separate locations may be a requirement. But depending on your
needs and what you use for a backbone, a separate backbone may
not be necessary.

> These secondary servers are configured to accept the mail
> for your domain, but not try local delivery -- they then
> attempt the delivery to your primary server themselves, for
> however long your secondaries are configured to try -- which,
> hopefully, you're in control of.

I handle secondary MX for a colo client with a flock of domains.
His machines get swamped at times - and I have been inside them [he
has given me access] and he's woefully short on memory.   So he can
get his mail server bogged down and things will come over to my
secondary MX machines until his machines start breathing normally
again.

If you don't control your secondaries you should at least be able
to specify how long you want things to be held.   A decent provider
should do that for you.  However I've seen some places just totally
nuke the queues on a daily basis.  Those are usually smaller ISPs.

> Worst case, if your machine is running but your link has suffered
> backhoe fade, you might be able to sneakernet the mail spool from a
> secondary to the primary for delivery.

Only if they are close by and if you don't have a huge amount of
mail to deliver.

> The highest bandwidth data transport known to mankind is a FedEx plane
> full of DVD-ROM's.  (This used to be a station wagon full of magtape,
> when someone at Duke coined it about Usenet; I've clearly updated.)

The problem with that is that it has a huge bandwidth but very
poor latency.  And when the station wagon full of mag
tapes analogy was made most backbones were in the 56K range as I
recall.

The Level 3 facility my machines are in has multiple OC-768 [40
Gb/s] links coming in.  I don't know if they've updated their
transatlantic link, but at one time it was 1.3 terabits/second.

Cisco has introduced a new router/switch with an aggregate
bandwidth of 92 terabits/second. The old 12000 series was their
flagship and they figured they'd sell only a hundred or so.

25,000 units and $5,000,000,000 later, the new unit - CRS-1 [Carrier
Routing System] - is the one that will replace those.  Bandwidth
needs have grown faster than anyone had imagined.

While a plane full of DVD-ROMs may have a higher aggregate
bandwidth, the time consumed burning those DVDs is one thing to
consider.  And if you had to ship it overnight, those should be
ready to go no later than 8PM for 10AM delivery.

That's 14 hours, or 50400 seconds. In that time frame the CRS-1
will be able to move over 4.6 exabits, or about 580 petabytes
[roughly 0.58 exabytes].

580 petabytes in the standard terminology is about 580 quadrillion
bytes.
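
For those who want to check my arithmetic, here it is as a quick
sketch. It assumes the full 92 terabits/second aggregate figure could be
sustained for the whole 14 hours, which is generous, and it uses decimal
prefixes (peta = 10^15, exa = 10^18). The variable names are just mine.

```python
# Back-of-envelope: how much data a 92 Tb/s box moves in 14 hours.
# Assumption: the full aggregate rate is sustained the whole time.
CRS1_BITS_PER_SECOND = 92e12      # 92 terabits/second
WINDOW_SECONDS = 14 * 3600        # 8PM to 10AM

total_bits = CRS1_BITS_PER_SECOND * WINDOW_SECONDS
total_bytes = total_bits / 8

if __name__ == "__main__":
    print(f"window:  {WINDOW_SECONDS} seconds")
    print(f"moved:   {total_bits / 1e18:.2f} exabits")
    print(f"         {total_bytes / 1e15:.1f} petabytes")
```

That works out to a bit over 4.6 exabits, or about 580 quadrillion
bytes.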

So it might be a close race between the plane and the data
providing you already have the DVDs made.   :-)

Bill

-- 
Bill Vermillion - bv @ wjv . com