OT: How to cluster two computers
Bill Vermillion
fp at wjv.com
Tue Oct 5 06:49:05 PDT 2004
In the last exciting episode of the filePro saga,
Walter Vaughan was heard to say:
if: Tue, Oct 05 08:59
then: nm = Walter Vaughan
if:
then: show nm < "said:"
Walter Vaughan said:
> Fairlight wrote:
> >>>What exactly is your goal?
> *My* goal would be to remove the single point of failure. I've got a
> FreeBSD server with RAID, but if the RAID card fails or the
> motherboard fails, I'm offline until it's fixed or I switch
> over to its backup.
> It'd be *really nice* to know that there'd be a machine that
> could cover for the downed soldier within a second.
You need to make sure the downed machine is really down, and one
second may not be enough. That is just a quick thought: losing
connections for a second could be the result of the network being
temporarily swamped, from any number of causes.
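To make "really down" concrete, here is a minimal Python sketch, not taken from FreeBSD or any real cluster package: the standby declares the primary dead only after several consecutive heartbeat intervals pass with nothing heard. The class name, the 1-second interval and the 5-miss threshold are hypothetical choices for illustration.

```python
from dataclasses import dataclass


@dataclass
class FailureDetector:
    """Declare the primary down only after several consecutive missed heartbeats.

    interval and misses_allowed are hypothetical tuning parameters.
    """
    interval: float = 1.0    # seconds between expected heartbeats
    misses_allowed: int = 5  # whole intervals of silence tolerated before failover
    last_seen: float = 0.0   # time of the most recent heartbeat

    def heartbeat(self, now: float) -> None:
        # Any heartbeat from the primary resets the clock.
        self.last_seen = now

    def primary_is_down(self, now: float) -> bool:
        # Count whole intervals that have passed with no heartbeat; fail over
        # only when that count exceeds misses_allowed.
        missed = int((now - self.last_seen) // self.interval)
        return missed > self.misses_allowed


detector = FailureDetector()
detector.heartbeat(0.0)
print(detector.primary_is_down(1.5))  # a one-second hiccup does not trigger failover
```

With these settings a brief network stall is ridden out and failover happens only after about six seconds of silence; the threshold trades detection speed against false alarms.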
> What I guess is needed is still a single-point aggregator
> to funnel the traffic between the workstation and the
> server array. Otherwise you'd still lose your telnet/ssh
> connection. You can swap IP addresses on two FreeBSD boxes in
> less than a second, but does anyone know how ssh/telnet would
> respond? It'd break the connection, I'd guess, and you'd lose
> data.
You'd lose data for anything that had not been backed up to the second
machine anyway. To avoid that you would need continuous backups, with
every change copied to the second machine as it is made. For this I would
think something like a SAN array would be the ticket, but I've not used one.
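A toy model of why continuous copying matters; it does not describe how a SAN works internally. With synchronous=True each write is stored on the standby before it is acknowledged; otherwise writes wait for a periodic copy, and anything written since the last copy is lost at failover. ReplicatedStore and its methods are hypothetical names.

```python
class ReplicatedStore:
    """Toy primary/standby store; all names here are hypothetical."""

    def __init__(self, synchronous: bool):
        self.synchronous = synchronous
        self.primary: list[str] = []
        self.standby: list[str] = []
        self.pending: list[str] = []  # on the primary, not yet copied

    def write(self, record: str) -> None:
        self.primary.append(record)
        if self.synchronous:
            # Acknowledge only after the standby also holds the record.
            self.standby.append(record)
        else:
            # Copied later, e.g. by a periodic backup run.
            self.pending.append(record)

    def periodic_copy(self) -> None:
        self.standby.extend(self.pending)
        self.pending.clear()

    def fail_over(self) -> list[str]:
        # The primary is gone; only what the standby holds survives.
        return list(self.standby)


nightly = ReplicatedStore(synchronous=False)
nightly.write("order 1")
nightly.periodic_copy()
nightly.write("order 2")
print(nightly.fail_over())  # "order 2" never reached the standby
```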
Thinking about it, you really will lose your telnet session, as the
second machine will have no idea of just what you were doing: the
stacks/data in the first machine are not going to match those in the
second. And even though the second machine is a duplicate of the first,
unless you are mirroring the drives image for image, I think the chances
of the data lying on the same tracks/sectors are extremely remote. That's
just one possible error scenario.
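A simplified model of why the telnet/ssh session cannot just carry over: a TCP connection is identified by its addresses and ports, and its state lives only in the first machine's stack. Per RFC 793, a host with no record of a connection answers an incoming segment with a reset (the special case of an incoming reset is ignored here). The Host class and the addresses are hypothetical.

```python
from typing import NamedTuple


class FourTuple(NamedTuple):
    src_ip: str
    src_port: int
    dst_ip: str
    dst_port: int


class Host:
    """Toy TCP endpoint that only tracks which connections it knows about."""

    def __init__(self, name: str):
        self.name = name
        self.connections: set[FourTuple] = set()  # established sessions

    def receive(self, conn: FourTuple) -> str:
        # Simplified RFC 793 behaviour: a segment for a connection that does
        # not exist is answered with a reset.
        return "ACK" if conn in self.connections else "RST"


session = FourTuple("192.168.1.50", 51000, "192.168.1.10", 22)
primary = Host("primary")
primary.connections.add(session)
standby = Host("standby")  # same IP after the swap, but no session state
print(standby.receive(session))
```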
Second, even though you can swap IPs on a FreeBSD system instantly
(and I have done this on running machines, without rebooting, to move
all new connections to a new web server during an upgrade), you have
one problem that may be a temporary show stopper.
I say temporary as it depends upon how your network is configured.
On a local network the IP address is used only to look up the MAC
address; from that point on, frames are delivered MAC <-> MAC and the
IP does not enter into it. The only time that lookup happens is when
there is no IP-to-MAC translation in the machine's own ARP table, as
when it first comes up or when a cached entry has expired.
What happens next depends on several things, including how the data
is being distributed and what is handling it.
By that I mean: are you using hubs or switches, and are there any
routers in the scheme?
Routers can hold ARP data for a long time (the larger the router, the
longer the ARP cache is retained), because you don't want to generate
extra traffic renewing the cache.
On the Cisco 7513 I maintained, entries in the ARP cache were retained
for 4 hours from the time of the lookup.
But those are designed as core/edge routers and have a lot of RAM for
holding a lot of data. Mine had only ONE Ethernet card, but each card
had four Ethernet connections and you could have up to 8 cards. That's
why the retention time is so long.
So if your router has a long retention time and you do nothing else,
you'll be waiting until the cache times out. In those cases you need to
clear the cache in the routers yourself.
I have not looked at the default timeout in smaller routers, but it
should be just a few minutes, and it is probably tunable.
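A toy ARP cache, assuming entries expire a fixed time after they are learned and are not refreshed by later hits, shows why a router with a 4-hour timeout keeps sending to the old machine after the IP moves, and why clearing the cache fixes it at once. ArpCache, the `network` dictionary standing in for ARP replies, and all addresses are hypothetical.

```python
class ArpCache:
    """Toy IP -> MAC cache; timeout (seconds) is a tunable assumption."""

    def __init__(self, timeout: float):
        self.timeout = timeout
        self.entries: dict[str, tuple[str, float]] = {}  # ip -> (mac, learned_at)

    def lookup(self, ip: str, now: float, network: dict[str, str]) -> str:
        entry = self.entries.get(ip)
        if entry is not None and now - entry[1] < self.timeout:
            return entry[0]  # cache hit: no ARP traffic, possibly stale
        # Miss or expired: "broadcast" a request; network maps each IP to
        # the MAC of whichever machine currently answers for it.
        mac = network[ip]
        self.entries[ip] = (mac, now)
        return mac

    def flush(self) -> None:
        # Equivalent of clearing the router's ARP cache by hand.
        self.entries.clear()


net = {"10.0.0.5": "aa:aa:aa:aa:aa:01"}
router = ArpCache(timeout=4 * 3600)  # 4 hours, as on the 7513
router.lookup("10.0.0.5", 0, net)
net["10.0.0.5"] = "bb:bb:bb:bb:bb:02"  # IP moved to the backup machine
print(router.lookup("10.0.0.5", 60, net))  # still the old MAC
router.flush()
print(router.lookup("10.0.0.5", 61, net))  # now the backup's MAC
```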
As for individual machines, they may or may not need their tables
cleared for the same reason. If you are going to implement the IP-move
scheme, be SURE to test every machine accessing your server to see how
its ARP cache handles things.
Switches usually get things straightened out relatively quickly, at
least the Cisco and Foundry Networks switches I've been using. Based on
what I've seen when we've moved servers around, they will usually
relearn where the address is after 10-20 seconds.
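Switches forward by MAC address, learning which port each source MAC was last seen on, which is one plausible reason they recover quickly: the table is corrected as soon as the moved machine sends a frame. This is a toy model; LearningSwitch and the 300-second aging time (a common default) are assumptions, not any vendor's implementation.

```python
class LearningSwitch:
    """Toy MAC address table; aging_time (seconds) is an assumed parameter."""

    def __init__(self, aging_time: float = 300.0):
        self.aging_time = aging_time
        self.table: dict[str, tuple[int, float]] = {}  # mac -> (port, last_seen)

    def frame_in(self, src_mac: str, port: int, now: float) -> None:
        # Learning: every incoming frame records which port its source MAC is on.
        self.table[src_mac] = (port, now)

    def out_ports(self, dst_mac: str, in_port: int,
                  all_ports: list[int], now: float) -> list[int]:
        entry = self.table.get(dst_mac)
        if entry is not None and now - entry[1] < self.aging_time:
            return [entry[0]]  # known destination: send to that port only
        # Unknown or aged-out destination: flood to every other port.
        return [p for p in all_ports if p != in_port]


sw = LearningSwitch()
ports = [1, 2, 3, 4]
sw.frame_in("aa:aa:aa:aa:aa:01", 2, now=0)
sw.frame_in("aa:aa:aa:aa:aa:01", 3, now=15)  # same MAC now heard on port 3
print(sw.out_ports("aa:aa:aa:aa:aa:01", 1, ports, now=16))
```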
So for an almost 'instant' changeover you probably need to do things the
way it is done for fail-over routers, and that is to move the MAC address
too: the standby NIC takes over the failed machine's MAC address. Since
everything is already communicating by MAC, the changeover should then
be transparent, and you don't have to wait for ARP-cache entries to
expire or clear them manually yourself.
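A minimal sketch of why taking over the MAC matters, assuming the client never refreshes its ARP entry during the outage: frames are addressed to the cached MAC, so they reach the standby only if the standby answers to that MAC. The function name and addresses are hypothetical.

```python
from typing import Optional


def deliver(dst_ip: str, client_arp: dict[str, str],
            live_nics: dict[str, str]) -> Optional[str]:
    """Return the machine that receives the frame, or None if it is lost.

    client_arp: the client's cached IP -> MAC entries (not refreshed here).
    live_nics:  MAC -> machine name for every NIC that is up right now.
    """
    mac = client_arp[dst_ip]    # the client trusts its cached entry
    return live_nics.get(mac)   # frames go by MAC; a dead MAC reaches nobody


client_arp = {"10.0.0.5": "aa:aa:aa:aa:aa:01"}  # learned from the old primary
# Standby took over the IP only, keeping its own MAC: frames are lost.
print(deliver("10.0.0.5", client_arp, {"bb:bb:bb:bb:bb:02": "standby"}))
# Standby took over the IP and the primary's MAC: frames arrive.
print(deliver("10.0.0.5", client_arp, {"aa:aa:aa:aa:aa:01": "standby"}))
```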
If you don't do that you can get all sorts of 'internesting' things
happening. A client of mine was told by their SW vendor that they
could just swap the connection from their Linksys router over to
their SonicWall when the local owner needed access on the weekends.
On Mondays they could not send their prescription information to
the medical company they were connecting to.
The reason was that once the OSR5 machine learned the IP <-> MAC
translation, the entry stayed (and I don't know the default timeout),
so they were rebooting the machine each time they made the change.
That was what their SW vendor told them to do, and they were getting
frustrated. So I was called in after 3 or 4 weeks of this to look at
things. Their vendor had diddled the startup scripts to try to fix
this, so I undiddled them and told them that whenever they switched
over, they should just log in as root and type "tcp stop" followed by
"tcp start".
SW vendors who are a gross short of a dozen clues seem to be the
bane of our existence.
> Actually this goes back to the threads of this past weekend about
> offsite backups.
> Does anyone have an off-premises "hotsite" running filePro applications
> that can be brought up in seconds in the event of a fire/theft/
> tornado/earthquake/natural and un-natural event?
That's a good question. I've normally only seen that associated
with very large corporations, and it's usually mainframe-style
operations. For smaller systems it would almost seem best
to have the operations in a highly reliable, secure serving location
and access everything remotely.
If you find information on this, let us know.
Bill
--
Bill Vermillion - bv @ wjv . com