variable lookups possible?
Bill Vermillion
fp at wjv.com
Thu Jul 22 07:20:33 PDT 2004
Fairlight, the prominent pundit, on Wed, Jul 21 22:47 while half
mumbling half-witicized:
> You'll never BELIEVE what Bill Vermillion said here...:
> > Did you miss the part that rsync is much faster when the far
> > file exists, and then only transmits the diffs of the files and
> > then edits the far file using those diffs. It also uses file
> > compression to make transfer time shorter.
> I've known that for a long time. That doesn't mean you couldn't
> hit a condition where it's not in partial progress on a file,
> or it's done an index but not the key and you seek to the wrong
> place, etc.
I'd think moving indexes is counter-productive, as opposed to a
rebuild. And the paragraph makes me think you are going to be
using the file that you are copying to.
> > That surely sounds like it is far more than a 'fancy replacement for
> > rcp'.
> It has bells and whistles, yes. I wasn't belittling rsync at
> all. I was -quoting- (or at least paraphrasing) from the man
> page itself, as released by the developers. :) It uses the same
> basic mechanism--altering the file directly with a rewrite. Not
> good if you're in the middle of something with another program
> on that file.
If by '... something with another program on that file' you
mean the file to which you are writing, then I'd agree with that.
But I think rsyncing to a remote database that is in use would be
a recipe for disaster - not just because of the file itself being
used but other files upon which it depends - such as lookups to
other files in the FP world. You could easily have two files that
didn't match each other.
> > Current version of rsync is 2.6.2_1
> Yeah, well, I reported the latest I had access to as installed,
> which was from SuSE 9.0. And you know how linux vendors
> are with their rpm backports. The actual rpm versioning is
> rsync-2.5.6-176, so it's had 176 revisions of that package,
> ostensibly. It's probably roughly equivalent.
The BSD world has often been called 'source-centric'. About the
only thing I've installed in binary form is cvsup - as I avoid
installing languages that are only used to build one
program.
Since the base name of rsync has changed to 2.6.2, I'd really be
suspicious. How are you supposed to know if you have the current
version if they use old version names and just patch like mad?
2.6.2 was released on April 30th. The first 2.6.0 was released
on Jan 1, 2004.
There are warnings about rsync servers on Linux running 2.5.6, and at
that time they recommended "1. Update to (at least) rsync version 2.5.7
immediately". That is an exact quote from the site
at http://rsync.samba.org. This was in a December 4, 2003 security
advisory.
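To make the version question concrete, here is a rough Python sketch that
pulls the base version out of a package name and compares it to the 2.5.7
minimum from the advisory. The function names are made up for illustration.
Note that it can only see the base version: with vendor backports, an rpm
named 2.5.6 may or may not already carry the fix, so a "true" result means
"go check the vendor's changelog", not "you are vulnerable".

```python
import re

# Minimum base version recommended by the advisory quoted above.
MINIMUM_SAFE = (2, 5, 7)


def base_version(package_name):
    """Extract the first x.y.z triple from a name such as 'rsync-2.5.6-176'."""
    match = re.search(r"(\d+)\.(\d+)\.(\d+)", package_name)
    if match is None:
        raise ValueError(f"no x.y.z version found in {package_name!r}")
    return tuple(int(part) for part in match.groups())


def needs_closer_look(package_name):
    """True when the base version is older than the advisory's minimum.

    This says nothing about backported patches; it only compares names.
    """
    return base_version(package_name) < MINIMUM_SAFE


if __name__ == "__main__":
    for name in ("rsync-2.5.6-176", "rsync-2.6.2_1"):
        print(name, base_version(name), needs_closer_look(name))
```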
> > > The problem with this is that it has no way to let fP know that
> > > it may be working on the file (ie., it may be only partially
> > > copied/diff'd at any given point during its execution).
> > Valid point.
> Thanks. :) I was responding inline (something I should probably stop
> doing--it keeps biting me).
I get caught doing the same.
> If there was a way to protect against that, I
> would have expected you'd know it. I know you're a big rsync
> user/advocate. That you just said it was a valid point means I didn't
> overlook the obvious, at least.
> > Snapshots don't take too long. Then you can back up the snapshot
> > while the system continues. Once you make a snapshot, you can
> > copy it, make a tape backup, dump it, etc. Snapshots also mean
> > that you can be up and running after an inadvertent reboot without
> > waiting a long time - probably over an hour or so on a
> > multi-terabyte file-system. Charts on his site show it
> > takes 3.5 seconds to snapshot a 7.7GB file system on an idle
> > system and 12.1 seconds to snapshot on an active file system.
> Is this much different than the snapshots I'm used to with a
> .snapshot directory on Network Appliance RAID servers? Every
> directory has one, and I have 10 hourly, 9 daily, and one
> master tape backup directory under that, and can restore a file
> from any of them (which are all made at different times during
> the day, obviously) that I choose.
I don't know how those are implemented. A few years ago when
someone from Columbia Data Products contacted me, they had
one of the few applications extant that was making snapshots,
called SnapBack. They were also making a lot of SCSI drivers for
controller companies at that time. This was their first move
into the backup arena.
It made snapshots anytime the system was quiescent - even if only
for a few seconds - but it's a totally MS targeted program.
A lookup of SnapBack now shows that the company with that name resides in England.
The CDP site shows they have licensed their product to
Maxtor, and that their OEMs include Dell, HP, IBM, Veritas, ...
Alan described how he was doing that to me a long time ago - when
he wanted me to go to work for them [but I am so spoiled with my
current employer - Myself - that I could never leave me :-) ].
And I remember when Alan got started selling used Radio Shack
Model I and II computers under the name of "Godfather Computers".
> That's saved me more than once--not even just from accidental
> file removals, but even when I -thought- something would be
> easier to code than it was, I did it wrong, and need to revert
> and hadn't made a manual copy in advance, thinking it would
> be a cake walk. Sometimes reversion and starting over from a
> known-good point is easier than trying to hammer on something
> for hours when it's broken past a certain point.
I've been there. But I only had tape backups I'd made before I
started so the restore was a bit slower.
> > Night-time rsync usually isn't a problem, but as you point out it
> > can be during the day. The snapshot can be run often. That means
> > if someone accidentally deletes a file they can retrieve it from
> > the earlier snapshot. Since a snapshot copies only blocks
> > that are actively used they don't take up that much space.
> But you're still altering a file that something else is
> actually accessing already (potentially), are you not?
You are not altering the source file, and in a database application
you should not be using the copy - at least from my POV.
I suggested using rsync to a company that was using a lot of
Linux servers driving remote display units. If you updated
an html page that was in use, a refresh would bring up the modified
page the next time. But the concept was a bit too exotic for them
so they ftp'd complete site images to a couple dozen machines every
night - not understanding that rsync would send only the changes
and they could update in real time.
Some people just won't learn :-(
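For anyone who wants to see why sending only the changes is so much cheaper
than ftp'ing whole site images, here is a small Python sketch of the idea.
It is NOT rsync's actual algorithm (rsync uses a rolling checksum so it can
match blocks at any offset); this toy version only compares fixed-position
blocks, and the names and the tiny block size are my own assumptions for
illustration.

```python
import hashlib

BLOCK_SIZE = 4  # tiny on purpose so the example is easy to follow


def block_digests(data, block_size=BLOCK_SIZE):
    """Digest of each fixed-size block; this is what the far side would send."""
    return [hashlib.sha256(data[i:i + block_size]).hexdigest()
            for i in range(0, len(data), block_size)]


def make_delta(new_data, far_digests, block_size=BLOCK_SIZE):
    """Return (new_length, {block_index: bytes}) holding only differing blocks."""
    delta = {}
    for index, start in enumerate(range(0, len(new_data), block_size)):
        block = new_data[start:start + block_size]
        digest = hashlib.sha256(block).hexdigest()
        if index >= len(far_digests) or far_digests[index] != digest:
            delta[index] = block
    return len(new_data), delta


def apply_delta(far_data, new_length, delta, block_size=BLOCK_SIZE):
    """Patch the far copy using only the blocks that were transmitted."""
    patched = bytearray(far_data[:new_length].ljust(new_length, b"\0"))
    for index, block in delta.items():
        start = index * block_size
        patched[start:start + len(block)] = block
    return bytes(patched)


if __name__ == "__main__":
    far = b"aaaabbbbcccc"
    new = b"aaaaXbbbcccc"
    length, delta = make_delta(new, block_digests(far))
    print(delta)  # only the one changed block is in here
    print(apply_delta(far, length, delta) == new)
```

Out of three blocks only one crosses the wire. Note that apply_delta builds
a fresh copy here; rsync as discussed above rewrites the far file, which is
exactly why something reading that file mid-update can get caught out.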
> > Really interesting concept. And a good read.
> I'll have a look in my copious free time. :)
I know that feeling - but it does give a good overview of how
it is done. It basically maps the inodes and blocks in use.
Before a block is changed, the old block is copied to the
snapshot. Since most blocks on an HD are not
changed during normal operation, copying only the blocks that change
into the snapshot means that it will not take up a great deal of
space.
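A toy model of that copy-before-change behaviour, in Python. This is only
a sketch of the general idea as I described it, not how the real
file-system code is written; the Volume class and its methods are invented
for illustration, and the "disk" is just a list of blocks.

```python
class Volume:
    """A pretend disk: a list of blocks plus any snapshots taken of it."""

    def __init__(self, blocks):
        self.blocks = list(blocks)
        self.snapshots = []

    def snapshot(self):
        """Taking a snapshot copies nothing; it starts an empty save area."""
        saved = {}  # block index -> contents as of snapshot time
        self.snapshots.append(saved)
        return saved

    def write(self, index, data):
        """Save the old block into each snapshot that lacks it, then write."""
        for saved in self.snapshots:
            if index not in saved:
                saved[index] = self.blocks[index]
        self.blocks[index] = data

    def read_snapshot(self, saved, index):
        """Unchanged blocks are read straight from the live volume."""
        return saved.get(index, self.blocks[index])


if __name__ == "__main__":
    vol = Volume([b"key0", b"key1", b"key2", b"key3"])
    snap = vol.snapshot()
    vol.write(1, b"KEY1")
    vol.write(1, b"new1")  # second write to same block saves nothing more
    print(len(snap))                   # blocks held by the snapshot
    print(vol.read_snapshot(snap, 1))  # contents as of snapshot time
```

Four blocks on the volume, one block written twice, and the snapshot holds
exactly one saved block. That is why taking the snapshot is fast and why it
stays small.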
You can see that even in a large key file the only things that
would be copied over would be just the blocks updated. Since this
is McKusick's brainchild, and given his reputation, it is a worthwhile
read.
Bill
--
Bill Vermillion - bv @ wjv . com