OT - backup supplements

Bill Vermillion fp at wjv.com
Sun Feb 27 07:43:41 PST 2005


On Sun, Feb 27 00:51, Men gasped, women fainted, and small children
were reduced to tears as ryanx at indy.rr.com confessed to all:

> On Sat, Feb 26, 2005 at 09:14:08PM -0500, Bill Vermillion said:

> > > Hardware failure may not be the primary reason that backups
> > > come in handy but it does happen. If you're going to keep
> > > extra copies on disk instead of DVD, CD, or tape it would
> > > be better to keep them on a separate drive.

> > The latest FreeBSD has snapshots and those could work well.
> > I've not gone that route yet, so maybe Walter can tell us
> > if he's running FP on the 5.x series.

> > With snapshots you can even automate them, and then take the
> > snapshot and dump it, back it up, move it to another machine.

> Does that require journaling? Not that you would catch me without it,
> just wondering. How about a snapshot and rsync in crontab? Wouldn't
> that be ideal?
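
Something along those lines can certainly be cron'd.  Here is a very
rough sketch - untested, with made-up paths and hostname - that leans
on dump's -L flag [which takes a UFS snapshot of the live filesystem
first, so the dump is consistent] and then rsyncs the result to
another box.  I've written it as a small Python script only for
readability; a plain shell one-liner in root's crontab would do just
as well:

#!/usr/local/bin/python
# Rough sketch: snapshot-backed dump of a live filesystem, then copy
# the image to another machine.  The paths, hostname, and filesystem
# here are assumptions - adjust to taste.
import subprocess
import sys
import time

FILESYSTEM = "/home"                                  # what to back up
DUMP_FILE  = "/backup/home-%s.dump" % time.strftime("%Y%m%d")
REMOTE     = "backuphost:/backups/"                   # made-up destination

def run(cmd):
    # Run one command; bail out loudly if it fails.
    rc = subprocess.call(cmd)
    if rc != 0:
        sys.exit("failed (%d): %s" % (rc, " ".join(cmd)))

# Level 0 dump via a snapshot (-L), auto-sized (-a), recording the
# run in /etc/dumpdates (-u), writing the image to DUMP_FILE (-f).
run(["dump", "-0", "-L", "-a", "-u", "-f", DUMP_FILE, FILESYSTEM])

# Push the dump image to the other machine.
run(["rsync", "-a", DUMP_FILE, REMOTE])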

Journaling and soft-updates are two different approaches.

Journaling aims to keep the data intact, but you can still end up
with a corrupt filesystem - while soft-updates' first aim is to
preserve the integrity of the filesystem itself, ahead of the data.

You should read the three papers at Kirk McKusick's site -
www.mckusick.com.  All are linked on the first page under the
'information about soft updates' heading.

The first is:
Soft Updates: A Technique for Eliminating Most Synchronous Writes
in the Fast Filesystem.

The second is:
Journaling Versus Soft Updates: Asynchronous Meta-Data Protection
In File Systems

The third is: 
Running "Fsck" In The Background

The first was written by McKusick and Ganger [from CMU].

The second adds three more authors.  [I used to have a link to a
40-50 page document on the subject that was exceptionally well
written - and ISTR it resided at either CMU or perhaps Columbia.]

Dr. Margo Seltzer is one of the participants in that.  She's head
of the Systems Research Group at Harvard and I believe is also the
author of LFS - the log-structured file system - which is pretty
neat, but with today's huge drives it probably [my guess] won't
work too well.  It appears that DTFS [which SCO had, and used in
compressed mode by default for storage capacity] is a form of LFS.
That is an interesting filesystem approach as you NEVER get any
file fragmentation.  The filesystem looks like one huge circular
file [not to be confused with a trashcan].

The last paper shows how soft-updates and snapshots work together
to bring a system up after a crash [which is getting rare these
days] without having to wait for fsck to finish.

If you've ever waited for fsck to complete on a fairly large hard
drive, just imagine how long it could/would take on something like
a 10 terabyte filesystem.
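
For what it's worth, the knob that controls this on 5.x is
background_fsck in rc.conf, and I believe it is on by default.  A
trivial check - again just a sketch, nothing filePro-specific -
could look like this:

# Report whether background fsck is enabled.  On FreeBSD the default
# comes from /etc/defaults/rc.conf and any override from /etc/rc.conf.
import re

def read_setting(path, name, current=None):
    # Return the last value assigned to `name` in an rc.conf-style file.
    try:
        for line in open(path):
            m = re.match(r'\s*%s\s*=\s*"?(\w+)"?' % name, line)
            if m:
                current = m.group(1)
    except IOError:
        pass    # the file may not exist; keep whatever we had
    return current

value = read_setting("/etc/defaults/rc.conf", "background_fsck")
value = read_setting("/etc/rc.conf", "background_fsck", value)
print("background_fsck = %s" % value)   # "YES" means fsck runs in the background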

Often when members of the Linux camp move to FreeBSD you'll see
comments/questions about why FreeBSD does not have journaling,
and they will be pointed to these papers.

While many assume journaling is the only way to go, these papers
show why journaling can also have problems.

McKusick is one of the principal architects of the FFS - which
first appeared in UCB's 4.2BSD and is at the heart of most Unix
OSes extant.  He also wrote UFS2 - the default filesystem in
FreeBSD 5.3 [5.4 is due next week as I recall].  UFS2 was in 5.0
but was not the default.  McKusick is a real expert when it comes
to FS design.

The UFS2 filesystem pretty much takes care of all the problems
people have been seeing in relation to file sizes, longevity, etc.

The inode structure has been changed - which may cause problems
for programs that get quite intimate with it - as it has grown from
128 bytes to 256 bytes.  A plus is that there is now an actual file
creation time.  Many assume the ctime - the inode change time - is
the creation time, and it was even documented as such in Unix by
SCO and other vendors.
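
You can see the new field from userland on a UFS2 filesystem.  As a
quick illustration [nothing filePro-specific, and only on an OS and
Python build that expose it], os.stat reports it as st_birthtime:

# Illustration: the classic timestamps versus UFS2's creation time.
import os
import sys
import time

path = sys.argv[1] if len(sys.argv) > 1 else "."
st = os.stat(path)

print("atime (access):       %s" % time.ctime(st.st_atime))
print("mtime (modification): %s" % time.ctime(st.st_mtime))
print("ctime (inode change): %s" % time.ctime(st.st_ctime))   # NOT creation

birth = getattr(st, "st_birthtime", None)   # UFS2 file creation time
if birth is not None:
    print("birthtime (creation): %s" % time.ctime(birth))
else:
    print("birthtime: not available on this OS/Python combination")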

Indirect blocks will handle almost anything, so you don't get the
slowdown from double indirects that you could otherwise see with
large files.

With the larger default block allocation [user changeable], the
direct block pointers in the inode itself reach much further before
any indirect blocks are needed, so good-sized files can be read
without extra lookups.  In the original S51K filesystem, once you
were over about 10K you started using indirect blocks.  One of the
worst-performing filesystems is the original Xenix filesystem - but
it grew up in the days when 10MB was a large drive.
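
To put rough numbers on that - my arithmetic, assuming the usual 12
direct block pointers, a 16K block size, and UFS2's 8-byte block
pointers:

# Back-of-the-envelope reach of direct vs. indirect blocks.
BLOCK     = 16 * 1024           # filesystem block size in bytes (assumed)
PTR_SIZE  = 8                   # UFS2 block pointers are 64-bit
DIRECT    = 12                  # direct block pointers in the inode
PER_INDIR = BLOCK // PTR_SIZE   # pointers that fit in one indirect block

def human(n):
    for unit in ("bytes", "KB", "MB", "GB", "TB", "PB"):
        if n < 1024 or unit == "PB":
            return "%.0f %s" % (n, unit)
        n /= 1024.0

reach = DIRECT * BLOCK
print("direct blocks only: %s" % human(reach))
reach += PER_INDIR * BLOCK
print("+ single indirect:  %s" % human(reach))
reach += PER_INDIR ** 2 * BLOCK
print("+ double indirect:  %s" % human(reach))
reach += PER_INDIR ** 3 * BLOCK
print("+ triple indirect:  %s" % human(reach))

With those assumptions the direct pointers alone cover 192K, one
level of indirection already reaches about 32MB, and a double
indirect takes you to tens of gigabytes - so the double- and
triple-indirect penalty only shows up on truly huge files.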

I've also seen some pointers [though I have not read them yet]
about the time changes - so the Y2038 problem no longer exists.  A
date range of hundreds of billions of years either side of the
epoch will be more than sufficient - even for those who may be
suspended in cryogenic tanks in Arizona :-)
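
The arithmetic, assuming the new timestamps really are a full
signed 64-bit count of seconds [my assumption - I've not checked
the on-disk format]:

# Range of a signed 64-bit seconds counter, centered on the epoch.
SECONDS_PER_YEAR = 365.25 * 24 * 60 * 60
max_seconds = 2 ** 63 - 1                 # largest signed 64-bit value
years = max_seconds / SECONDS_PER_YEAR
print("about %.0f billion years either side of the epoch" % (years / 1e9))
# -> roughly 292 billion years, versus the 68 years of a 32-bit time_t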

So your comment that said:
"Does that require journaling? Not that you would catch me without it,"
suggests to me you should read the papers cited above.

Bill
-- 
Bill Vermillion - bv @ wjv . com

