OT: multi-platform *nix blocksize detection

Fairlight fairlite at fairlite.com
Wed Oct 20 14:49:51 PDT 2004


Only Bill Vermillion would say something like:
> 
> But you are such a good strait man I couldn't resist.

If I'm such a strait man, why do I feel like I've lost my Bering?

> > True. Which is -why- I wanted a surefire way to find the
> > default block size--preferably without relying on 'dd' and 'ls'
> > to do so. It's tempting to look inside the source to 'ls' and
> > find out exactly how it comes up with its number of blocks. It
> > seems to be reliable on any system.
> 
> You can NOT rely on 'ls' because it will show you the length of
> the file based on the information in the inode by looking at the
> starting block, the ending block, and the ending byte in the last
> block. The actual disk space could be far far less than that.

ls -ls 

-s    Gives size in blocks, including indirect  blocks,  for
      each entry.

And obviously it's working correctly on each system.  I've never seen one
that doesn't.
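For what it's worth, the 'ls -s' figure comes straight from the st_blocks
field that stat() returns, which is reported in 512-byte units on Linux and
most other *nixen.  A minimal sketch of that relationship (the helper names
here are mine, not from ls or any other tool discussed):

```python
# Sketch: where 'ls -s' gets its number.  st_blocks counts allocated
# 512-byte units (including indirect blocks), independent of st_size.
import os

def blocks_used(path):
    """Number of 512-byte blocks the file actually occupies on disk."""
    return os.stat(path).st_blocks

def disk_bytes(path):
    """Allocated bytes, as opposed to the logical st_size length."""
    return os.stat(path).st_blocks * 512
```

Comparing disk_bytes() against st_size is exactly the gap between allocated
space and the inode's logical length being argued about here.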

> > As for your second question...yes, the operation should be
> > filesystem-dependent, although you have to keep in mind that
> > the VFS layer is in the kernel, and that's probably where
> > linux's bits/stat.h comment about 512-byte blocks for st_blocks
> > comes into play.
> 
> I disagree.  I would think it should be filesystem-INdependent.
> That way you should be able to use it among all OSes extant.

I meant to say it -should- be INdependent, and therefore the "although"
would have actually come into play.  That's what I get for typing, being on
the phone, and coding in other windows all at once.

> Yup.  Even SCO stores very short files in the inode.  It makes good
> sense as it saves an extra disk seek and therefore speeds up
> performance.  This is perfect for symlinks. 

Okay, that makes sense.  Although I've seen symlinks take 2 blocks before.
I saw one today, in fact.  I suspect that in reality, it's showing the
block count of the resolved link target.  But that's one area where
'ls' is hazy.

> And what about a system that has different block sizes and perhaps
> different filesystems on the same drive which it can access.

In -theory-, you should have been able to use st_blksize specifically for
this issue.  I don't know if it's just linux that does it incorrectly or if
other *nixen do it wrong as well, but no matter which system I'm on, or
which filesystem or kernel, it's set to 4096, which is -not- the block
size.  Of that I'm sure on the systems I've tested.  (This time I really am.)
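A small sketch of the distinction on a POSIX system: st_blksize is only the
kernel's *preferred I/O size* hint (hence the constant 4096), while
statvfs() reports the filesystem's own block and fragment sizes.  The
function names below are mine:

```python
# st_blksize vs. the filesystem's actual block size.
import os

def io_hint(path):
    """Kernel's preferred I/O size -- often 4096 regardless of fs."""
    return os.stat(path).st_blksize

def fs_block_size(path):
    """Fundamental block size reported by the filesystem itself."""
    sv = os.statvfs(path)
    return sv.f_frsize or sv.f_bsize
```

On a filesystem with 1K blocks, fs_block_size() should say 1024 while
io_hint() may well still say 4096--which matches the behavior I'm seeing.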

> And I'd not depend on 'ls' as if you read the man pages for
> different versions you will see that even the options to 'ls'
> differ.

I've never known -s not to work.  My 'dir' alias is 'ls -Faslg' (I omit the
-g on some SysV, as it omits the owner rather than showing it.)  But I've
never seen a deviation in -s functionality.

> So it is pre-allocating the file - probably doing an lseek to
> the expected EOF.

Seems like, yes.
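The pre-allocation trick can be sketched like this: seek past EOF and write
a single byte, which sets the logical length without allocating the
intervening blocks (a sparse file).  The helper name is hypothetical:

```python
# Pre-allocate by seeking to the expected EOF.  The gap between offset 0
# and size-1 is a hole: it reads back as zeros but occupies no blocks.
import os

def preallocate_sparse(path, size):
    with open(path, "wb") as f:
        f.seek(size - 1)   # lseek beyond the current (empty) EOF
        f.write(b"\0")     # one byte forces the length, not the blocks
```

After this, 'ls -l' shows the full size while 'ls -s' shows only a block
or two allocated.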

> > Byte sizes are useless. You have to use blocks for this
> > particular task, AFAICT.
> 
> That wasn't clear from your first post.  Sorry.

NP.  My lack of clarity, apparently.

> Then the only thing that is going to let you know how much space is
> being used AFAIK is 'dd' if that's what you are hunting for?

dd shouldn't work.  man lseek:

       The lseek function allows the file offset to be set beyond
       the end of the existing end-of-file of the file.  If  data
       is  later  written  at this point, subsequent reads of the
       data in the gap return bytes of zeros (until data is actu-
       ally written into the gap).

So in essence, if you dd from the file, you end up with either the full
logical size of the file (if a segment was written at the end), or a
truncated length that's still inflated beyond what's actually stored, as
there will be 10MB+ gaps in the file in some instances--sometimes several
at once if a mesh came in poorly.  Either way the byte count reflects the
file's length, not its disk usage.  Wholly useless.
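To make that concrete, here's a sketch of a dd-style sequential read: the
holes come back as zeros, so the byte count you get equals the logical
length, never the allocated space.  The helper name is mine:

```python
# Reading a sparse file the way 'dd' would: holes are materialized as
# zero bytes, so the total read always equals st_size.
def bytes_readable(path, chunk=65536):
    total = 0
    with open(path, "rb") as f:
        while True:
            buf = f.read(chunk)
            if not buf:
                return total
            total += len(buf)
```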

> The problem I can see is that if the program extends the file to the
> full length and there is not enough space on the drive when all
> pieces/parts are retrieved, you will have a corrupt file.

That's why you watch your disk space, or just keep large drives.  :)

> So you could see how big the file is going to be via an 'ls' but
> you'd have to use something like 'dd' to look at the amount of 
> real space available.   But that is going to be slow.

But 'dd' won't (or shouldn't) work accurately, for the reasons I cited
above.

> I take it this is going to be for something like bittorrent?

It's for gtk-gnutella, actually.  Same concept.  Although I note that I
have GetRight for Win95, and it's just an HTTP/FTP download program that
uses multi-segmented downloading for acceleration--the same technique.
(It's nice when you get sites that limit you to 40K/sec per connection and
you can just toss four or six segments at it, thus getting it at full
blast.)

> Is it checking for true free space?  If so you might snag their
> code and check it.  It uses the MIT License so you can do almost
> anything you want with it.

Well, gtkg is GPL'd, I believe.  I could look through it...I have the last
three trees sitting here.  I don't believe it does check for the real free
space when creating and extending the file--and it would be useless if it
did.  It would only be useful if it performed a check for free space before
starting each segment, since not only may other files have been altered and
grown in the meantime, but other applications could have sucked up a bunch
of space--and it can't track all of that based on a single check at file
inception.
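A per-segment check along those lines might look like this: statvfs()
reports the space free at the moment of the call, which is the only number
that matters.  These names are hypothetical, not from the gtk-gnutella
source:

```python
# Check free space immediately before fetching each segment, since other
# files and other applications may have eaten space in the meantime.
import os

def free_bytes(path):
    sv = os.statvfs(path)
    return sv.f_bavail * sv.f_frsize   # space available to non-root users

def segment_fits(download_dir, segment_size):
    return free_bytes(download_dir) >= segment_size
```

You'd call segment_fits() right before each segment starts, rather than
once at file inception.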

mark->
-- 
Bring the web-enabling power of OneGate to -your- filePro applications today!

Try the live filePro-based, OneGate-enabled demo at the following URL:
               http://www2.onnik.com/~fairlite/flfssindex.html

