OT: multi-platform *nix blocksize detection
Fairlight
fairlite at fairlite.com
Wed Oct 20 14:49:51 PDT 2004
Only Bill Vermillion would say something like:
>
> But you are such a good strait man I couldn't resist.
If I'm such a strait man, why do I feel like I've lost my Bering?
> > True. Which is -why- I wanted a surefire way to find the
> > default block size--preferably without relying on 'dd' and 'ls'
> > to do so. It's tempting to look inside the source to 'ls' and
> > find out exactly how it comes up with its number of blocks. It
> > seems to be reliable on any system.
>
> You can NOT rely on 'ls' because it will show you the length of
> the file based on the information in the inode by looking at the
> starting block, the ending block, and the ending byte in the last
> block. The actual disk space could be far far less than that.
ls -ls
-s Gives size in blocks, including indirect blocks, for
each entry.
And obviously it's working correctly on each system. I've never seen one
that doesn't.
> > As for your second question...yes, the operation should be
> > filesystem-dependent, although you have to keep in mind that
> > the VFS layer is in the kernel, and that's probably where
> > linux's bits/stat.h comment about 512-byte blocks for st_blocks
> > comes into play.
>
> I disagree. I would think it should be filesystem-INdependent.
> That way you should be able to use it among all OSes extant.
I meant to say it -should- be INdependent, and therefore the "although"
would have actually come into play. That's what I get for typing, being on
the phone, and coding in other windows, all at once.
> Yup. Even SCO stores very short files in the inode. It makes good
> sense as it saves an extra disk seek and therefore speeds up
> performance. This is perfect for symlinks.
Okay, that makes sense. Although I've seen symlinks take 2 blocks before.
I saw one today, in fact. I suspect that in reality, it's showing the
block count of the resolved link target. But that's one area where
'ls' is hazy.
> And what about a system that has different block sizes, and perhaps
> different filesystems on the same drive, which it can access?
In -theory-, you should have been able to use st_blksize specifically for
this issue. I don't know if it's just linux that does it incorrectly or if
other *nixen do it wrong as well, but no matter which system I'm on, or
which filesystem or kernel, it's set to 4096, which is -not- the block
size. Of that I'm sure on the systems I've tested. (This time I really am.)
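To illustrate what I mean, here's a quick sketch (untested as I paste it,
and the 512-byte unit for st_blocks is an assumption that just happens to
hold everywhere I've looked):

#include <stdio.h>
#include <sys/stat.h>

/* Print the apparent size, the "preferred I/O size" (st_blksize), and
 * the allocated block count for a file.  On every system I've tried,
 * st_blksize comes back 4096 regardless of the real block size, while
 * st_blocks appears to be in 512-byte units. */
int main(int argc, char *argv[])
{
    struct stat st;

    if (argc != 2) {
        fprintf(stderr, "usage: %s file\n", argv[0]);
        return 1;
    }
    if (stat(argv[1], &st) == -1) {
        perror("stat");
        return 1;
    }
    printf("st_size:    %ld bytes\n", (long)st.st_size);
    printf("st_blksize: %ld (preferred I/O size)\n", (long)st.st_blksize);
    printf("st_blocks:  %ld (x 512 = %ld bytes allocated)\n",
           (long)st.st_blocks, (long)st.st_blocks * 512L);
    return 0;
}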
> And I'd not depend on 'ls' as if you read the man pages for
> different versions you will see that even the options to 'ls'
> differ.
I've never known -s not to work. My 'dir' alias is 'ls -Faslg' (I omit the
-g on some SysV, as it omits the owner rather than showing it.) But I've
never seen a deviation in -s functionality.
> So it is pre-allocating the file - probably doing an lseek to
> the expected EOF.
Seems like, yes.
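Something like this (a sketch; the filename and 10MB figure are made up)
is presumably all it takes to pre-allocate the apparent length without
eating any actual disk:

#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

#define TARGET (10L * 1024 * 1024)   /* hypothetical 10MB final size */

int main(void)
{
    int fd = open("prealloc.dat", O_WRONLY | O_CREAT | O_TRUNC, 0644);

    if (fd == -1) {
        perror("open");
        return 1;
    }
    /* Seek one byte short of the target and write a single byte.
     * The file's length becomes TARGET, but the gap is a hole -- no
     * data blocks get allocated until segments actually land there. */
    if (lseek(fd, TARGET - 1, SEEK_SET) == -1 || write(fd, "", 1) != 1) {
        perror("lseek/write");
        close(fd);
        return 1;
    }
    close(fd);
    return 0;
}

Run that and 'ls -l' shows the full 10MB, while 'ls -s' shows only a
handful of blocks.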
> > Byte sizes are useless. You have to use blocks for this
> > particular task, AFAICT.
>
> That wasn't clear from your first post. Sorry.
NP. My lack of clarity, apparently.
> Then the only thing that is going to let you know how much space is
> being used AFAIK is 'dd', if that's what you are hunting for?
dd shouldn't work. man lseek:
The lseek function allows the file offset to be set beyond
the end of the existing end-of-file of the file. If data
is later written at this point, subsequent reads of the
data in the gap return bytes of zeros (until data is
actually written into the gap).
So in essence, if you dd from the file, you should end up with either the
full apparent size of the file (if a segment was written at the end), or a
shorter length that is still inflated beyond what's actually stored in the
file, as there will be 10MB+ gaps in the file in some instances--sometimes
several at once if a mesh came in poorly. Wholly useless.
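Which is why comparing st_blocks against st_size looks like the sane test
(again a sketch, with the 512-byte st_blocks unit assumed):

#include <stdio.h>
#include <sys/stat.h>

/* Returns 1 if the file appears sparse (holes present), 0 if not,
 * -1 on a stat error. */
int looks_sparse(const char *path)
{
    struct stat st;

    if (stat(path, &st) == -1)
        return -1;
    /* Fewer bytes allocated than the apparent length => holes. */
    return (long)st.st_blocks * 512L < (long)st.st_size;
}

int main(int argc, char *argv[])
{
    int r = (argc == 2) ? looks_sparse(argv[1]) : -1;

    if (r == -1) {
        fprintf(stderr, "usage: sparse file\n");
        return 1;
    }
    printf("%s: %s\n", argv[1], r ? "sparse" : "fully allocated");
    return 0;
}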
> The problem I can see is that if the program extends the file to the
> full length and there is not enough space on the drive when all the
> pieces/parts are retrieved, you will have a corrupt file.
That's why you watch your disk space, or just keep large drives. :)
> So you could see how big the file is going to be via an 'ls' but
> you'd have to use something like 'dd' to look as the amount of
> real space available. But that is going to be slow.
But 'dd' won't (or shouldn't) work accurately, for the reasons I cited
above.
> I take it this is going to be for something like bittorrent?
It's for gtk-gnutella, actually. Same concept. Although I note that I
have GetRight for win95, and it's just an HTTP/FTP download program that
uses multi-segmented downloading for acceleration--the same technique.
(It's nice when you get sites that limit you to 40K/sec per connection
and you can just toss four or six segments at it, thus getting the file
at full blast.)
> Is it checking for true free space? If so you might snag their
> code and check it. It uses the MIT License so you can do almost
> anything you want with it.
Well, gtkg is GPL'd, I believe. I could look through it...I have the last
three trees sitting here. I don't believe it does check for the full real
space when creating and extending the file--and it would be useless if it
did. It would only be useful if it performed a check for free space before
starting each segment, as not only may other files have been altered and
grown in the meantime, but other applications could have sucked up a bunch
of space--and it can't track any of that based on a single check at
file inception.
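If I were patching it, that per-segment check would presumably just be a
statvfs() call (a sketch; the function name and 10MB figure are mine):

#include <stdio.h>
#include <sys/statvfs.h>

/* Return 1 if the filesystem holding `path` has at least `need` bytes
 * free for an unprivileged writer, 0 if not, -1 on error. */
int room_for_segment(const char *path, unsigned long long need)
{
    struct statvfs vfs;

    if (statvfs(path, &vfs) == -1)
        return -1;
    /* f_bavail = blocks available to non-root; f_frsize = the
     * fragment size those block counts are expressed in. */
    return (unsigned long long)vfs.f_bavail * vfs.f_frsize >= need;
}

int main(void)
{
    int ok = room_for_segment(".", 10ULL * 1024 * 1024);

    printf("room for a 10MB segment: %s\n",
           ok == 1 ? "yes" : ok == 0 ? "no" : "statvfs error");
    return 0;
}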
mark->
--
Bring the web-enabling power of OneGate to -your- filePro applications today!
Try the live filePro-based, OneGate-enabled demo at the following URL:
http://www2.onnik.com/~fairlite/flfssindex.html