OT: multi-platform *nix blocksize detection

Bill Vermillion fp at wjv.com
Wed Oct 20 13:49:04 PDT 2004


On Wed, Oct 20 15:23 , Fairlight gie sprachen "Vyizdur zomen 
emororz izaziz zander izorziz", and continued with: 

> On Wed, Oct 20, 2004 at 02:27:42PM -0400, Bill Vermillion may or may not have
> proven themselves an utter git by pronouncing:

> > > Something is goofy on linux systems.

> > I've thought that since the first Linux system I saw :-)

> You know....*FWAP!*  :)

But you are such a good straight man I couldn't resist.

> > I'm looking at a SuSE system now.  On it the /etc/profile
> > is 6845 bytes long.    ls -ls gives me 6845 bytes but ls -lk
> > gives me 7 - it has allocated 7K.

> You're surely not using ext2. Check your fs type. You
> -shouldn't- be. SuSE likes to default to ReiserFS, and if you
> don't use that, you'd probably want to use ext3. Both of those
> are journalling, ext2 is not.

> > Perhaps during install someone changed the default block size
> > in mkfs.ext2 or mkfs.ext3 [or whatever] when the file system
> > was created.  You cannot depend on all systems being installed
> > with the default block sizes.

> True. Which is -why- I wanted a surefire way to find the
> default block size--preferably without relying on 'dd' and 'ls'
> to do so. It's tempting to look inside the source to 'ls' and
> find out exactly how it comes up with its number of blocks. It
> seems to be reliable on any system.

You can NOT rely on 'ls' because it shows you the length of the
file from the information in the inode - the starting block, the
ending block, and the ending byte in the last block.  The actual
disk space used can be far, far less than that.
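That gap between the inode length and the real allocation is easy to see with a quick sparse-file sketch (Python here purely for illustration; on Linux, st_blocks is counted in 512-byte units):

```python
import os
import tempfile

# Make a file whose inode length is ~1 MB but whose data is one byte:
# seek past EOF and write a single byte, leaving a hole behind it.
fd, path = tempfile.mkstemp()
os.lseek(fd, 1024 * 1024, os.SEEK_SET)
os.write(fd, b"x")
os.close(fd)

st = os.stat(path)
length = st.st_size             # what 'ls -l' reports
allocated = st.st_blocks * 512  # actual allocation, in 512-byte units
print(length, allocated)        # length ~1 MB; allocation far smaller
os.unlink(path)
```

On a filesystem that supports holes, the allocation stays tiny no matter how far out the EOF is pushed.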

> > The SuSE 9.1 I looked at is not jibing with the SuSE you
> > are indicating.  And shouldn't file creation be independent of
> > the OS?  In particular the system utilities such as ls, df, et
> > al. shouldn't be kernel-dependent but depend upon the filesystem.

> 9.1 runs the 2.6 kernel, 9.0 runs 2.4. This is why I won't use
> 9.0. :)

I know you've said that before.

> As for your second question...yes, the operation should be
> filesystem-dependent, although you have to keep in mind that
> the VFS layer is in the kernel, and that's probably where
> linux's bits/stat.h comment about 512-byte blocks for st_blocks
> comes into play.
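For what it's worth, that bits/stat.h comment means st_blocks is always counted in fixed 512-byte units, no matter what block size the filesystem was built with, while st_blksize separately reports the kernel's preferred I/O size.  A small illustration, assuming a POSIX stat:

```python
import os
import tempfile

fd, path = tempfile.mkstemp()
os.write(fd, b"hello, world\n")        # 13 bytes of real data
os.close(fd)

st = os.stat(path)
# st_blocks counts fixed 512-byte units, independent of the filesystem
# block size; st_blksize is the preferred I/O size the kernel reports.
print("length:", st.st_size)
print("allocated:", st.st_blocks * 512)
print("preferred I/O size:", st.st_blksize)
os.unlink(path)
```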

I disagree.  I would think it should be filesystem-INdependent.
That way you should be able to use it across all extant OSes.

> What's really weird is that I create a 1024-byte file on
> Solaris and I get an 8K reading for blocks. Funny thing is,
> if I just echo a few words into a file (18 bytes), I get 0K
> allocated. I wonder where Solaris thinks it's storing the data,
> and what the threshold for using extra blocks is. What, is
> there room in the inode itself for short, tiny things?

Yup.  Even SCO stores very short files in the inode.  It makes good
sense as it saves an extra disk seek and therefore speeds up
performance.  This is perfect for symlinks.

> > > So it's not even a matter of scaling down to 512-byte blocks,
> > > it's a matter of always dividing by 2 to get the right number.

> > On that system.  What if there is another block size specified?
> > Linux permits 1024, 2048, and 4096.  Systems such as FreeBSD
> > default to 16384.  They do not recommend going above this - but
> > as I recall the maximum filesystem allocation can be huge.
> > With XFS on Irix you can set allocation units to be 1MB long [or
> > more].  Which is ideal for large media files to keep the files
> > contiguous on the disk sub-system.

> It can be set. The problem isn't that there -are- different
> block sizes, the problem is -detecting- them without resorting
> to using 'ls'.

And what about a system that has different block sizes and perhaps
different filesystems on the same drive which it can access?

And I'd not depend on 'ls', because if you read the man pages for
different versions you will see that even the options to 'ls'
differ.
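One way to get the block size without trusting 'ls' at all is statvfs(), which asks the filesystem itself and works per mount point - so a drive carrying several filesystems just needs one call per path.  A sketch (the helper name is mine):

```python
import os

def fs_block_size(path):
    """Fundamental block size of the filesystem holding path."""
    sv = os.statvfs(path)
    # f_frsize is the fundamental (fragment) size; fall back to
    # f_bsize on systems that leave f_frsize zero.
    return sv.f_frsize or sv.f_bsize

# Different mount points may answer differently on the same drive.
print(fs_block_size("/"))
print(fs_block_size("/tmp"))
```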

> > I can see the problem with getting it from multiple sources.
> > But why not look at the length of the file as it grows, and look
> > at it in bytes - then it should be portable anywhere.  As far as
> > I know the byte size will be identical - and that's probably all
> > you can count on.

> Because it doesn't grow in bytes. The st_size value remains a
> constant from the file's creation--the maximum size that the
> file will ever be. It allocates the entire file so that it can
> seek to a segment start point and write the appropriate segment
> from any given individual source (multiples are done at once).

So it is pre-allocating the file - probably doing an lseek to
the expected EOF.
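That pre-allocation pattern - seek to the expected EOF, then drop each segment in at its own offset - can be sketched like this; the sizes and offsets are made up:

```python
import os
import tempfile

EXPECTED_SIZE = 256 * 1024            # assumed final size of the transfer

fd, path = tempfile.mkstemp()
# Pre-allocate sparsely: seek to the last byte and write it, so the
# inode length - and hence 'ls -l' - shows the final size right away.
os.lseek(fd, EXPECTED_SIZE - 1, os.SEEK_SET)
os.write(fd, b"\0")

# As each segment arrives, write it at its own offset, filling a hole.
os.lseek(fd, 64 * 1024, os.SEEK_SET)
os.write(fd, b"S" * 4096)
os.close(fd)

st = os.stat(path)
print(st.st_size)                     # already EXPECTED_SIZE
os.unlink(path)
```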

> Byte sizes are useless. You have to use blocks for this
> particular task, AFAICT.

That wasn't clear from your first post.  Sorry.

> > You mean so that an 'ls' will show the size of the file, but
> > the 'df' will show only the blocks that have been used.

> > It's been years since I've diddled with these, but an lseek will
> > let you extend a file beyond the current EOF, and as you add
> > data the 'holes' in the middle will be filled.

> Appreciate that tidbit!

> And that's what they're doing in cases like this--extending the
> file out to the full length, then filling in the gaps as each segment
> becomes available.

Then the only thing that is going to let you know how much space is
being used AFAIK is 'dd', if that's what you are hunting for?

The problem I can see is that if the program extends the file to the
full length and there is not enough space on the drive when all the
pieces/parts are retrieved, you will have a corrupt file.

So you could see how big the file is going to be via an 'ls', but
you'd have to use something like 'dd' to look at the amount of
real space available.  But that is going to be slow.
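If the question is only "is there enough real space before we pre-allocate", statvfs() can answer that directly and quickly, without a slow 'dd' write test.  A sketch (the function name is mine):

```python
import os

def has_room(path, needed_bytes):
    """True if the filesystem holding path has at least needed_bytes
    free for a non-root process (f_bavail excludes reserved blocks)."""
    sv = os.statvfs(path)
    return sv.f_bavail * sv.f_frsize >= needed_bytes

print(has_room("/tmp", 64 * 1024))
```

Note this only checks the space at that moment; since the pre-allocation is sparse, the drive can still fill up before the holes are written - which is exactly the corruption risk described above.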

I take it this is going to be for something like bittorrent?

Is it checking for true free space?  If so you might snag their
code and check it.  It uses the MIT License so you can do almost
anything you want with it.

Bill
-- 
Bill Vermillion - bv @ wjv . com
