cloud server

Fairlight fairlite at fairlite.com
Thu Mar 17 17:32:12 PDT 2016


On Thu, Mar 17, 2016 at 07:47:17PM -0400, Jose Lerebours via Filepro-list thus spoke:
> I must confess, this is totally new to me.  It blows my mind how it
> is that today we talk about terabytes, gigabytes and 10,000mbps
> up/down etc. and still look at it as "slow".
> 
> In the not-so-far-back times, Xenix with a 28k modem and a 10MB HD ran
> an outfit with nothing more than a couple of multiplexers to handle
> all the serial lines between locations.
> 
> You mention "Windows clients" - Is the drag caused by the client or
> the application?  I have really never heard of filePro being a drag
> or a bandwidth hog.
> 
> I appreciate the educational info!

The problem is that storage sizes are increasing simultaneously with
transmission speeds.  

Sure, I used to have only 500GB of storage and a 1.5mbit down, 256kbit up
ADSL circuit.  Now I have 62mbit down, 6.2mbit up cable broadband, but I
also have 6TB of storage space to back up.  Not all of that is in use, but
it's my capacity.  

With compression, it takes 4.5 days to do a full backup and validation
with Acronis TrueImage to a WD EX4 NAS on my WiFi LAN.  That's actually
the recommended setup for the EX4: hanging it directly off your router.
I had it hooked directly to the gigabit ethernet on my system at first, and
I saw -no- speed improvement at all.  The bottleneck is the WD EX4's CPU,
which is under-specced for anything much over 50mbit of throughput.

As I was saying, it takes 4.5 days to back up only 2.46TB of data, even in
a LAN situation.
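
Working the arithmetic backwards gives a rough sanity check (a sketch only,
treating 2.46TB as decimal terabytes and ignoring the compression ratio and
the validation pass):

    # Effective throughput of the backup run described above.
    data_bytes = 2.46e12            # 2.46TB of source data
    elapsed_s = 4.5 * 24 * 3600     # 4.5 days in seconds

    bytes_per_s = data_bytes / elapsed_s
    mbit_per_s = bytes_per_s * 8 / 1e6

    print(f"{bytes_per_s / 1e6:.1f} MB/s ~= {mbit_per_s:.0f} mbit/s")
    # prints roughly 6.3 MB/s, about 51 mbit/s

That lands almost exactly at the EX4's ~50mbit CPU ceiling, so the NAS, not
the network, is the limiting factor.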

Backup tech has not kept pace with the growing capacity of
consumer-oriented, commoditised storage.  To get anything fast -and- large,
you need to spend thousands.  That EX4 was $600.  I remember looking at
other solutions: "real" NAS vendors, SAN, etc.  None of it is affordable
unless you have a huge revenue stream to justify getting into it.

In general, a lot of things have fallen by the wayside during this period
of tech growth since 1993.  Games used to have tightly written netcode
which could work over dialup, and now developers struggle to get
multiplayer working properly even on broadband's much lower-latency
connections.  A whole Linux -distribution- used to be able to run on a 4MB
386sx/25 (including TinyX!), and now the -kernel- won't even fit into 4MB,
let alone applications.  The Linux kernel started requiring more RAM at
version 2.4.  I know, as the Cobalt Qube I had required a firmware flash to
let it take anything newer than 2.2.  So on the whole, tech is getting to
be less expensive, we have more capacity, higher speeds, and...fallen
standards.  That's a bad scenario any day of the week.

As for the Windows/filePro/networking/speed issue, you have to understand
what's happening under the hood.  The same thing holds true on Windows and
in *nix over NFS, by the way.  

Basically, think about your key or data segment.  I really only ever use
key, so we'll standardise on that for the scope of this discussion.  If you
look at your key, it's the full set of data.  If you're working with
relational tables, it's more than one key file, as well.  Let's say those
are at the 32-bit limit of 2GB in size, just for the sake of argument.  Now,
you have indexes.  Those are much smaller files containing information
that keeps you from having to sequentially read up to 2GB worth of data
every time you want something.  If you reference the spec for indexes
at http://fptech.com/Products/Docs/FPFormat/autoix45.shtml you'll see the
math behind the BTree+ indexes.  You can figure out what fields you've used
for an index, do the math, and extrapolate out to the index size for a 2GB
file.  
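
To give a feel for that math without copying the spec here, a rough sketch
(the per-entry overhead and fill factor below are made-up placeholders; the
real node layout is what the autoix45 page documents):

    # HYPOTHETICAL index-size estimate; real numbers come from the BTree+ spec.
    def estimate_index_bytes(record_count, key_field_bytes,
                             entry_overhead=8,   # assumed bytes/entry for pointers etc.
                             node_fill=0.70):    # assumed average node fill factor
        entry_bytes = key_field_bytes + entry_overhead
        return int(record_count * entry_bytes / node_fill)

    # e.g. a 2GB key file of 1KB records, indexed on 20 bytes of key fields:
    records = (2 * 1024**3) // 1024
    print(estimate_index_bytes(records, 20))     # on the order of 80MB

The point is just that even a "small" index on a 2GB key file runs to tens
of megabytes, and that is what gets read when a lookup happens.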

Every time you look something up, the index is read.  I'm not sure if the
whole index needs to be read, or if BTree+ lets you stop early in the tree
once you find what you're looking for.  It probably depends on the
operation: a report is more likely to need the whole thing than a search
for a specific record.  But assume you
need the whole index.  Where's it stored?  It's stored either on the
Windows server hosting the files, or it's stored on a NAS/SAN for *nix,
usually under NFS.  In Windows, your client applications must drag it over
the network to read it, since there's no local copy.  This isn't like SQL,
where you submit a query and the server handles the heavy lifting.  The
filePro binaries on the client run identically to the ones on the server,
meaning they must read the whole file, except the clients must read it
across the network.  That entails IP, TCP, -and- protocol-level network
overhead.  The same is true of *nix systems using NFS or CIFS for their
storage.  And I can tell you that NFS was originally designed -so- bloated
that it was never meant to run on anything under 10mbit, just due to the
overhead.
It is -not- an elegant protocol.  It has probably improved by v4, but
still, it's not lightweight.
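
To put rough numbers on that overhead (a sketch; the file sizes are the
placeholder figures from above, and real NFS/CIFS/TCP overhead only makes
these times worse):

    # Time to drag a file of a given size across a few link speeds.
    def transfer_seconds(size_bytes, link_mbit):
        return size_bytes * 8 / (link_mbit * 1e6)

    sizes = {"index (~80MB)": 80e6, "key file (2GB)": 2e9}
    links = {"gigabit LAN": 1000, "62mbit cable down": 62, "6.2mbit cable up": 6.2}

    for fname, size in sizes.items():
        for lname, mbit in links.items():
            print(f"{fname:>16} over {lname:<18}: {transfer_seconds(size, mbit):8.1f} s")

Pulling the 2GB key file over the 6.2mbit upstream side of a cable circuit
is on the order of 40 minutes, and that is before any protocol overhead or
contention.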

This is the difference between client-server architectures:

*  With SQL, the server takes a tiny query, does the heavy lifting directly
   on the server, and spits out -only- the result.  All file manipulation
   is in local storage (unless a SAN is involved, which is definitely
   possible in large deployment scenarios, but it's still less of an issue
   due to the way the engines are optimised and the faster protocols
   often in play, iSCSI being one example).  Also, you usually design
   the servers with decent provisioning and enough RAM to cache effectively.

*  With filePro, the server is simply a datastore.  The 'client' is really
   written identically to the server, and requires that the index be read
   (possibly fully) in order to obtain the address within the key file to
   seek to before reading the -actual- data.  Then it has to actually seek
   there and grab the data from the key file.  I can't speak to CIFS, but
   NFSv2 and higher perform seek() on the server side, so at least the
   client doesn't have to pull the key file up to the seek point plus the
   record size.  That said, if you are running a report which has to access
   a lot of (or all!) records, you -will- be shipping the majority or
   entirety of the key file over as well, since the non-index data is not
   stored in the index.  (A rough sketch of both access patterns follows
   below.)
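
Here is that sketch, in terms of bytes on the wire.  Every size below is a
hypothetical placeholder just to show the shape of the difference; neither
function is real filePro or SQL client code:

    QUERY_BYTES = 200            # a small "SELECT ... WHERE ..." sent to a SQL server
    RESULT_BYTES = 4_000         # a handful of matching rows coming back
    INDEX_BYTES = 80_000_000     # index file the filePro client has to read
    RECORD_BYTES = 1_000         # one record fetched from the key file via seek()

    def sql_lookup_wire_bytes():
        # The server walks the index and seeks locally; only the query
        # and the result cross the network.
        return QUERY_BYTES + RESULT_BYTES

    def fileshare_lookup_wire_bytes(records_touched=1, whole_index=True):
        # The client is the engine: it pulls the index (possibly all of it)
        # over NFS/CIFS, then fetches each record it needs from the key file.
        index_cost = INDEX_BYTES if whole_index else INDEX_BYTES // 100
        return index_cost + records_touched * RECORD_BYTES

    print(sql_lookup_wire_bytes())                # a few KB
    print(fileshare_lookup_wire_bytes())          # tens of MB for one record
    print(fileshare_lookup_wire_bytes(records_touched=2_000_000))
    # a report over ~2 million records: the index plus most of a 2GB key file

The exact numbers don't matter; the shape does.  One side scales with the
size of the answer, the other scales with the size of the files.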

And this is why the fP client/server "model" sucks, compared to SQL.  Well,
one amongst several reasons.  We could get into the whole ACL issue, but
that's outside the scope of this discussion.  There are others, as well.

Doing this processing with large files on gigabit LAN is one thing,
although -still- painful, depending on the elegance/efficiency of the
coding and the job requirements.  Doing it over mere broadband that
crosses even one narrower hop?  That's going to be far less
pleasant.  Also, you need to take fault tolerance (or lack thereof) into
account.  Across a WAN, iSCSI is a Very Bad Idea[tm].  I've hit fault
tolerance issues with it on a LAN.  I don't want to think about it on a
WAN.  An edge router falls over for some reason (RAM exhausted, BGP table
falls on the floor, name it...), and you've suddenly got a risk of data
corruption.

Oh, and compounding the issue, I'm guessing a lot of filePro "thin client"
boxes are built with very little RAM.  That's going to make disk caching
next to non-existent, which means you can't even count on caching for
performance benefits.

I would strongly recommend rethinking using cloud storage in realtime with
filePro.

mark->
-- 
Audio panton, cogito singularis.

