Problem with lockfiles, and file ownership(?)

Del neroni3000 at comcast.net
Sun Feb 7 12:49:16 PST 2016


    I recently coded a really nifty but very complex process in which an input (dclerk) process on one file repeatedly issues a system command to run a dclerk process on another file.  This “called” process (as I will refer to it, even though it is executed via a system command) starts up in add-records mode and, in the course of adding a record, also does lots of lookups and record updates on multiple files.  When the called process is finished, it returns control to the calling process via an Exit command, at which point the calling process (or mother task?) grabs another record and calls the “called” process again.  This “called” input process is normally used directly by the end users, but in this case it runs without human intervention, using the pushkey command to simulate human input.
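    For readers who do not use filePro, the control flow described above can be sketched in Python. This is only an illustration of the "one child process per record, wait for it to exit, then start the next" pattern, not the actual filePro processing. The names run_child_per_record and demo_command are hypothetical, and the demo child is a throwaway Python one-liner standing in for the real dclerk command line, which is not shown in this message.

```python
import subprocess
import sys


def run_child_per_record(records, build_command):
    """Run one child process per record and collect the exit codes."""
    exit_codes = []
    for record in records:
        # subprocess.run blocks until the child exits, much like a
        # system command that hands control back to the caller.
        completed = subprocess.run(build_command(record))
        exit_codes.append(completed.returncode)
    return exit_codes


def demo_command(record):
    # Hypothetical stand-in for the real child command line: a tiny
    # Python child that exits with 0 for even records and 1 for odd ones.
    return [sys.executable, "-c", f"import sys; sys.exit({int(record) % 2})"]


if __name__ == "__main__":
    print(run_child_per_record([1, 2, 3], demo_command))
```

    The point of the sketch is that each child has fully exited before the next one starts, so in principle nothing the previous child opened should still be in use.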
    To make a long story short, I began to run into lockfile and file ownership problems.  The whole process would run a couple of times without any problem, very fast, doing a bunch of input and record updates that would take a human operator far longer to enter manually.  Then, on the third or fourth try, after restoring the very same data files, it would blow up with the error message “lockfile not found”.  (This is the message I got most frequently, but I also sometimes got a Windows error 32, saying a file was currently in use by another process, or something to that effect, which was obviously untrue since no other processes were running.)  So I took a look at the lockfile, and it was there all right, so why didn’t filePro find it?  After a lot of messing around, I realized that this message really meant that the lockfile was still owned by an earlier iteration of the same process, and for that reason was not available later when the process again issued a lookup or update or whatever it was that prompted the error.
    Every time I tested, I first restored the data files (data, key, index, and lockfiles) from a backup directory that never changed, and it would usually work just fine the first two or three times, and then it would pop up with some error or other, in different places and completely randomly.  In other words, it was not predictable or deliberately repeatable.  
    So I got desperate and started deleting lockfiles, just to see if that would solve the problem.  I put in commands to execute “dprodir (filename) -l” in a lot of the places where it was blowing up.  That seemed to reduce the frequency of the problem, but it still happened on the third or fourth run through the same process with identical data.
    After seeing it work so many times with identical (and even sometimes different) data, I decided that this was not an error in my code, but had to be some kind of dclerk 5.0.14DN9 / Windows 7 Pro problem, maybe a timing problem of some sort, where files, like a lockfile, were not being released by Windows fast enough to keep up with the very fast-running process, thus causing subsequent attempts to reuse the same file to fail.  To test my theory further, I switched from the multi-user version of filePro to a single-user version that I own.  The problem with the lockfiles went away completely, and I ran the whole thing successfully a few times, but then it failed anyway, this time with a system message saying an index was unavailable because it was in use by another program.  Since it ran perfectly five times out of six, after restoring the data files each time, this pretty much confirmed my opinion that Windows is not releasing ownership of files when the “called” routine exits.  So the question is, what to do about it?
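    One way to probe the timing theory from outside filePro is a small Python check that waits until a file can actually be opened before the next iteration starts. This is a rough sketch under assumptions, not a fix: wait_until_free is a hypothetical helper, the attempt count and delay are arbitrary, and it only tells you whether another process still holds the file with a restrictive share mode. On Windows, error 32 (ERROR_SHARING_VIOLATION) surfaces in Python as a PermissionError whose winerror attribute is 32.

```python
import time

SHARING_VIOLATION = 32  # Windows ERROR_SHARING_VIOLATION


def wait_until_free(path, attempts=10, delay=0.1, opener=open, sleep=time.sleep):
    """Return True once `path` can be opened for update, False if it never frees up.

    `opener` and `sleep` are injectable so the retry logic can be
    exercised without a real locked file.
    """
    for attempt in range(attempts):
        try:
            with opener(path, "r+b"):
                return True
        except PermissionError as exc:
            # Retry on a sharing violation. Platforms without `winerror`
            # report None here and are retried as well.
            winerror = getattr(exc, "winerror", None)
            if winerror not in (None, SHARING_VIOLATION):
                raise
            # Linear backoff: wait a little longer after each failure.
            sleep(delay * (attempt + 1))
    return False
```

    If a check like this, run between iterations, shows the index or lockfile staying busy for a moment after the child has exited, that would support the idea of a delayed release rather than a logic error in the processing.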
    Well, no doubt I could convert the whole process to use dreport instead of repetitive system calls to dclerk.  That should get rid of the “file is in use by another program” problem (which, by the way, should not be happening, since the files should all be released when the “called” process exits).  But that would entail a LOT of extra work at this point, and I like the way this works when it does work.  Besides, I am stubborn when it comes to stuff like this.  However, it seems that there is not much I can do, since the error is not in my code, and getting it fixed, whatever it is, is highly unlikely.
    One thing I will try is to run it on a client machine that is faster and runs Windows 10.  I don’t really expect that to work any differently, but it is worth a try.  I am also wondering if I could take the input processes that I am executing with the system command and convert them either to called programs (using call instead of system) or to subroutines of the “calling” process (using gosub), so that I won’t get the Windows error 32 problem.  That would also take a lot of work, though maybe less than converting to dreport, and I think it would probably solve the problem.  I would prefer to avoid spending that time, so I would want to be sure it would solve the problem before I commit to doing it.
    I would like to understand WHY this is happening, because I have used this technique before and never had this kind of problem with it.   If anyone can enlighten me, I would appreciate it.  If anyone has other suggestions about ways to get around the problem, I would be glad to hear them. 
    If you have read all the way through this admittedly long email, thank you for your patience.

Sincerely,
Del Neroni