rreport gagging on lockfile

John Esak john at valar.com
Mon Feb 1 22:14:42 PST 2010


The only thing I can suggest at this point is to run the process with the
interactive debugger. Completely clear the lockfile before starting (I mean
erase it). Then step through each critical point until you can see exactly
what is causing the hang.
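
Something like this, as a rough sketch - the lock file path here is an
assumption, substitute whatever path your system actually shows:

  # Assumed location - adjust to your real filePro data directory.
  rm -f /appl/filepro/log_operations/lockfile

  # Re-run the report; adding DEBUG ON at the top of the log_it table
  # should drop you into the interactive debugger at that point.
  /appl/fp/rreport log_operations -fp log_it -sr 1 -h "Logging"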

Are you familiar with the debugger?

John
 

> -----Original Message-----
> From: Tyler Style [mailto:tyler.style at gmail.com] 
> Sent: Monday, February 01, 2010 11:23 PM
> To: john at valar.com
> Cc: filepro-list at lists.celestial.com
> Subject: Re: rreport gagging on lockfile
> 
> 
> 
> John Esak wrote:
>  > 1. Okay, be more specific. You say you are using the lockinfo
>  > script. So you can see exactly which record is being locked by
>  > exactly which binary. What does it show? Record 1 by dclerk, or
>  > record 1 by dreport... exactly what does lockinfo show... by any
>  > chance are you locking record 0? Not something you could do
>  > specifically, but filePro does this from time to time.
> While I have the error message from rreport on one terminal and the
> same error message from rclerk on another, lockinfo will produce
> "There are NO locks on the "log_operations" key file."
> 
> While every call to rreport starts off with -sr 1, there is a lookup
> in the processing that moves it to a random record (between 1 and 180)
> as the first command, to keep it from hogging the file. Records 1-180
> all exist.
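> 
> The shell-level equivalent of that jump would be something like the
> sketch below (illustrative only - the real randomization happens
> inside the log_it processing, not in the calling script):
> 
>   REC=$(( (RANDOM % 180) + 1 ))   # pick one of records 1-180 at random
>   /appl/fp/rreport log_operations -fp log_it -sr $REC -h "Logging"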
> 
>  > 2. It's always easier when people say this has worked for years.
>  > So, it must be something new added to the soup. Have you removed an
>  > index? Grown a field and not changed the size of an index pointing
>  > to it? Gone past some imposed time barrier? Used up too many
>  > licenses? Exceeded some quota in some parameter? Added groups or
>  > changed permissions? Run a fixmog (fix permissions)? Has a binary
>  > failed, like dclerk, and you've replaced it with a different copy?
>  > Does the -u flag have any impact on your scenario? I'm assuming a
>  > lot because you haven't specifically shown how you are doing things.
>  > Is this happening from a system call?
> 
> Absolutely nothing has been done to change the file or the processing
> for a couple of years. The only thing that has happened to the file is
> that it has grown larger over time.
> There is definitely no time limit imposed in the processing; I don't
> see how that would produce a lock issue, anyway.
> We have way more licenses than we can use after cutting 70% of our
> staff last year :P
> Exceeding a quota in a parameter would mean something had changed with
> the file or processing, and nothing has.
> We haven't changed groups or permissions in years either - the current
> setup is pretty static.
> Fixmog (our version is called 'correct') hasn't been executed in
> months, according to the log it keeps.
> No binaries have been swapped in or out (we'd like to, though! We
> still haven't got 5.6 to pass all our tests on our test box,
> unfortunately).
> -u shouldn't make any difference; it's not used, and if we needed to
> use it, I am certain the need would have shown up before now.
> 
> A typical use would be to add this to the end of a bash script to
> record that a script had completed running:
> 
> ARGPM="file=none;processing=none;qualifier=hh;script=importship;user=$LOGNAME;note=none;status=COMPLETED"
> /appl/fp/rreport log_operations -fp log_it -sr 1 -r $ARGPM -h "Logging"
> 
> Most of the actual processing just parses @PM, looks up a free record,
> and puts data in the correct fields.
> 
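> The parameter string is just semicolon-separated key=value pairs, so
> in shell terms the split amounts to something like this (illustrative
> only - the real parsing is filePro code in the log_it table):
> 
>   IFS=';' read -ra PAIRS <<< "$ARGPM"
>   for kv in "${PAIRS[@]}"; do
>       echo "field=${kv%%=*} value=${kv#*=}"
>   done
> 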
> No other processing anywhere ever looks up the file; it is strictly a
> log, nothing more, and the only processing that touches it (log_it) is
> always run either from a shell script or via a SYSTEM command.
> 
> Things we tried to see if they would help:
> * The file had 600,000 records going back 4 years, so we copied the
> data to another qualifier, deleted the original qualifier, and copied
> back the most recent 10,000 entries to see if it was just a size issue.
> * Rebuilt all the indices.
> * Rebooted the OS.
> 
> This logging hasn't been added to any new processing or scripts for 
> several months.
> 
> 
>  > I agree that the code would not seem to be important since it has
>  > worked... before, so again, it seems like the environment has
>  > changed somehow. Maybe if we saw the whole setup, relevant code and
>  > all, we could give more suggestions. Oh, I just thought of one... is
>  > it possible you are looking up to a particular record, say record 1,
>  > and that record is not there anymore?
> 
> All the records being looked up to exist. The environment is pretty
> static - our needs have been pretty clearly defined by this point, and
> new systems are almost always implemented on our Debian boxes, as SCO
> is so limiting and so badly supported.
> 
> Thanks for the ideas! Hopefully my answers might light up a bulb over
> someone's head...
> 
> Tyler
> 


