File Pro And Bus Errors

Brian K. White brian at aljex.com
Fri Jul 8 12:07:57 PDT 2005


----- Original Message ----- 
From: "J. Ryan Kelley" <ryan.kelley at trinitytransport.com>
To: "fplist3" <Filepro-list at lists.celestial.com>
Sent: Friday, July 08, 2005 11:44 AM
Subject: Re: File Pro And Bus Errors


>
>
> J. Ryan Kelley wrote:
>
>> Good Morning,
>>
>> I've been working with Filepro for about 8 months now after recently 
>> graduating from college with a degree in Computer Science.  I've had 
>> experience working with everything from C and C++ to Fortran, Lisp, 
>> and countless other languages, but never with FilePro until I started 
>> at my current position.  When I first started working with Filepro, I'll 
>> admit, I was impressed with how easy it was to step in and start working. 
>> In fact, I was doing some programming in my first few hours on my first 
>> day.  The primary thing that I have found unsettling from day one, 
>> however, is the presence of the dreaded Bus Error.  I'm quite familiar 
>> with Bus Errors (and their cousins, Segmentation Faults) from my days 
>> working in C, where attempting to access a memory location you don't have 
>> permission to touch would cause this problem and dump you out.  It was usually 
>> caused by an improperly indexed array, or an erroneous pointer.  This all 
>> made sense in C because of how low level it was, giving you direct access 
>> to the memory.
>> Filepro, at least relative to C, is much higher level.  There is no 
>> point at which you are attempting to manipulate system memory 
>> directly, so these Bus Errors should cease to exist unless they are caused 
>> by a compiler/interpreter.  When I've questioned co-workers and other 
>> filepro developers in the past about what causes these errors, and how 
>> they can be avoided, I always get one reply, "Remove your comments." 
>> When I was first told this, I disputed it, and disputed it.  After all, 
>> in all (or at least I thought all) compilers/interpreters, comments are 
>> completely disregarded and treated as if they were simply white space. 
>> Sadly, I have seen several instances in which removing some comments can 
>> remedy these errors.  In my opinion, comments are a very important part 
>> of the code itself, especially when working in a system with several 
>> developers who need to be able to easily digest what another programmer 
>> was attempting to do.  For a comment to attempt to access an unauthorized 
>> memory location is illogical, and I'm assuming that there is a better 
>> explanation for all of this.  As the days wear on, the bus errors seem to 
>> become more and more prevalent and are creating some serious issues that 
>> take me away from my daily duties.  Can anyone provide some helpful 
>> insight into this problem: do comments really cause Bus Errors?  If so, 
>> why and how?  Is there a patch in the works?  And what else in FilePro can 
>> cause these nasty errors?
>>
>> Thank you,
>>
> One of the suggestions that was given to me to track down the source of 
> these bus errors was to run the code through dclerk with the -db argument, 
> so debug would show exactly where these bus errors are occurring.  Oddly 
> enough, most (if not all) of these bus errors are occurring before debug even 
> begins to step through any of the code, yet if I run debug without input 
> processing (-z xxyyzzz) all goes well.  Any ideas?
>
> Thanks

When it fails before hitting any processing, that usually means it hit 
something corrupt while reading the various files it reads before it gets to 
the processing, not a problem in the processing itself.
But the fact that it doesn't crash when the only change is skipping the 
processing implies that the problem is in the processing after all.

That's pretty interesting.

The processing table is scanned completely, and some setup work is done based 
on that scan, before the first line of code is ever run.
Probably there is something in the processing table that causes a crash during 
that scan and has nothing to do with actually running the process.

* Look for variable and array definitions and try moving them all to a spot 
outside the flow of processing, removing any duplicates along the way.
That just means go to the end of the table, make sure the last line is 
an end or a return, and after that line add new lines that declare the long 
variables, define the short variables, and dimension the arrays. Don't put 
any labels on any of these lines, and don't put anything anywhere else 
that will cause the flow of processing to reach this spot. The declarations are 
in the table, but the running program never goes there. (A rough sketch of this 
follows the list of suggestions below.)

* Try giving explicit definitions to variables that weren't defined, where 
possible. Some are left undefined on purpose, but some could be defined.

* Look for ambiguity, and for things that are technically-not-ambiguous but 
still unwise, and remove them. This will be some work. It means looking for 
variable names, lookup aliases, line labels, and array names that are the same 
as fp commands and other special fp words, or the same as each other, or 
re-used for different purposes, and removing all the ambiguity.
Example: If there is a line label or a lookup alias or a long variable named 
"input", and some if: statement somewhere else that says  if: input  or 
if: not input,  change the label/alias/variable and all references to it from 
input to inpt (input is a command).
If there is a lookup like  lookup tom = fileaa -k yy ...
and then somewhere else a completely unrelated lookup that is also named tom, 
like  lookup tom = filezz -k nn ...
then change one of the lookups to something other than tom and fix up the 
subsequent references to it, of course.
Be careful, though: there are legitimate constructs that look like that, 
which you don't want to change.
Like several variations of a given lookup, where the subsequent code works 
the same way no matter which of the several versions of the lookup was chosen.
Example:
if: aaa
then: lookup post = fileaaa ...
if: bbb
then: lookup post = filebba ...
if: ccc
then: lookup post = fileccc
if:
then: post(1)=1 ; post(2)=@id ; ... and so on.
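
For the first suggestion above, the tail of the table might end up looking 
roughly like this. This is only a sketch: the names (ordtotal, aa, bb, amts) 
are made up, and the exact declare/define/dim forms depend on your filePro 
version. The idea is just that every declaration lives on unlabeled lines 
after the final end, where the scanner sees them but the running program 
never goes:

if:
then: end
if:
then: declare ordtotal(10,.2)
if:
then: aa(10,.2); bb(6,.0)
if:
then: dim amts(12)(8,.2)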


I don't think I ever saw a bus error myself. I have seen out-of-memory 
errors caused by carelessly written loops, gotos, and gosubs that resulted 
in a loop that runs forever, sometimes claiming another chunk of memory on 
every pass until the user/process quota is used up. Or gosubs that end 
up nesting too deep at run time. Something like the sketch below.
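
A made-up minimal example of the gosub flavor: the author meant a counted 
loop, but nothing ever changes xx, so the gosub keeps re-entering itself and 
nests one level deeper on every pass instead of looping and returning, until 
it blows up at run time. That's exactly the "nesting too deep" case:

loop   if: xx lt "10"
     then: gosub loop
       if:
     then: return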

I have also seen crashes caused by technically "legal" but odd things. Even 
cases where something was not only technically legal according to the docs, 
but proven so by being used on other platforms.

At your site, as a matter of fact, I discovered a real doozy: not only 
did the crash occur only on a certain platform even though all other details 
were the same (same code, same data, same config file, same version of fp), 
but it even turned out that the crash didn't happen with another platform's 
binaries running on the first platform via emulation.

At your site, and your site only, clerk and/or report were crashing (core 
dump & exit) during the output destination select dialog invoked by the -pq 
command line option.
They crashed every time -pq was used, and only when -pq was used.

We kind of needed that, so that was a problem.
You had Linux originally (it wasn't our choice, but we didn't have a strong 
argument against it, since it was known to run at least OK and I considered 
myself perfectly competent to support it as far as install, config, and handling 
any kind of problems that might come up, so we warily went along) and we 
couldn't change it lightly, but we did discover that SCO binaries, running 
on Linux, didn't crash.
We actually had to switch platforms with fp to get SCO binaries, but stayed 
running them on Linux for about a year.

I did eventually find the cause of the problem, and if I'd discovered it 
earlier we could've kept using the original Linux binaries.
(Though we had other problems on Linux that were also intolerable, so 
eventually it needed to go anyway.)

It took a long time to track down because the thing that caused the crash 
was not anything illegal, and it was something all my other boxes had 
without any problem.

On all my boxes, I have stuff in /etc/profile to detect whether the user is 
logging in from inside or outside the lan.
If they are logging in from outside the lan, I set env variables to 
override the normal default printer and make their default printer do 
passthru-print.

Admittedly I wasn't doing that in the neatest way possible, and I don't do it 
exactly the same today, but it was in production on a lot of boxes and quite 
proven.
If the user was not on the lan, then I set PFPT=on and set 
PFPRINTER=pfpt-hpl,
and the printer pfpt-hpl was defined like this:

printer5=pfpt-hpl,hplaser,,local laser w/pfpt

The reason was that I wanted PFPRINTER to be set to something, and I wanted 
the printer type to be hplaser, but I didn't want any command associated with 
the printer. I only wanted fp to handle the printer on/off codes 100% via 
PFPT=on.
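
The /etc/profile piece was along these lines. This is just a simplified 
sketch, not the exact script: checking the ssh client address against the 
local subnet is only one way to decide "on the lan or not", and the 
192.168.1. prefix and the REMOTE variable name are placeholders:

# /etc/profile fragment (sketch): force passthru printing for off-lan logins
REMOTE=${SSH_CLIENT%% *}        # client address for ssh logins, empty otherwise
case "$REMOTE" in
    ""|192.168.1.*)
        : ;;                    # console or local lan: leave the defaults alone
    *)
        PFPT=on                 # fp sends the printer on/off codes itself
        PFPRINTER=pfpt-hpl      # the command-less printer from the config above
        export PFPT PFPRINTER
        ;;
esac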

I also had a similar printer definition for matrix printers.
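
Something along these lines (the matrix print code table name here is just a 
placeholder; the important part is the same empty command field between the 
two commas):

printer6=pfpt-mx,<matrix print code table>,,local matrix w/pfpt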

Eventually I discovered that if the config file didn't have any printer 
definitions with an empty print command field like the one above, then the 
Linux binaries didn't crash even when -pq was used.

Your current problem may be caused by something odd in the fp config, or by a 
component file with an odd name (there's a quick way to scan for these 
sketched after the examples), like an output format named
out.<very long name>
out.<name with spaces>
out.<name with escape sequences from pressing arrow keys during 
naming/moving at the command line under old/simple shells that don't have 
input line editing>
out.<name with other common "bad" characters>
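
A rough way to scan for names like those, assuming the usual layout of one 
directory per filePro file (the /appl/filepro path and "yourfile" are 
placeholders; use whatever your PFDATA points to):

# flag component files whose names contain anything outside the usual
# letters/digits/dot/underscore/dash set (spaces, escape sequences, etc.)
cd /appl/filepro/yourfile || exit 1
ls | LC_ALL=C grep '[^A-Za-z0-9._-]'
ls | awk 'length($0) > 30'      # and anything suspiciously long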

Or a corrupt component file: even though your command and your process 
might not require that component (say a screen or an index that you don't 
use), it may still get scanned during startup, before fp knows it won't need 
it, and cause fp to barf right then.

Like that bad printer definition above. It didn't matter whether I actually 
tried to choose that printer during the -pq dialog, and it didn't matter 
whether the user's env had PFPRINTER=<that printer> or not.
It caused a crash merely because it existed and was part of the environment 
fp needed to scan.

Or even more indirect relationships like, say... an output format with a 
certain print code on it somewhere, and in a certain print code table that 
print code refers to a download file (print code with %"filename"), and 
maybe there is something weird about that file. Something weirder than 
merely not existing or not having permission to read it. Those just cause a 
graceful error at print time.

The point is just that that's the kind of non-obvious and indirect 
relationship you have to be on the lookout for in general.

btw I forgot to say before: welcome to the club :)
(that wasn't meant to be sarcastic, as in "fp problem sufferers"; I meant fp 
users & developers)

Brian K. White  --  brian at aljex.com  --  http://www.aljex.com/bkw/
+++++[>+++[>+++++>+++++++<<-]<-]>>+.>.+++++.+++++++.-.[>+<---]>++.
filePro  BBx    Linux  SCO  FreeBSD    #callahans  Satriani  Filk!
 


