grep, etc

Wed Aug 18 13:23:56 PDT 2004

Bob Stockler wrote:
> On Wed, Aug 18, 2004 at 12:09:47PM -0400, Brian K. White wrote:
>> Jay R. Ashworth wrote:
>>> On Wed, Aug 18, 2004 at 01:17:23AM -0400, Brian K. White wrote:
>>>> Jay R. Ashworth wrote:
>>>>> On Tue, Aug 17, 2004 at 11:08:11PM -0400, Brian K. White wrote:
>>>>>> Fairlight wrote:
>>>>>>> Sweet.  I should grab that.  :)  Basically a -useful- egrep.  :)
>>>>>>> Well, egrep is useful, but not as useful as it otherwise could
>>>>>>> be.
>>>>>>
>>>>>> I'd say the plain old stock SCO grep was useful, since it answered
>>>>>> the need very directly in one command without especially exotic
>>>>>> options or a pipeline. It didn't even require the stock egrep, let
>>>>>> alone GNU grep or pcregrep.
>>>>>
>>>>> Well, except that it *didn't* fulfill the requirement of the
>>>>> poster who started this thread, without a big ungainly pipeline
>>>>> wrapped around it.
>>>>
>>>> What are you talking about?
>>>>
>>>> The stated request was:
>>>>
>>>> sample data:
>>>> ISA*00000089** *
>>>> CLM*inv123456*a*b*c         -213
>>>> SV1*23456*25*A*B            -214
>>>>
>>>> I want to be able to view a range of lines, i.e. from -213 to -216
>>>>
>>>> In what way does this fail to meet that request?
>>>> grep -e '-21[3-6]$' file
>>>
>>> Well, in the *exact* instance he used as an example, it would work.
>>>
>>> But what happens if he wants -213 through -226?
>>>
>>> There isn't actually quite enough information about his
>>> requirements here to design a solution that's guaranteed to work.
>>
>> Head-smack.
>> You are right, of course.
>> Bob, "I lose!"
>>
>> My next choice would have been awk, using * as the field
>> delimiter, sub()/substr() to pluck out the 216, and if() the value is
>> within the limits, print $0. That's 3 or 4 lines minimum, 5 I think if I
>> can't find 216 by some better means than counting back from the end
>> of the field.
>
> I don't know what you're talking about here.  The beginning
> and ending strings are the final fields on the lines, so $NF
> would hold them.
>
> My initial solution was:
>
>   1 - when beginning $NF is matched, print the line and set a flag
>   2 - when ending $NF is matched, print the line and exit.
>   3 - if flag is set print the line.
>
> I have a little perl program (cgrep) that prints N (default 3)
> lines on each side of a line matching a pattern.  If you want
> an odd number of lines that would be fine (though slower than
> the AWK program), but if you wanted an even number of lines its
> output would have to be piped through grep -v to remove either
> the first or last line.
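For illustration, Bob's three-step flag method might look like this in Python. It is a sketch, not his actual awk program. It splits on whitespace, as awk does by default, so the last field plays the role of $NF. The function name flag_range and the sample lines after -214 are hypothetical.

```python
def flag_range(lines, start, end):
    """Return the lines from the one whose last field equals `start`
    through the one whose last field equals `end`, inclusive."""
    selected = []
    in_range = False
    for line in lines:
        fields = line.split()          # whitespace split, like awk's default FS
        last = fields[-1] if fields else ""
        if last == start:              # step 1: beginning matched, set the flag
            in_range = True
            selected.append(line)
        elif last == end:              # step 2: ending matched, keep it and stop
            selected.append(line)
            break
        elif in_range:                 # step 3: flag is set, keep the line
            selected.append(line)
    return selected


data = [
    "ISA*00000089** *",
    "CLM*inv123456*a*b*c         -213",
    "SV1*23456*25*A*B            -214",
    "SV1*23457*25*A*B            -215",  # hypothetical
    "SV1*23458*25*A*B            -216",  # hypothetical
    "SV1*23459*25*A*B            -217",  # hypothetical
]
for line in flag_range(data, "-213", "-216"):
    print(line)
```

This relies on the records being in order: a -215 line that appeared after the -216 line would never be reached.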

Three problems with that:

1) It looks to me like the field delimiter is *, so we should not count on
those spaces. Since we do a lot of EDI too, I've seen * used as a
delimiter before.

2) I would not have assumed that the records were in numerical order by the
last field. Maybe they are, maybe they only _usually_ are, maybe they are
_supposed_ to be, but I'd want to see anything that matches, especially if
it would have fallen in an unexpected place.

3) Given point 2 above, the set-run-until-unset method won't work, so I have
to evaluate the field numerically by testing for
greater-than/less-than/equals. That means I have to extract "216" out of
"  -216".
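Since points 2 and 3 call for a numeric test rather than a start/stop flag, here is a minimal Python sketch of that approach. It assumes * is the field delimiter and that the value is the run of digits after a trailing dash in the last field. The name select_by_range and the extra sample lines are hypothetical. Unlike a bracket expression such as 21[3-6], the same test handles a range like 213 through 226, and it catches matching lines wherever they fall in the file.

```python
import re

# Assumption: the wanted number is the digit run after a dash at the end
# of the last *-separated field, optionally followed by whitespace.
TRAILING_NUMBER = re.compile(r"-(\d+)\s*$")


def select_by_range(lines, low, high, delim="*"):
    """Keep every line whose trailing number N satisfies low <= N <= high,
    regardless of where the line sits in the file."""
    selected = []
    for line in lines:
        last_field = line.rstrip("\n").split(delim)[-1]
        match = TRAILING_NUMBER.search(last_field)
        if match and low <= int(match.group(1)) <= high:
            selected.append(line)
    return selected


data = [
    "ISA*00000089** *",
    "CLM*inv123456*a*b*c         -213",
    "SV1*23456*25*A*B            -214",
    "SV1*23460*10*A*B            -226",  # hypothetical
    "SV1*23461*10*A*B            -230",  # hypothetical, out of range
    "SV1*23462*10*A*B            -215",  # hypothetical, out of order
]
for line in select_by_range(data, 213, 226):
    print(line)
```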

Given the sample data, I don't _really_ know that the last field won't
contain more than one dash sometime, or no dash at all, or have a space
after it or whatever. If the dash is a reliable and consistent feature, then
I can find its position and snip everything from it to the end of the field
and treat that variable as an integer. I use the field rather than the line,
since maybe a dash can appear in some other field some time.
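The dash-position idea above can be sketched in Python as follows, again assuming * is the field delimiter. The helper trailing_number is hypothetical. It uses the last dash in case there are several, returns None rather than guessing when there is no dash or nothing numeric after it, and looks only at the last field, so a dash elsewhere in the line is ignored.

```python
def trailing_number(line, delim="*"):
    """Return the integer after the last dash in the line's last field,
    or None if there is no dash or the text after it is not a number."""
    field = line.rstrip("\n").split(delim)[-1]
    dash = field.rfind("-")            # last dash, in case there are several
    if dash == -1:
        return None                    # no dash at all
    tail = field[dash + 1:].strip()    # tolerate a trailing space
    if tail.isascii() and tail.isdigit():
        return int(tail)
    return None


print(trailing_number("CLM*inv123456*a*b*c         -213"))  # 213
print(trailing_number("SV1*23-456*25*A*B           -214 "))  # 214
print(trailing_number("ISA*00000089** *"))                   # None
```

A caller would then print the line only when the result is not None and falls between the low and high bounds.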

Brian K. White  --  brian at aljex.com  --  http://www.aljex.com/bkw/
+++++[>+++[>+++++>+++++++<<-]<-]>>+.>.+++++.+++++++.-.[>+<---]>++.
filePro BBx  Linux SCO  Prosper/FACTS AutoCAD  #callahans Satriani