Edit question -- the answer

Kenneth Brody kenbrody at bestweb.net
Mon Dec 13 09:48:44 PST 2004


Apparently, there was much discussion in the filePro room last night
about my edit question.  (Sorry I wasn't there.  I didn't know about
it until this morning, as I was busy with the kids when John left his
voice message.)

Anyway, here goes...

(Please make sure to read this using a monospace font.)

Think of the edits language as a state machine, consuming characters
from the input buffer and producing characters into the output buffer.

Here is the "monexp" edit again:

==========
monexp   monexp1 | monexp2 | monexp3
monexp1  "Jan"<uary> | "Feb"<ruary> | "Mar"<ch> | "Apr"<il> | "May" | "Jun"<e>
monexp2  "Jul"<y> | "Aug"<ust> | "Sep"<tember> | "Oct"<ober> | "Nov"<ember>
monexp3  "Dec"<ember>
==========

Given the input "Feb       ", let's see what happens.

First, the "monexp" edit checks to see if "monexp1" is successful.

The "monexp1" edit checks for a match against the literal "Jan", which
obviously fails.  So, it moves to the next part of the "or" sequence:

    "Feb"<ruary>

Input:  |Feb       |    Output: |          |
         ^                       ^

The input pointer is pointing to a sequence that matches the literal
"Feb", and so it copies it into the output buffer:

Input:  |Feb       |    Output: |Feb       |
            ^                       ^

The input is not pointing to "ruary", so "ruary" gets inserted:

Input:  |Feb       |    Output: |February  |
            ^                            ^

The "Feb"<ruary> part of the edit has been satisfied, so none of the other
"or" parts are processed.

The "monexp1" edit has succeeded, so we return to "monexp".  Again, this
is an "or" series.  The first piece has succeeded, so the rest of the
pieces are not processed.

We have now finished processing the "monexp" edit, and are left with the
following state:

Input:  |Feb       |    Output: |February  |
            ^                            ^

We are currently in a "success" state, and there is nothing left of the
edit.  All remaining input characters are blanks, so the entire edit has
succeeded.

Note the statement "all remaining input characters are blanks" in the
above paragraph.  If the edit has successfully reached completion, and
the only things left in the input buffer are spaces, the edit has
succeeded.  This is why there is no need to strip trailing blanks in
most cases.

Using the same "monexp" edit, let's try "Feb 6     " as the input.
Everything remains the same as above, except that the final state is:

Input:  |Feb 6     |    Output: |February  |
            ^                            ^

Note that when the edit reaches completion, you have a non-blank
character remaining in the input buffer.  This causes the edit to
fail, and the cursor will be pointing to the " " after the "b".  (Try
it in deddef to confirm this, if you wish.)
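
If it helps to see the two cases side by side, here is a rough sketch of
the same walk-through in Python.  It is only a model of my description
above, not filePro code: the name monexp(), the MONTHS table, and the
(ok, output, cursor) return value are all made up for illustration, and
I'm assuming the output buffer is exactly as wide as the input field.

```python
# Hypothetical Python model of the "monexp" edit; not filePro code.
MONTHS = {
    "Jan": "uary", "Feb": "ruary", "Mar": "ch", "Apr": "il",
    "May": "", "Jun": "e", "Jul": "y", "Aug": "ust",
    "Sep": "tember", "Oct": "ober", "Nov": "ember", "Dec": "ember",
}

def monexp(field):
    """Return (ok, output, cursor) for a fixed-width input field."""
    width = len(field)
    for abbrev, rest in MONTHS.items():
        if field.startswith(abbrev):       # literal matches: copy it
            pos = len(abbrev)
            if rest and field.startswith(rest, pos):
                pos += len(rest)           # <rest> already typed: accept it
            out = abbrev + rest            # output gets the full name either way
            break                          # skip the remaining "or" parts
    else:
        return False, "", 0                # no alternative matched
    if len(out) > width:
        return False, out[:width], pos     # assumed: output must fit the field
    # The edit has run to completion; only blanks may remain in the input.
    ok = field[pos:].strip(" ") == ""
    return ok, out.ljust(width), pos

print(monexp("Feb       "))   # (True, 'February  ', 3)
print(monexp("Feb 6     "))   # (False, 'February  ', 3)
```

In both calls the cursor stops at offset 3, the blank after the "b".
The only difference is whether anything non-blank follows it.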


Now, on to the question about why the other edit requires that you strip
trailing spaces...


    { {!" "!}@ | "'"<"'"> | * }


First, let's try it without the stripping of trailing spaces:

    { "'"<"'"> | * }

Given the following:

Input:  |foo'bar   |   Output:  |          |
         ^                       ^

Let's see what happens.  The first character doesn't match the literal
"'", and so the first part of the "or" fails.  The second part -- * --
matches any single character, and so must succeed:

Input:  |foo'bar   |   Output:  |f         |
          ^                       ^

For the same reason, both "o"s fail the first part and pass the second:

Input:  |foo'bar   |   Output:  |foo       |
            ^                       ^

We have now reached a single-quote, and so the "'" is processed.  It is
not followed by a second "'", and so one is inserted:

Input:  |foo'bar   |   Output:  |foo''     |
             ^                        ^

Once more, the "b", "a", and "r" each fail the first part and pass the
second:

Input:  |foo'bar   |   Output:  |foo''bar  |
                ^                        ^

(Note that first the "b" passes the entire or-sequence, followed by the
"a", followed by the "r".  It is not that "bar" passes, but rather each
character, individually, passes.)

At this point, you are probably saying "the edit has run to completion,
and all that's left are spaces, so why doesn't it succeed?"

However, you are wrong when it comes to "the edit has run to completion".
Remember -- this is an "any amount of ..." edit, due to the surrounding
braces.  As such, it continues until the input fails.  Since spaces will
fail the first part, but pass the second, it can never fail until it
hits EOF.  Therefore, the first two of the remaining three spaces pass,
leaving you with this state:

Input:  |foo'bar   |   Output:  |foo''bar  |
                  ^                        ^

The third space still passes the * part of the edit, but there is no
room in the output buffer to place it, causing the edit to fail.


+-------------------------------------------------------------------------+
| The difference here is that, in addition to the "any amount of..." loop |
| (as someone pointed out), the loop contains an "accept anything" entry. |
| That "accept anything" will accept all trailing spaces, and attempt to  |
| put them into the output buffer.                                        |
+-------------------------------------------------------------------------+
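
Here is the same loop as a small Python sketch, in case the overflow is
easier to see in code.  Again, this is a model of the behavior described
above and not how filePro is implemented; escape_quotes() is a made-up
name, and the output buffer is assumed to be the same width as the input.

```python
# Hypothetical Python model of { "'"<"'"> | * }; not filePro code.
def escape_quotes(field):
    """Return (ok, output_so_far) for a fixed-width input field."""
    width = len(field)
    out = ""
    pos = 0
    while pos < width:                  # the braces loop until EOF
        ch = field[pos]
        pos += 1
        if ch == "'":                   # first "or" part: "'" is copied
            if pos < width and field[pos] == "'":
                pos += 1                # second quote already there: accept it
            ch = "''"                   # output gets two quotes either way
        # Anything else falls through to *, which accepts any single
        # character, trailing blanks included.
        if len(out) + len(ch) > width:
            return False, out           # no room left in the output buffer
        out += ch
    return True, out

ok, out = escape_quotes("foo'bar   ")
print(ok, "|" + out + "|")              # fails once the buffer is full

ok, out = escape_quotes("foobar    ")
print(ok, "|" + out + "|")              # nothing inserted, so everything fits
```

The second call shows that the trailing spaces are only a problem when
something was inserted along the way.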


Now, let's try a simple "delete spaces" rather than "delete trailing
spaces" modification.  (i.e., leave off the "@".)

    { {!" "!} | "'"<"'"> | * }

That will work fine in this case.  After the "r" is processed, we're
left in this state:

Input:  |foo'bar   |   Output:  |foo''bar  |
                ^                        ^

the {!" "!} will accept and delete the spaces:

Input:  |foo'bar   |   Output:  |foo''bar  |
                   ^                     ^

We have now reached EOF, so the "any amount of..." ends.  Only at that
point has the edit "run to completion", and since there is no non-blank
input left, the edit will succeed.

However, without the "@" EOF marker, this edit will delete _all_ spaces,
not just trailing spaces.

Input:  |foo bar   |   Output:  |          |
         ^                       ^

The "f", "o", and "o" are accepted just as before:

Input:  |foo bar   |   Output:  |foo       |
            ^                       ^

However, the space matches our "accept and delete" {!" "!} part, and so
it gets deleted:

Input:  |foo bar   |   Output:  |foo       |
             ^                      ^

Again, the "b", "a", and "r" are accepted:

Input:  |foo bar   |   Output:  |foobar    |
                ^                      ^

And again, the remaining spaces are deleted:

Input:  |foo bar   |   Output:  |foobar    |
                   ^                   ^

We have now reached EOF, and the edit has successfully run to completion.

Unfortunately, the space between "foo" and "bar" has been removed, which
we did not want to happen.
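
A quick Python sketch of this variant shows the side effect.  As before,
this is just a model of the steps above, with a made-up function name
and an output buffer assumed to be as wide as the input.

```python
# Hypothetical Python model of { {!" "!} | "'"<"'"> | * }; not filePro code.
def escape_drop_spaces(field):
    """Return (ok, output) for a fixed-width input field."""
    width = len(field)
    out = ""
    pos = 0
    while pos < width:                  # loop until EOF
        ch = field[pos]
        pos += 1
        if ch == " ":
            continue                    # {!" "!}: accepted and deleted, anywhere
        if ch == "'":
            if pos < width and field[pos] == "'":
                pos += 1                # second quote already there: accept it
            ch = "''"                   # output gets two quotes either way
        out += ch                       # * copies every other character
    return len(out) <= width, out.ljust(width)

print(escape_drop_spaces("foo'bar   "))  # succeeds, quote doubled
print(escape_drop_spaces("foo bar   "))  # succeeds, but the inner space is gone
```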


Now, let's check the version of the edit that I posted:

    { {!" "!}@ | "'"<"'"> | * }


Input:  |foo's bar   |   Output:  |            |
         ^                         ^

First, the "f", "o", and "o" are accepted:

Input:  |foo's bar   |   Output:  |foo         |
            ^                         ^

Next, the "'" is accepted, and since it's not followed by a "'", one is
inserted:

Input:  |foo's bar   |   Output:  |foo''       |
             ^                          ^

Next, the "s" is accepted:

Input:  |foo's bar   |   Output:  |foo''s      |
              ^                          ^

Now, we are pointing to a space.  Let's check the first part of our "or"
sequence:

    {!" "!}@

There is a sequence of spaces which can be deleted:

Input:  |foo's bar   |   Output:  |foo''s      |
               ^                         ^

However, we are not at EOF.  Therefore, the {!" "!} has succeeded, but that
part of the edit, as a whole, fails, bringing us back to the previous state:

Input:  |foo's bar   |   Output:  |foo''s      |
              ^                          ^

The "or" sequence continues, failing the "'" and passing the *, thereby
accepting the space:

Input:  |foo's bar   |   Output:  |foo''s      |
               ^                          ^

Once again, the "b", "a", and "r" pass as before:

Input:  |foo's bar   |   Output:  |foo''s bar  |
                  ^                          ^

(Remember, they pass individually, not as a group.  Each one fails the
first and second part of the "or" sequence, and passes the third part.
After each one passes the third part, the next character starts the
entire "or" group again.  This is because the * only accepts one
character.)

Once again, we're back to the {!" "!} part of the edit, and we have a
sequence of spaces to accept and delete:

Input:  |foo's bar   |   Output:  |foo''s bar  |
                     ^                       ^

However, this time the @ matches EOF, causing this part of the edit as
a whole to succeed.

Finally, we're at EOF, so our "any amount of..." ends, and the edit
has run to completion.  Since there are no non-blank characters in the
input buffer left, the edit has succeeded.
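
To make the "succeeds, but that part as a whole fails" step concrete,
here is a Python sketch of the full edit.  It is a model under my own
assumptions (made-up function name, output buffer as wide as the input),
not filePro's actual implementation.  The "probe" variable plays the role
of the tentative input pointer that gets thrown away when the @ fails.

```python
# Hypothetical Python model of { {!" "!}@ | "'"<"'"> | * }; not filePro code.
def escape_strip_trailing(field):
    """Return (ok, output) for a fixed-width input field."""
    width = len(field)
    out = ""
    pos = 0
    while pos < width:                  # loop until EOF
        # First "or" part: delete a run of spaces, then require EOF (@).
        probe = pos
        while probe < width and field[probe] == " ":
            probe += 1
        if probe > pos and probe == width:
            pos = probe                 # run reached EOF: keep the deletion
            continue
        # Not at EOF: discard probe, so pos is back where it started.
        ch = field[pos]
        pos += 1
        if ch == "'":                   # second "or" part
            if pos < width and field[pos] == "'":
                pos += 1                # second quote already there: accept it
            ch = "''"                   # output gets two quotes either way
        # Third "or" part: * accepts any single character, inner spaces too.
        if len(out) + len(ch) > width:
            return False, out.ljust(width)   # no room in the output buffer
        out += ch
    return True, out.ljust(width)

print(escape_strip_trailing("foo's bar   "))  # inner space kept, quote doubled
```

The inner space survives because the first part backs out when it finds
"bar" instead of EOF, and the * picks the space up instead.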


So, there are two main things to remember here:

    When an edit successfully runs to completion, and there are no non-
    blank characters remaining in the input buffer, the edit has succeeded.
    (ie: the trailing spaces do not cause the edit to fail.)

and

    An "any amount of..." loop that accepts spaces will accept trailing
    spaces as well.  There must be room in the output to fit these spaces.

-- 
+-------------------------+--------------------+-----------------------------+
| Kenneth J. Brody        | www.hvcomputer.com |                             |
| kenbrody/at\spamcop.net | www.fptech.com     | #include <std_disclaimer.h> |
+-------------------------+--------------------+-----------------------------+
Don't e-mail me at: <mailto:ThisIsASpamTrap at gmail.com>


