• RegExp Behaviour in FSE

    From Deuce@VERT/SYNCNIX to All on Tuesday, February 28, 2006 23:42:00
    So, in fseditor.js, I plan on implementing support for regular expressions (you're welcome, Angus). However, I'm unsure of one teensy little thing...

    Since fseditor does word wrapping, there isn't *really* an end of line unless the user presses enter... so, should /^(.*)$/ match a single wrapped line, or should it match from one CR to the next (in general, a complete paragraph)?

    How should the m and s flags interact with this?

    ---
    This sig is not directed at Jazzman.

    ---
    þ Synchronet þ My Brand-New BBS (All the cool SysOps run STOCK!)
  • From Angus McLeod@VERT/ANJO to Deuce on Wednesday, March 01, 2006 10:22:00
    Re: RegExp Behaviour in FSE
    By: Deuce to All on Tue Feb 28 2006 23:42:00

    So, in fseditor.js, I plan on implementing support for regular expressions (you're welcome, Angus). However, I'm unsure of one teensy little thing...

    Appreciated!

    Since fseditor does word wrapping, there isn't *really* an end of line unless the user presses enter... so, should /^(.*)$/ match a single wrapped line, or should it match from one CR to the next (in general, a complete paragraph)?

    How should the m and s flags interact with this?

    Okay, my immediate thoughts: The RE applies to a string. What string
    will it apply to? Presumably, selected text, or the entire document, depending.

    Selected text *or* the entire document, might consist of several
    paragraphs. Therefore, the un-flagged RE should refer to the entire
    unmodified string, selection or all. The RE should (IMHO) treat both hard *and* soft newlines as newlines. Soft newlines as inserted by an
    automated wordwrap are only a convenience for the user. There shouldn't
    be any confusion. If the user, looking at the text *sees* a newline, then
    the RE should respond as expected. Were I to apply /^para/m to this paragraph, for instance, it should match *twice*, even though one instance
    of the word 'paragraph' was explicitly wrapped by the ENTER key, and the
    other was auto wrapped.

    The 'm' and 's' flags should modify the behaviour of the RE as normal, but (IMHO) it should treat *both* hard and soft newlines as newlines.

    Thoughts?
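
A minimal sketch of the behaviour Angus describes, assuming the editor hands the regex a string in which hard and soft line breaks both appear as "\n". The sample text and variable names are made up for illustration and are not taken from fseditor.js.

```javascript
// Assumption: every visible line break, hard or soft, is exported as "\n".
const visibleText = [
  "Were I to apply the pattern to this", // soft wrap follows
  "paragraph, it should match twice.",   // hard CR follows
  "paragraph two starts here.",
].join("\n");

// With the m flag, ^ matches at the start of every visible line.
const withM = visibleText.match(/^para/gm) || [];
// Without it, ^ matches only at the very start of the string.
const withoutM = visibleText.match(/^para/g) || [];

console.log(withM.length, withoutM.length); // 2 0
```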



    ---
    Playing: "Try & love again" by "Eagles"
    from the "Hotel California" album
    þ Synchronet þ Making sure Jason works OK at The ANJO BBS
  • From Deuce@VERT/SYNCNIX to Angus McLeod on Wednesday, March 01, 2006 20:21:00
    Re: RegExp Behaviour in FSE
    By: Angus McLeod to Deuce on Wed Mar 01 2006 10:22 am

    Selected text *or* the entire document, might consist of several
    paragraphs. Therefore, the un-flagged RE should refer to the entire unmodified string, selection or all. The RE should (IMHO) treat both hard *and* soft newlines as newlines. Soft newlines as inserted by an
    automated wordwrap are only a convenience for the user. There shouldn't
    be any confusion. If the user, looking at the text *sees* a newline, then the RE should respond as expected. Were I to apply /^para/m to this paragraph, for instance, it should match *twice*, even though one instance of the word 'paragraph' was explicitly wrapped by the ENTER key, and the other was auto wrapped.

    The 'm' and 's' flags should modify the behaviour of the RE as normal, but (IMHO) it should treat *both* hard and soft newlines as newlines.

    Thoughts?

    That's pretty much how I'm leaning.. was just worried about the case where Dumb User posted something about Dumb User and I wanted to quickly do a nice simple s/Dumb User/Idiot/g and it only matches one. I mean... it's going to need to rewrap after a replacement anyways.
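
The worry can be reproduced in a few lines. This is only a sketch, assuming the soft wrap between "Dumb" and "User" reaches the regex as "\n"; the sample string is invented.

```javascript
// Assumption: the soft wrap between "Dumb" and "User" is exported as "\n".
const softAsNewline = "Dumb User wrote about Dumb\nUser.";

// The literal space in the pattern does not match the "\n" at the wrap point,
// so only the first occurrence is replaced.
const partlyReplaced = softAsNewline.replace(/Dumb User/g, "Idiot");

console.log(JSON.stringify(partlyReplaced)); // "Idiot wrote about Dumb\nUser."
```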

    The other option would be to use only hard CRs in the string, and allow the m flag to use wrapped lines. iirc, in general, ^ and $ match the beginning and end of a *STRING* not a line. So the m flag would make ^ and $ match lines. The s flag wouldn't need any special handling at all since it merely expands what . matches to include newlines. So your example would work as you expected with the m flag, and not match anything otherwise. ie:

    $_="Two line\nParagraphs\n";
    print "No flags\n" if(/^Para/);
    print "m flag\n" if(/^Para/m);

    Further, if it included soft CRs, that would be an extra bit of whitespace where there "really" isn't.
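
A rough sketch of the hard-CR-only option, under the assumption that each editor line is an object with its text (trailing space kept on soft-wrapped lines) and a flag saying whether it ends in a hard CR. The `text`/`hardCr` shape and the `joinSoftWraps` helper are hypothetical, not the actual fseditor.js structures.

```javascript
// Hypothetical line model: `text` keeps its trailing space when the line was
// soft-wrapped, and `hardCr` is true when the user pressed ENTER.
const hardCrLines = [
  { text: "Dumb User posted something about Dumb ", hardCr: false },
  { text: "User again.", hardCr: true },
];

// Concatenate soft-wrapped lines as-is; emit "\n" only for hard CRs.
function joinSoftWraps(lines) {
  return lines.map((l) => l.text + (l.hardCr ? "\n" : "")).join("");
}

const hardCrOnly = joinSoftWraps(hardCrLines);
const fullyReplaced = hardCrOnly.replace(/Dumb User/g, "Idiot");

console.log(JSON.stringify(fullyReplaced)); // "Idiot posted something about Idiot again.\n"
```

With this representation the wrap point is invisible to the pattern, so both occurrences are replaced; the text would still need rewrapping afterwards.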

    ---
    This sig is not directed at Jazzman.

    ---
    þ Synchronet þ My Brand-New BBS (All the cool SysOps run STOCK!)
  • From Angus McLeod@VERT/ANJO to Deuce on Thursday, March 02, 2006 00:54:00
    Re: RegExp Behaviour in FSE
    By: Deuce to Angus McLeod on Wed Mar 01 2006 20:21:00

    That's pretty much how I'm leaning.. was just worried about the case where Dumb User posted something about Dumb User and I wanted to quickly do a nice simple s/Dumb User/Idiot/g and it only matches one. I mean... it's going to need to rewrap after a replacement anyways.

    Hmmm. /s makes "." match "\n" but what about "\s" ? maybe you need to
    do something like s/Dumb\sUser/Idiot/sg and let the \s match the
    whitespace between "Dumb" and "User".

    Look, people who want to use complex RE's will simply have to *learn*
    complex RE's.
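
For what it's worth, the difference between "." and "\s" can be checked directly. This sketch assumes a JavaScript engine with ES2018 dotAll (s flag) support, which may not hold for the engine fseditor.js actually runs on.

```javascript
const wrapped = "Dumb\nUser";

// "." does not match "\n" unless the s flag is set.
const dotNoFlag = /Dumb.User/.test(wrapped);   // false
const dotWithS = /Dumb.User/s.test(wrapped);   // true

// \s matches "\n" with or without the s flag.
const classNoFlag = /Dumb\sUser/.test(wrapped); // true

console.log(dotNoFlag, dotWithS, classNoFlag); // false true true
```

So the s flag is not needed for \s to span a line break; it only changes what "." matches.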

    iirc, in general, ^ and $ match the beginning and end of a *STRING* not
    a line.

    Correct. It doesn't apply to "lines" at all, unless of course, you store
    that line in a string.

    So the m flag would make ^ and $ match lines.

    Exactly. Assuming that you treat a soft CR as a newline for the purposes
    of ^ and $ same as a hard CR.
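
A JavaScript rendering of the same idea, as a sketch only. The sample string mirrors Deuce's Perl snippet; note that JavaScript's $ is stricter than Perl's when the m flag is absent.

```javascript
const sample = "Two line\nParagraphs\n";

// Without m, ^ matches only at the start of the whole string.
const caretNoFlag = /^Para/.test(sample);        // false
// With m, ^ also matches right after each "\n".
const caretWithM = /^Para/m.test(sample);        // true
// With m, $ also matches right before each "\n".
const dollarWithM = /line$/m.test(sample);       // true
// Without m, JavaScript's $ matches only at the very end of the input,
// so the trailing "\n" prevents a match here (Perl would match).
const dollarNoFlag = /Paragraphs$/.test(sample); // false

console.log(caretNoFlag, caretWithM, dollarWithM, dollarNoFlag); // false true true false
```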

    The s flag wouldn't need any special handling at all since it merely expands what . matches to include newlines. So your example would work as you expected with the m flag, and not match anything otherwise. ie:

    $_="Two line\nParagraphs\n";
    print "No flags\n" if(/^Para/);
    print "m flag\n" if(/^Para/m);

    Which seems to me to be intuitive.

    Further, if it included soft CRs, that would be an extra bit of whitespace where there "really" isn't.

    Well, the separation between the last word on one line and the first word
    on the next is in fact "whitespace". Whether it is a space character or a newline, it is still whitespace and matches \s. So if you treated a soft
    CR like a hard CR, you're doing the right thing, because (presumably) for
    the auto-wordwrap to have kicked in, the user must have typed a space or
    tab or something at that point in the input text.

    To completely throw a spanner in the works: Quoting. How will you treat something like

    > yada yada yada yada yada yada yada yada yada yada yada yada Dumb
    > User yada yada yada yada yada yada yada

    when searching for /Dumb User/ or even /Dumb\sUser/m ? ;-)



    ---
    Playing: "Smokin Banana Peels" by "The Dead Milkmen"
    from the "Death Rides a Pale Cow" album
    þ Synchronet þ Making sure Jason works OK at The ANJO BBS
  • From Deuce@VERT/SYNCNIX to Angus McLeod on Thursday, March 02, 2006 12:45:00
    Re: RegExp Behaviour in FSE
    By: Angus McLeod to Deuce on Thu Mar 02 2006 12:54 am

    The s flag wouldn't need any special handling at all since it merely expands what . matches to include newlines. So your example would work as you expected with the m flag, and not match anything otherwise. ie:

    $_="Two line\nParagraphs\n";
    print "No flags\n" if(/^Para/);
    print "m flag\n" if(/^Para/m);

    Which seems to me to be intuitive.

    Further, if it included soft CRs, that would be an extra bit of whitespace where there "really" isn't.

    Well, the separation between the last word on one line and the first word
    on the next is in fact "whitespace". Whether it is a space character or a newline, it is still whitespace and matches \s. So if you treated a soft
    CR like a hard CR, you're doing the right thing, because (presumably) for the auto-wordwrap to have kicked in, the user must have typed a space or
    tab or something at that point in the input text.

    To completely throw a spanner in the works: Quoting. How will you treat something like

    > yada yada yada yada yada yada yada yada yada yada yada yada Dumb
    > User yada yada yada yada yada yada yada

    when searching for /Dumb User/ or even /Dumb\sUser/m ? ;-)

    Well, I can't say it seems completely intuitive, since your example would actually not have done what you said it would have. :-)

    As for adding a soft CR, the spaces are *still* at the end of the line... so if soft CRs were expanded to hard, you'd need /Dumb\s+User/ rather than /Dumb\sUser/ or /Dumb User/
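
A quick sketch of that case, assuming the wrapped line keeps its trailing space and the soft CR is then expanded to "\n" (the sample string is invented):

```javascript
// Assumption: trailing space kept, soft CR expanded to "\n", so two
// whitespace characters sit between "Dumb" and "User".
const expandedText = "about Dumb \nUser again";

const literalSpace = /Dumb User/.test(expandedText); // false: "\n" is in the way
const singleClass = /Dumb\sUser/.test(expandedText); // false: one \s, two whitespace chars
const oneOrMore = /Dumb\s+User/.test(expandedText);  // true: \s+ spans both

console.log(literalSpace, singleClass, oneOrMore); // false false true
```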

    As for quoting, I plan on using Deep Magic for it when possible. Since this editor is running inside of Synchronet, thanks to reply linking, there's a VERY good chance that the editor can actually read from the original message itself and requote to fit... ie: it could restore the missing data from this:

    The fix is simply to use the correct flags

    To this:

    The fix is simply to use the correct flags in the regex... m or s as a

    And possibly even rewrapping it to fit in the smaller width.

    The fix is simply to use the correct flags in the regex... m or s as appropriate.

    The quoting indicators wouldn't count as part of the string.
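
One way this might look, purely as a sketch: strip the quote indicators before building the string the regex sees. The `stripQuotePrefix` helper is hypothetical and only handles plain "> " prefixes; real quote prefixes (initials, nesting styles) may need more care.

```javascript
// Hypothetical helper: removes leading "> " style quote indicators.
function stripQuotePrefix(line) {
  return line.replace(/^\s*(?:>\s?)+/, "");
}

const quotedLines = [
  "> yada yada yada Dumb",
  "> User yada yada",
];

// Searching the raw quoted text fails because of "\n> " at the wrap point.
const rawMatch = /Dumb User/.test(quotedLines.join("\n"));

// Searching the unquoted, rejoined text succeeds.
const unquotedText = quotedLines.map(stripQuotePrefix).join(" ");
const unquotedMatch = /Dumb User/.test(unquotedText);

console.log(rawMatch, unquotedMatch); // false true
```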

    ---
    This sig is not directed at Jazzman.

    ---
    þ Synchronet þ My Brand-New BBS (All the cool SysOps run STOCK!)
  • From Angus McLeod@VERT/ANJO to Deuce on Friday, March 03, 2006 01:51:00
    Re: RegExp Behaviour in FSE
    By: Deuce to Angus McLeod on Thu Mar 02 2006 12:45:00

    To completely throw a spanner in the works: Quoting. How will you treat something like

    > yada yada yada yada yada yada yada yada yada yada yada yada Dumb
    > User yada yada yada yada yada yada yada

    when searching for /Dumb User/ or even /Dumb\sUser/m ? ;-)

    Well, I can't say it seems completely intuitive, since your example would actually not have done what you said it would have. :-)

    Huh? I *know* it won't match because of the > and the extra whitespace characters. But should the quotation symbol be treated as whitespace
    (thus matching with \s) when applying RE's to quoted text? ;-) Just
    messing with ya....

    As for adding a soft CR, the spaces are *still* at the end of the line... so if soft CRs were expanded to hard, you'd need /Dumb\s+User/ rather than /Dumb\sUser/ or /Dumb User/

    Oh, OK, I didn't realise that you left the space at the end of the line.
    But what I'm saying is that I should be able to match /Dumb$/m when Dumb
    User is auto-wrapped at line end (like that). If I'm looking at the text
    and "Dumb" is at the end of the line, I *ought* to be able to match
    against it with /Dumb$/m withOUT having to guess whether there was an invisible space after the word, or whether that line break occurred
    implicitly due to line-wrap or explicitly due to the press of the ENTER
    key.

    As for quoting, I plan on using Deep Magic for it when possible.

    And possibly even rewrapping it to fit in the smaller width.

    The fix is simply to use the correct flags in the regex... m or s as appropriate.

    Hmmm.

    ---
    Playing: "2000 Miles" by "Pretenders"
    from the "Learning to crawl" album
    þ Synchronet þ Making sure Jason works OK at The ANJO BBS
  • From Deuce@VERT/SYNCNIX to Angus McLeod on Friday, March 03, 2006 12:04:00
    Re: RegExp Behaviour in FSE
    By: Angus McLeod to Deuce on Fri Mar 03 2006 01:51 am

    Huh? I *know* it won't match because of the > and the extra whitespace characters. But should the quotation symbol be treated as whitespace (thus matching with \s) when applying RE's to quoted text? ;-) Just messing with ya....

    I meant the original /^Para/ example.

    Oh, OK, I didn't realise that you left the space at the end of the line. But what I'm saying is that I should be able to match /Dumb$/m when Dumb User is auto-wrapped at line end (like that). If I'm looking at the text and "Dumb" is at the end of the line, I *ought* to be able to match against it with /Dumb$/m withOUT having to guess whether there was an invisible space after the word, or whether that line break occurred implicitly due to line-wrap or explicitly due to the press of the ENTER key.

    Yeah, the spaces are at the end of the line for my convenience. In "theory" they actually exist in limbo between the two lines for the purposes of the m flag.
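
A sketch of how the "limbo" idea could be realised when building the string for line-anchored searches. The `text`/`hardCr` line shape and the `toLineAnchoredString` helper are assumptions made for illustration, not the real fseditor.js internals.

```javascript
// Hypothetical line model, assumed for illustration only.
const editorLines = [
  { text: "someone mentioned Dumb ", hardCr: false }, // soft wrap, trailing space kept
  { text: "User yesterday.", hardCr: true },
];

// Drop the whitespace that precedes a soft wrap so that "$" (with the m flag)
// lands right after the last visible character of the line.
function toLineAnchoredString(lines) {
  return lines
    .map((l) => (l.hardCr ? l.text : l.text.replace(/[ \t]+$/, "")))
    .join("\n");
}

const rawText = editorLines.map((l) => l.text).join("\n");
const anchoredText = toLineAnchoredString(editorLines);

const rawEndMatch = /Dumb$/m.test(rawText);           // false: invisible space before "\n"
const anchoredEndMatch = /Dumb$/m.test(anchoredText); // true

console.log(rawEndMatch, anchoredEndMatch); // false true
```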

    ---
    This sig is not directed at Jazzman.

    ---
    þ Synchronet þ My Brand-New BBS (All the cool SysOps run STOCK!)
  • From Deuce@VERT/SYNCNIX to Angus McLeod on Friday, March 03, 2006 12:07:00
    Re: RegExp Behaviour in FSE
    By: Angus McLeod to Deuce on Fri Mar 03 2006 01:51 am


    To completely throw a spanner in the works: Quoting. How will you treat something like

    > yada yada yada yada yada yada yada yada yada yada yada yada Dumb
    > User yada yada yada yada yada yada yada

    when searching for /Dumb User/ or even /Dumb\sUser/m ? ;-)

    Well, I can't say it seems completely intuitive, since your example would actually not have done what you said it would have. :-)

    Huh? I *know* it won't match because of th > characters. But should the qu Oh, OK, I didn't realise that you left the sp > But what I'm saying is that
    Hrm... did that fix the high intensity issue?

    ---
    This sig is not directed at Jazzman.

    ---
    þ Synchronet þ My Brand-New BBS (All the cool SysOps run STOCK!)
  • From Angus McLeod@VERT/ANJO to Deuce on Saturday, March 04, 2006 01:25:00
    Re: RegExp Behaviour in FSE
    By: Deuce to Angus McLeod on Fri Mar 03 2006 12:07:00

    Huh? I *know* it won't match because of th > characters. But should the Oh, OK, I didn't realise that you left the sp > But what I'm saying is th
    Hrm... did that fix the high intensity issue?

    I don't think so...

    ---
    Playing: "Traveller in time" by "Uriah Heep"
    from the "Demons & wizards" album
    þ Synchronet þ Making sure Jason works OK at The ANJO BBS