| 1 | <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 3.2 Final//EN"> |
|---|
| 2 | |
|---|
| 3 | <html> |
|---|
| 4 | |
|---|
| 5 | <head> |
|---|
| 6 | <title>J User's Guide - Regular Expressions</title> |
|---|
| 7 | <LINK REL="stylesheet" HREF="j.css" TYPE="text/css"> |
|---|
| 8 | </head> |
|---|
| 9 | |
|---|
| 10 | <body> |
|---|
| 11 | |
|---|
| 12 | <a href="contents.html">Top</a> |
|---|
| 13 | |
|---|
| 14 | <hr> |
|---|
| 15 | |
|---|
| 16 | <h1>Regular Expressions</h1> |
|---|
| 17 | |
|---|
| 18 | <hr> |
|---|
| 19 | |
|---|
| 20 | <h2>Background</h2> |
|---|
| 21 | |
|---|
| 22 | A regular expression is a character string where some characters are given |
|---|
| 23 | special meaning, so that the pattern as a whole denotes a possibly infinite |
|---|
| 24 | class of alternative strings to match. |
|---|
| 25 | <p> |
|---|
| 26 | J uses the <a href="http://www.cacas.org/~wes/java">gnu.regexp</a> package. |
|---|
| 27 | |
|---|
| 28 | |
|---|
| 29 | <h2>Supported Syntax</h2> |
|---|
| 30 | |
|---|
| 31 | Within a regular expression, the following characters have special meaning: |
|---|
| 32 | |
|---|
| 33 | <ul> |
|---|
| 34 | <li> |
|---|
| 35 | Positional Operators |
|---|
| 36 | <blockquote> |
|---|
| 37 | <code>^</code> matches the beginning of a line<br> |
|---|
| 38 | <code>$</code> matches the end of a line<br> |
|---|
| 39 | </blockquote> |
|---|
| 40 | |
|---|
| 41 | <li> |
|---|
| 42 | One-Character Operators |
|---|
| 43 | <blockquote> |
|---|
| 44 | <code>.</code> matches any single character<br> |
|---|
| 45 | <code>\d</code> matches any decimal digit<br> |
|---|
| 46 | <code>\D</code> matches any non-digit<br> |
|---|
| 47 | <code>\n</code> matches a newline character<br> |
|---|
| 48 | <code>\r</code> matches a return character<br> |
|---|
| 49 | <code>\s</code> matches any whitespace character<br> |
|---|
| 50 | <code>\S</code> matches any non-whitespace character<br> |
|---|
| 51 | <code>\t</code> matches a tab character<br> |
|---|
| 52 | <code>\w</code> matches any word (alphanumeric) character<br> |
|---|
| 53 | <code>\W</code> matches any non-word (non-alphanumeric) character<br> |
|---|
| 54 | <p> |
|---|
| 55 | Otherwise, <code>\c</code> matches the character <i>c</i>. |
|---|
| 56 | </blockquote> |
|---|
| 57 | |
|---|
| 58 | <li> |
|---|
| 59 | Character Classes |
|---|
| 60 | <blockquote> |
|---|
| 61 | <code>[abc]</code> matches any character in the set <i>a</i>, <i>b</i> or <i>c</i><br> |
|---|
| 62 | <code>[^abc]</code> matches any character not in the set <i>a</i>, <i>b</i> or <i>c</i><br> |
|---|
| 63 | <code>[a-z]</code> matches any character in the range <i>a</i> to <i>z</i> (inclusive)<br> |
|---|
| 64 | <p> |
|---|
| 65 | A leading or trailing dash inside the brackets is interpreted literally.<br> |
|---|
| 66 | </blockquote> |
|---|
| 67 | |
|---|
| 68 | <li> |
|---|
| 69 | Subexpressions and Backreferences |
|---|
| 70 | <blockquote> |
|---|
| 71 | <code>(abc)</code> matches whatever the expression <i>abc</i> would match, and saves it as a subexpression<br> |
|---|
| 72 | <code>\<i>n</i></code> where 1 &lt;= <i>n</i> &lt;= 9, matches the same text that the <i>n</i>th subexpression matched<br> |
|---|
| 73 | <p> |
|---|
| 74 | Parentheses can also be used for grouping. |
|---|
| 75 | <p> |
|---|
| 76 | Parentheses used for grouping or to record matched subexpressions should not be escaped. |
|---|
| 77 | <p> |
|---|
| 78 | Backreferences may also be used in replacement strings; see <a href="commands.html#replace">replace</a>. |
|---|
| 79 | </blockquote> |
|---|
| 80 | |
|---|
| 81 | <li> |
|---|
| 82 | Branching (Alternation) Operator |
|---|
| 83 | <blockquote> |
|---|
| 84 | <code>a|b</code> matches whatever the expression <i>a</i> would match, or whatever the expression <i>b</i> would match.<br> |
|---|
| 85 | </blockquote> |
|---|
| 86 | |
|---|
| 87 | <li> |
|---|
| 88 | Repeating Operators |
|---|
| 89 | <blockquote> |
|---|
| 90 | <code>?</code> matches zero or one occurrence of the preceding expression (that is, the expression or the null string)<br> |
|---|
| 91 | <code>*</code> matches zero or more occurrences of the preceding expression<br> |
|---|
| 92 | <code>+</code> matches one or more occurrences of the preceding expression<br> |
|---|
| 93 | <code>{m}</code> matches exactly <i>m</i> occurrences of the preceding expression<br> |
|---|
| 94 | <code>{m,n}</code> matches between <i>m</i> and <i>n</i> occurrences of the preceding expression (inclusive)<br> |
|---|
| 95 | <code>{m,}</code> matches <i>m</i> or more occurrences of the preceding expression<br> |
|---|
| 96 | <p> |
|---|
| 97 | The repeating operators operate on the preceding atomic expression.<br> |
|---|
| 98 | </blockquote> |
|---|
| 99 | |
|---|
| 100 | <li> |
|---|
| 101 | Stingy (Minimal) Matching |
|---|
| 102 | <blockquote> |
|---|
| 103 | If a repeating operator is immediately followed by a <code>?</code>, the repeating operator |
|---|
| 104 | will stop at the smallest number of repetitions that can complete the rest of |
|---|
| 105 | the match. |
|---|
| 106 | </blockquote> |
|---|
| 107 | |
|---|
| 108 | </ul> |
|---|
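<p>
The positional operators, one-character operators, character classes and repeating operators listed above can be tried out with a short sketch. It uses Python's <code>re</code> module rather than gnu.regexp, on the assumption that the two agree on this basic syntax; details such as exactly which characters <code>\w</code> accepts may differ. In Python, <code>^</code> and <code>$</code> match at line boundaries only when <code>re.MULTILINE</code> is set. All sample strings and variable names are illustrative.

```python
import re

# Positional operators: re.MULTILINE makes ^ and $ match at line boundaries.
text = "alpha 1\nbeta 22\ngamma 333"
line_starts = re.findall(r"^\w+", text, flags=re.MULTILINE)
line_ends = re.findall(r"\d+$", text, flags=re.MULTILINE)

# One-character operators: \d, \s, and an escaped (literal) dot.
version = re.search(r"\d\.\d", "release 2.5 final").group(0)
fields = re.split(r"\s", "a b\tc")

# Character classes: a set, a negated set, and a leading literal dash.
vowels = re.findall(r"[aeiou]", "regular")
non_vowels = re.findall(r"[^aeiou]", "regular")
dashes_and_digits = re.findall(r"[-0-9]", "a-1b")

# Repeating operators: ? for an optional character, {m,n} for a bounded count.
spellings = re.findall(r"colou?r", "color colour")
candidates = ["7", "42", "123", "4567"]
two_or_three_digits = [s for s in candidates if re.search(r"^\d{2,3}$", s)]

print(line_starts, line_ends, version, fields)
print(vowels, non_vowels, dashes_and_digits)
print(spellings, two_or_three_digits)
```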
| 109 | |
|---|
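<p>
Subexpressions, backreferences, alternation and stingy matching are easiest to see side by side. The following is a minimal sketch in Python's <code>re</code> module, not gnu.regexp; the <code>\2=\1</code> replacement syntax shown is Python's, and the syntax J accepts in replacement strings is described under <a href="commands.html#replace">replace</a>. The sample strings and variable names are illustrative.

```python
import re

# Subexpression and backreference: (\w+) saves a word, and \1 must then
# match the same text again, so this finds a doubled word.
doubled = re.search(r"(\w+) \1", "it is is fixed")
repeated_word = doubled.group(1)

# Backreferences in a replacement string: swap the two saved subexpressions.
swapped = re.sub(r"(\w+)=(\w+)", r"\2=\1", "key=value")

# Alternation: unescaped parentheses group the alternatives, so the
# trailing "s" applies to either branch.
pets = re.findall(r"(cat|dog)s", "cats dogs birds")

# Greedy versus stingy repetition on the same input.
html = "<b>one</b><b>two</b>"
greedy = re.search(r"<b>.*</b>", html).group(0)   # runs to the last </b>
stingy = re.search(r"<b>.*?</b>", html).group(0)  # stops at the first </b>

print(repeated_word, swapped, pets)
print(greedy)
print(stingy)
```

The greedy pattern takes as many characters as it can while still allowing the rest of the pattern to match, whereas the stingy form stops at the first point where the rest of the pattern can complete.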
| 110 | </body> |
|---|
| 111 | |
|---|
| 112 | </html> |
|---|