summaryrefslogtreecommitdiffstats
path: root/rubbos/app/httpd-2.0.64/srclib/pcre/doc/pcretest.txt
diff options
context:
space:
mode:
Diffstat (limited to 'rubbos/app/httpd-2.0.64/srclib/pcre/doc/pcretest.txt')
-rw-r--r--rubbos/app/httpd-2.0.64/srclib/pcre/doc/pcretest.txt319
1 files changed, 319 insertions, 0 deletions
diff --git a/rubbos/app/httpd-2.0.64/srclib/pcre/doc/pcretest.txt b/rubbos/app/httpd-2.0.64/srclib/pcre/doc/pcretest.txt
new file mode 100644
index 00000000..0e13b6c6
--- /dev/null
+++ b/rubbos/app/httpd-2.0.64/srclib/pcre/doc/pcretest.txt
@@ -0,0 +1,319 @@
+NAME
+ pcretest - a program for testing Perl-compatible regular
+ expressions.
+
+
+
+SYNOPSIS
+ pcretest [-d] [-i] [-m] [-o osize] [-p] [-t] [source] [des-
+ tination]
+
+ pcretest was written as a test program for the PCRE regular
+ expression library itself, but it can also be used for
+ experimenting with regular expressions. This man page
+ describes the features of the test program; for details of
+ the regular expressions themselves, see the pcre man page.
+
+
+
+OPTIONS
+ -d Behave as if each regex had the /D modifier (see
+ below); the internal form is output after compila-
+ tion.
+
+ -i Behave as if each regex had the /I modifier;
+ information about the compiled pattern is given
+ after compilation.
+
+ -m Output the size of each compiled pattern after it
+ has been compiled. This is equivalent to adding /M
+ to each regular expression. For compatibility with
+ earlier versions of pcretest, -s is a synonym for
+ -m.
+
+ -o osize Set the number of elements in the output vector
+ that is used when calling PCRE to be osize. The
+ default value is 45, which is enough for 14 cap-
+ turing subexpressions. The vector size can be
+ changed for individual matching calls by including
+ \O in the data line (see below).
+
+ -p Behave as if each regex has /P modifier; the POSIX
+ wrapper API is used to call PCRE. None of the
+ other options has any effect when -p is set.
+
+ -t Run each compile, study, and match 20000 times
+ with a timer, and output resulting time per com-
+ pile or match (in milliseconds). Do not set -t
+ with -m, because you will then get the size output
+ 20000 times and the timing will be distorted.
+
+
+
+DESCRIPTION
+ If pcretest is given two filename arguments, it reads from
+ the first and writes to the second. If it is given only one
+
+
+
+
+SunOS 5.8 Last change: 1
+
+
+
+ filename argument, it reads from that file and writes to
+ stdout. Otherwise, it reads from stdin and writes to stdout,
+ and prompts for each line of input, using "re>" to prompt
+ for regular expressions, and "data>" to prompt for data
+ lines.
+
+ The program handles any number of sets of input on a single
+ input file. Each set starts with a regular expression, and
+ continues with any number of data lines to be matched
+ against the pattern. An empty line signals the end of the
+ data lines, at which point a new regular expression is read.
+ The regular expressions are given enclosed in any non-
+ alphameric delimiters other than backslash, for example
+
+ /(a|bc)x+yz/
+
+ White space before the initial delimiter is ignored. A regu-
+ lar expression may be continued over several input lines, in
+ which case the newline characters are included within it. It
+ is possible to include the delimiter within the pattern by
+ escaping it, for example
+
+ /abc\/def/
+
+ If you do so, the escape and the delimiter form part of the
+ pattern, but since delimiters are always non-alphameric,
+ this does not affect its interpretation. If the terminating
+ delimiter is immediately followed by a backslash, for exam-
+ ple,
+
+ /abc/\
+
+ then a backslash is added to the end of the pattern. This is
+ done to provide a way of testing the error condition that
+ arises if a pattern finishes with a backslash, because
+
+ /abc\/
+
+ is interpreted as the first line of a pattern that starts
+ with "abc/", causing pcretest to read the next line as a
+ continuation of the regular expression.
+
+
+
+PATTERN MODIFIERS
+ The pattern may be followed by i, m, s, or x to set the
+ PCRE_CASELESS, PCRE_MULTILINE, PCRE_DOTALL, or PCRE_EXTENDED
+ options, respectively. For example:
+
+ /caseless/i
+
+ These modifier letters have the same effect as they do in
+ Perl. There are others which set PCRE options that do not
+ correspond to anything in Perl: /A, /E, and /X set
+ PCRE_ANCHORED, PCRE_DOLLAR_ENDONLY, and PCRE_EXTRA respec-
+ tively.
+
+ Searching for all possible matches within each subject
+ string can be requested by the /g or /G modifier. After
+ finding a match, PCRE is called again to search the
+ remainder of the subject string. The difference between /g
+ and /G is that the former uses the startoffset argument to
+ pcre_exec() to start searching at a new point within the
+ entire string (which is in effect what Perl does), whereas
+ the latter passes over a shortened substring. This makes a
+ difference to the matching process if the pattern begins
+ with a lookbehind assertion (including \b or \B).
+
+ If any call to pcre_exec() in a /g or /G sequence matches an
+ empty string, the next call is done with the PCRE_NOTEMPTY
+ and PCRE_ANCHORED flags set in order to search for another,
+ non-empty, match at the same point. If this second match
+ fails, the start offset is advanced by one, and the normal
+ match is retried. This imitates the way Perl handles such
+ cases when using the /g modifier or the split() function.
+
+ There are a number of other modifiers for controlling the
+ way pcretest operates.
+
+ The /+ modifier requests that as well as outputting the sub-
+ string that matched the entire pattern, pcretest should in
+ addition output the remainder of the subject string. This is
+ useful for tests where the subject contains multiple copies
+ of the same substring.
+
+ The /L modifier must be followed directly by the name of a
+ locale, for example,
+
+ /pattern/Lfr
+
+ For this reason, it must be the last modifier letter. The
+ given locale is set, pcre_maketables() is called to build a
+ set of character tables for the locale, and this is then
+ passed to pcre_compile() when compiling the regular expres-
+ sion. Without an /L modifier, NULL is passed as the tables
+ pointer; that is, /L applies only to the expression on which
+ it appears.
+
+ The /I modifier requests that pcretest output information
+ about the compiled expression (whether it is anchored, has a
+ fixed first character, and so on). It does this by calling
+ pcre_fullinfo() after compiling an expression, and output-
+ ting the information it gets back. If the pattern is stu-
+ died, the results of that are also output.
+ The /D modifier is a PCRE debugging feature, which also
+ assumes /I. It causes the internal form of compiled regular
+ expressions to be output after compilation.
+
+ The /S modifier causes pcre_study() to be called after the
+ expression has been compiled, and the results used when the
+ expression is matched.
+
+ The /M modifier causes the size of memory block used to hold
+ the compiled pattern to be output.
+
+ The /P modifier causes pcretest to call PCRE via the POSIX
+ wrapper API rather than its native API. When this is done,
+ all other modifiers except /i, /m, and /+ are ignored.
+ REG_ICASE is set if /i is present, and REG_NEWLINE is set if
+ /m is present. The wrapper functions force
+ PCRE_DOLLAR_ENDONLY always, and PCRE_DOTALL unless
+ REG_NEWLINE is set.
+
+ The /8 modifier causes pcretest to call PCRE with the
+ PCRE_UTF8 option set. This turns on the (currently incom-
+ plete) support for UTF-8 character handling in PCRE, pro-
+ vided that it was compiled with this support enabled. This
+ modifier also causes any non-printing characters in output
+ strings to be printed using the \x{hh...} notation if they
+ are valid UTF-8 sequences.
+
+
+
+DATA LINES
+ Before each data line is passed to pcre_exec(), leading and
+ trailing whitespace is removed, and it is then scanned for \
+ escapes. The following are recognized:
+
+ \a alarm (= BEL)
+ \b backspace
+ \e escape
+ \f formfeed
+ \n newline
+ \r carriage return
+ \t tab
+ \v vertical tab
+ \nnn octal character (up to 3 octal digits)
+ \xhh hexadecimal character (up to 2 hex digits)
+ \x{hh...} hexadecimal UTF-8 character
+
+ \A pass the PCRE_ANCHORED option to pcre_exec()
+ \B pass the PCRE_NOTBOL option to pcre_exec()
+ \Cdd call pcre_copy_substring() for substring dd
+ after a successful match (any decimal number
+ less than 32)
+ \Gdd call pcre_get_substring() for substring dd
+
+ after a successful match (any decimal number
+ less than 32)
+ \L call pcre_get_substringlist() after a
+ successful match
+ \N pass the PCRE_NOTEMPTY option to pcre_exec()
+ \Odd set the size of the output vector passed to
+ pcre_exec() to dd (any number of decimal
+ digits)
+ \Z pass the PCRE_NOTEOL option to pcre_exec()
+
+ When \O is used, it may be higher or lower than the size set
+ by the -O option (or defaulted to 45); \O applies only to
+ the call of pcre_exec() for the line in which it appears.
+
+ A backslash followed by anything else just escapes the any-
+ thing else. If the very last character is a backslash, it is
+ ignored. This gives a way of passing an empty line as data,
+ since a real empty line terminates the data input.
+
+ If /P was present on the regex, causing the POSIX wrapper
+ API to be used, only B, and Z have any effect, causing
+ REG_NOTBOL and REG_NOTEOL to be passed to regexec() respec-
+ tively.
+
+ The use of \x{hh...} to represent UTF-8 characters is not
+ dependent on the use of the /8 modifier on the pattern. It
+ is recognized always. There may be any number of hexadecimal
+ digits inside the braces. The result is from one to six
+ bytes, encoded according to the UTF-8 rules.
+
+
+
+OUTPUT FROM PCRETEST
+ When a match succeeds, pcretest outputs the list of captured
+ substrings that pcre_exec() returns, starting with number 0
+ for the string that matched the whole pattern. Here is an
+ example of an interactive pcretest run.
+
+ $ pcretest
+ PCRE version 2.06 08-Jun-1999
+
+ re> /^abc(\d+)/
+ data> abc123
+ 0: abc123
+ 1: 123
+ data> xyz
+ No match
+
+ If the strings contain any non-printing characters, they are
+ output as \0x escapes, or as \x{...} escapes if the /8
+ modifier was present on the pattern. If the pattern has the
+ /+ modifier, then the output for substring 0 is followed by
+ the the rest of the subject string, identified by "0+" like
+ this:
+
+ re> /cat/+
+ data> cataract
+ 0: cat
+ 0+ aract
+
+ If the pattern has the /g or /G modifier, the results of
+ successive matching attempts are output in sequence, like
+ this:
+
+ re> /\Bi(\w\w)/g
+ data> Mississippi
+ 0: iss
+ 1: ss
+ 0: iss
+ 1: ss
+ 0: ipp
+ 1: pp
+
+ "No match" is output only if the first match attempt fails.
+
+ If any of the sequences \C, \G, or \L are present in a data
+ line that is successfully matched, the substrings extracted
+ by the convenience functions are output with C, G, or L
+ after the string number instead of a colon. This is in addi-
+ tion to the normal full list. The string length (that is,
+ the return from the extraction function) is given in
+ parentheses after each string for \C and \G.
+
+ Note that while patterns can be continued over several lines
+ (a plain ">" prompt is used for continuations), data lines
+ may not. However newlines can be included in data by means
+ of the \n escape.
+
+
+
+AUTHOR
+ Philip Hazel <ph10@cam.ac.uk>
+ University Computing Service,
+ New Museums Site,
+ Cambridge CB2 3QG, England.
+ Phone: +44 1223 334714
+
+ Last updated: 15 August 2001
+ Copyright (c) 1997-2001 University of Cambridge.