OSSP CVS Repository

ossp - Check-in [682]
Not logged in
[Honeypot]  [Browse]  [Home]  [Login]  [Reports
[Search]  [Ticket]  [Timeline
  [Patchset]  [Tagging/Branching

Check-in Number: 682
Date: 2001-Aug-16 15:17:00 (local)
2001-Aug-16 13:17:00 (UTC)
User:rse
Branch:
Comment: Fix return code documentation of str_parse(): it -1 (error), 0 (no matching) or +1 (matching) and not just TRUE or FALSE. Additionally fixed the str_parse() examples in the documentation.
Tickets:
Inspections:
Files:
ossp-pkg/str/ChangeLog      1.32 -> 1.33     5 inserted, 0 deleted
ossp-pkg/str/str.3      1.33 -> 1.34     16 inserted, 13 deleted
ossp-pkg/str/str.pod      added-> 1.27

ossp-pkg/str/ChangeLog 1.32 -> 1.33

--- ChangeLog    2001/08/16 12:21:21     1.32
+++ ChangeLog    2001/08/16 13:17:00     1.33
@@ -10,6 +10,11 @@
  ChangeLog
 
  Changes between 0.9.4 and 0.9.5 (14-Jul-2000 to 16-Aug-2001):
+   
+   *) Fix return code documentation of str_parse(): it -1 (error), 0 
+      (no matching) or +1 (matching) and not just TRUE or FALSE.
+      Additionally fixed the str_parse() examples in the documentation.
+      [Ralf S. Engelschall]
 
    *) Let str_base64(STR_BASE64_DECODE, ...) correctly honor 
       the specified maximum size of the input string.


ossp-pkg/str/str.3 1.33 -> 1.34

--- str.3        2001/04/27 12:22:21     1.33
+++ str.3        2001/08/16 13:17:00     1.34
@@ -1,5 +1,5 @@
-.\" Automatically generated by Pod::Man version 1.02
-.\" Sun Dec 31 12:23:40 2000
+.\" Automatically generated by Pod::Man version 1.15
+.\" Thu Aug 16 15:15:59 2001
 .\"
 .\" Standard preamble:
 .\" ======================================================================
@@ -46,8 +46,8 @@
 .    if (\n(.H=4u)&(1m=20u) .ds -- \(*W\h'-12u'\(*W\h'-8u'-\"  diablo 12 pitch
 .    ds L" ""
 .    ds R" ""
-.    ds C` `
-.    ds C' '
+.    ds C` ""
+.    ds C' ""
 'br\}
 .el\{\
 .    ds -- \|\(em\|
@@ -63,7 +63,7 @@
 .if \nF \{\
 .    de IX
 .    tm Index:\\$1\t\\n%\t"\\$2"
-.    .
+..
 .    nr % 0
 .    rr F
 .\}
@@ -283,8 +283,10 @@
 .Ip "int \fBstr_parse\fR(const char *\fIs\fR, const char *\fIpop\fR, ...);" 4
 .IX Item "int str_parse(const char *s, const char *pop, ...);"
 This parses the string \fIs\fR according to the parsing operation specified
-by \fIpop\fR. If the parsing operation succeeds, \f(CW\*(C`TRUE\*(C'\fR is returned. Else
-\&\f(CW\*(C`FALSE\*(C'\fR is returned.
+by \fIpop\fR. If the parsing operation succeeds, \f(CW\*(C`1\*(C'\fR is returned. If the
+parsing operation failed because the pattern \fIpop\fR did not match, \f(CW\*(C`0\*(C'\fR
+is returned. If the parsing operation failed because the underlying
+regular expression library failed, \f(CW\*(C`\-1\*(C'\fR is returned.
 .Sp
 The \fIpop\fR string usually has one of the following two syntax variants:
 `\fBm\fR \fIdelim\fR \fIregex\fR \fIdelim\fR \fIflags\fR*' (for matching operations)
@@ -699,7 +701,7 @@
 .IX Item "Match a String"
 .Vb 5
 \& char *var = "foo:bar";
-\& if (str_parse(var, "^.+?:.+$/)) {
+\& if (str_parse(var, "^.+?:.+$/) > 0) {
 \&     /* var matched */
 \&     ...
 \& }
@@ -709,7 +711,7 @@
 .Vb 10
 \& char *var = "foo:bar";
 \& char *cp, *v1, *v2;
-\& if (str_parse(var, "m/^(.+?):(.+)$/b", &cp, &v1, &v2)) {
+\& if (str_parse(var, "m/^(.+?):(.+)$/b", &cp, &v1, &v2) > 0) {
 \&     ...
 \&     /* now we have: 
 \&        cp = "foo\e0bar\e0" and v1 and v2 pointing
@@ -758,11 +760,12 @@
 recycled: for the \fIstr_token\fR\|(3) implementation an anchient \fIstrtok\fR\|(3)
 flavor from William Deich 1991 was cleaned up and adjusted. As the
 background parsing engine for \fIstr_parse\fR\|(3) a heavily stripped down
-version of Philip Hazel's \s-1PCRE\s0 2.08 library was used. The \fIstr_format\fR\|(3)
+version of Philip Hazel's Perl Compatible Regular Expression (\s-1PCRE\s0)
+library (initially version 2.08 and now 3.5) was used. The \fIstr_format\fR\|(3)
 implementation was based on Panos Tsirigotis' \fIsprintf\fR\|(3) code as
-adjusted by the Apache Software Foundation 1998. The formatting engine
-was stripped down and enhanced to support internal extensions which were
-required by \fIstr_format\fR\|(3) and \fIstr_parse\fR\|(3).
+adjusted by the Apache Software Foundation (\s-1ASF\s0) 1998. The formatting
+engine was stripped down and enhanced to support internal extensions
+which were required by \fIstr_format\fR\|(3) and \fIstr_parse\fR\|(3).
 .SH "AUTHOR"
 .IX Header "AUTHOR"
 .Vb 3


ossp-pkg/str/str.pod -> 1.27

*** /dev/null    Sat Nov 23 01:10:18 2024
--- -    Sat Nov 23 01:10:19 2024
***************
*** 0 ****
--- 1,774 ----
+ ##
+ ##  Str - String Library
+ ##  Copyright (c) 1999-2000 Ralf S. Engelschall <rse@engelschall.com>
+ ##
+ ##  This file is part of Str, a string handling and manipulation 
+ ##  library which can be found at http://www.engelschall.com/sw/str/.
+ ##
+ ##  Permission to use, copy, modify, and distribute this software for
+ ##  any purpose with or without fee is hereby granted, provided that
+ ##  the above copyright notice and this permission notice appear in all
+ ##  copies.
+ ##
+ ##  THIS SOFTWARE IS PROVIDED ``AS IS'' AND ANY EXPRESSED OR IMPLIED
+ ##  WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF
+ ##  MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED.
+ ##  IN NO EVENT SHALL THE AUTHORS AND COPYRIGHT HOLDERS AND THEIR
+ ##  CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ ##  SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ ##  LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF
+ ##  USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND
+ ##  ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
+ ##  OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT
+ ##  OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
+ ##  SUCH DAMAGE.
+ ##
+ ##  str.pod: Unix manual page
+ ##
+ 
+ #   Parts of this manual page (the str_format description) is:
+ #
+ #   Copyright (c) 1990, 1991, 1993
+ #   The Regents of the University of California.  All rights reserved.
+ #
+ #   This code is derived from software contributed to Berkeley by
+ #   Chris Torek and the American National Standards Committee X3,
+ #   on Information Processing Systems.
+ #
+ #   Redistribution and use in source and binary forms, with or without
+ #   modification, are permitted provided that the following conditions
+ #   are met:
+ #   1. Redistributions of source code must retain the above copyright
+ #      notice, this list of conditions and the following disclaimer.
+ #   2. Redistributions in binary form must reproduce the above copyright
+ #      notice, this list of conditions and the following disclaimer in the
+ #      documentation and/or other materials provided with the distribution.
+ #   3. All advertising materials mentioning features or use of this software
+ #      must display the following acknowledgement:
+ #      This product includes software developed by the University of
+ #      California, Berkeley and its contributors.
+ #   4. Neither the name of the University nor the names of its contributors
+ #      may be used to endorse or promote products derived from this software
+ #      without specific prior written permission.
+ #
+ #   THIS SOFTWARE IS PROVIDED BY THE REGENTS AND CONTRIBUTORS ``AS IS'' AND
+ #   ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
+ #   IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
+ #   ARE DISCLAIMED.  IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE
+ #   FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
+ #   DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
+ #   OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
+ #   HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
+ #   LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
+ #   OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
+ #   SUCH DAMAGE.
+ 
+ =pod
+ 
+ =head1 NAME
+ 
+ B<Str> - String Library
+ 
+ =head1 VERSION
+ 
+ Str STR_VERSION_STR
+ 
+ =head1 SYNOPSIS
+ 
+ B<str_len>,
+ B<str_copy>,
+ B<str_dup>,
+ B<str_concat>,
+ B<str_splice>,
+ B<str_compare>,
+ B<str_span>,
+ B<str_locate>,
+ B<str_token>,
+ B<str_parse>,
+ B<str_format>,
+ B<str_hash>,
+ B<str_base64>.
+ 
+ =head1 DESCRIPTION
+ 
+ The B<Str> library is a generic string library written in ANSI C which
+ provides functions for handling, matching, parsing, searching and
+ formatting of C strings. So it can be considered as a superset of POSIX
+ string(3), but its main intention is to provide a more convinient and
+ compact API plus a more generalized functionality.
+ 
+ =head1 FUNCTIONS
+ 
+ The following functions are provided by the B<Str> API:
+ 
+ =over 4
+ 
+ =item str_size_t B<str_len>(const char *I<s>);
+ 
+ This function determines the length of string I<s>, i.e., the number
+ of characters starting at I<s> that precede the terminating C<NUL>
+ character. It returns C<NULL> if I<s> is C<NULL>.
+ 
+ =item char *B<str_copy>(char *I<s>, const char *I<t>, size_t I<n>);
+ 
+ This copies the characters in string I<t> into the string I<s>, but never more
+ than I<n> characters (if I<n> is greater than C<0>). The two involved strings
+ can overlap and the characters in I<s> are always C<NUL>-terminated. The
+ string I<s> has to be large enough to hold all characters to be copied.
+ function returns C<NULL> if I<s> or I<t> are C<NULL>. Else it returns the
+ pointer to the written C<NUL>-terminating character in I<s>.
+ 
+ =item char *B<str_dup>(const char *I<s>, str_size_t I<n>);
+ 
+ This returns a copy of the characters in string I<s>, but never more than I<n>
+ characters if I<n> is greater than C<0>. It returns C<NULL> if I<s> is
+ C<NULL>. The returned string has to be deallocated later with free(3).
+ 
+ =item char *B<str_concat>(char *I<s>, ...);
+ 
+ This functions concatenates the characters of all string arguments into a new
+ allocated string and returns this new string.  If I<s> is C<NULL> the function
+ returns C<NULL>. Else it returns the pointer to the written final
+ C<NUL>-terminating character in I<s>. The returned string later has to be
+ deallicated with free(3).
+ 
+ =item char *B<str_splice>(char *I<s>, str_size_t I<off>, str_size_t I<n>, char *I<t>, str_size_t I<m>);
+ 
+ This splices the string I<t> into string I<s>, i.e., the I<n> characters
+ at offset I<off> in I<s> are removed and at their location the string
+ I<t> is inserted (or just the first I<m> characters of I<t> if I<m> is
+ greater than C<0>). It returns C<NULL> if I<s> or I<t> are C<NULL>.
+ Else the string I<s> is returned. The function supports also the
+ situation where I<t> is a sub-string of I<s> as long as the area
+ I<s+off>...I<s+off+n> and I<t>...I<t+m> do not overlap. The caller
+ always has to make sure that enough room exists in I<s>.
+ 
+ =item int B<str_compare>(const char *I<s>, const char *I<t>, str_size_t I<n>, int I<mode>);
+ 
+ This performs a lexicographical comparison of the two strings I<s>
+ and I<t> (but never compares more than I<n> characters of them)
+ and returns one of three return values: a value lower than C<0> if
+ I<s> is lexicographically lower than I<t>, a vlue of exactly C<0>
+ if I<s> and I<t> are equal and a value greater than C<0> if I<s> is
+ lexicographically higher than I<t>. Per default (I<mode> is C<0>) the
+ comparison is case-sensitive, but if C<STR_NOCASE> is used for I<mode>
+ the comparison is done in a case-insensitive way.
+ 
+ =item char *B<str_span>(const char *I<s>, size_t I<n>, const char *I<charset>, int I<mode>);
+ 
+ This functions spans a string I<s> according to the characters specified in
+ I<charset>. If I<mode> is C<0>, this means that I<s> is spanned from left to
+ right starting at I<s> (and ending either when reaching the terminating C<NUL>
+ character or already after I<n> spanned characters) as long as the characters
+ of I<s> are contained in I<charset>.
+ 
+ Alternatively one can use a I<mode> of C<STR_COMPLEMENT> to indicate that I<s>
+ is spanned as long as the characters of I<s> are I<not> contained in
+ I<charset>, i.e., I<charset> then specifies the complement of the spanning
+ characters.
+ 
+ In both cases one can additionally "or" (with the C operator ``C<|>'')
+ C<STR_RIGHT> into I<mode> to indicate that the spanning is done right to
+ left starting at the terminating C<NUL> character of I<s> (and ending
+ either when reaching I<s> or already after I<n> spanned characters).
+ 
+ =item char *B<str_locate>(const char *I<s>, str_size_t I<n>, const char *I<t>);
+ 
+ This functions searches for the (smaller) string I<t> inside (larger) string
+ I<s>. If I<n> is not C<0>, the search is performed only inside the first I<n>
+ characters of I<s>. 
+ 
+ =item char *B<str_token>(char **I<s>, const char *I<delim>, const char *I<quote>, const char *I<comment>, int I<mode>);
+ 
+ This function considers the string I<s> to consist of a sequence of
+ zero or more text tokens separated by spans of one or more characters
+ from the separator string I<delim>. However, text between matched pairs
+ of quotemarks (characters in I<quote>) is treated as plain text, never
+ as delimiter (separator) text. Each call of this function returns a
+ pointer to the first character of the first token of I<s>. The token is
+ C<NUL>-terminated, i.e., the string I<s> is processed in a destructive
+ way. If there are quotation marks or escape sequences, the input
+ string is rewritten with quoted sections and escape sequences properly
+ interpreted.
+ 
+ This function keeps track of its parsing position in the string between
+ separate calls by simply adjusting the callers I<s> pointer, so that
+ subsequent calls with the same pointer variable I<s> will start
+ processing from the position immediately after the last returned token.
+ In this way subsequent calls will work through the string I<s> until no
+ tokens remain. When no token remains in I<s>, C<NULL> is returned. The
+ string of token separators (I<delim>) and the string of quote characters
+ (I<quote>) may be changed from call to call.
+ 
+ If a character in the string I<s> is not quoted or escaped, and is in the
+ I<comment> set, then it is overwritten with a C<NUL> character and the rest of
+ the string is ignored. The characters to be used as quote characters are
+ specified in the I<quote> set, and must be used in balanced pairs. If there
+ is more than one flavor of quote character, one kind of quote character may be
+ used to quote another kind. If an unbalanced quote is found, the function
+ silently act as if one had been placed at the end of the input string.  The
+ I<delim> and I<quote> strings must be disjoint, i.e., they have to share
+ no characters. 
+ 
+ The I<mode> argument can be used to modify the processing of the string
+ (default for I<mode> is C<0>): C<STR_STRIPQUOTES> forces I<quote>
+ characters to be stripped from quoted tokens; C<STR_BACKSLASHESC>
+ enables the interpretation (and expansion) of backslash escape sequences
+ (`B<\x>') through ANSI-C rules; C<STR_SKIPDELIMS> forces that after the
+ terminating C<NUL> is written and the token returned, further delimiters
+ are skipped (this allows one to make sure that the delimiters for
+ one word don't become part of the next word if one change delimiters
+ between calls); and C<STR_TRIGRAPHS> enables the recognition and
+ expansion of ANSI C Trigraph sequences (as a side effect this enables
+ C<STR_BACKSLASHESC>, too).
+ 
+ =item int B<str_parse>(const char *I<s>, const char *I<pop>, ...);
+ 
+ This parses the string I<s> according to the parsing operation specified
+ by I<pop>. If the parsing operation succeeds, C<1> is returned. If the
+ parsing operation failed because the pattern I<pop> did not match, C<0>
+ is returned. If the parsing operation failed because the underlying
+ regular expression library failed, C<-1> is returned.
+ 
+ The I<pop> string usually has one of the following two syntax variants:
+ `B<m> I<delim> I<regex> I<delim> I<flags>*' (for matching operations)
+ and `B<s> I<delim> I<regex> I<delim> I<subst> I<delim> I<flags>*' (for
+ substitution operations). For more details about the syntax variants
+ and semantic of the I<pop> argument see section B<GORY DETAILS, Parsing
+ Specification> below. The syntax of the I<regex> part in I<pop> is
+ mostly equivalent to Perl 5's regular expression syntax. For the
+ complete and gory details see perlre(1). A brief summary you can find
+ under section B<GORY DETAILS, Perl Regular Expressions> below.
+ 
+ =item int B<str_format>(char *I<s>, str_size_t I<n>, const char *I<fmt>, ...);
+ 
+ This formats a new string according to I<fmt> and optionally following
+ arguments and writes it into the string I<s>, but never more than I<n>
+ characters at all. It returns the number of written characters.  If I<s> is
+ C<NULL> it just calculates the number of characters which would be written.
+ 
+ The function generates the output string under the control of the I<fmt>
+ format string that specifies how subsequent arguments (or arguments accessed
+ via the variable-length argument facilities of stdarg(3)) are converted for
+ output.
+ 
+ The format string I<fmt> is composed of zero or more directives:
+ ordinary characters (not B<%>), which are copied unchanged to the output
+ stream; and conversion specifications, each of which results in fetching
+ zero or more subsequent arguments. Each conversion specification is
+ introduced by the character B<%>. The arguments must correspond properly
+ (after type promotion) with the conversion specifier. Which conversion
+ specifications are supported are described in detail under B<GORY
+ DETAILS, Format Specification> below.
+ 
+ =item unsigned long B<str_hash>(const char *I<s>, str_size_t I<n>, int I<mode>);
+ 
+ This function calculates a hash value of string I<s> (or of its first I<n>
+ characters if I<n> is equal to C<0>). The following hashing functions
+ are supported and can be selected with I<mode>: STR_HASH_DJBX33 (Daniel
+ J. Berstein, Times 33 Hash with Addition), STR_HASH_BJDDJ (Bob
+ Jenkins, Dr. Dobbs Journal), and STR_HASH_MACRC32 (Mark Adler, Cyclic
+ Redundancy Check with 32-Bit). This function is intended for fast use
+ in hashing algorithms and I<not> for use as cryptographically strong
+ message digests.
+ 
+ =item int B<str_base64>(char *I<s>, str_size_t I<n>, unsigned char *I<ucp>, str_size_t I<ucn>, int I<mode>);
+ 
+ This function Base64 encodes I<ucn> bytes starting at I<ucp> and writes
+ the resulting string into I<s> (but never more than I<n> characters are
+ written). The I<mode> for this operation has to be C<STR_BASE64_ENCODE>.
+ Additionally one can OR the value C<STR_BASE64_STRICT> to enable strict
+ encoding where after every 72th output character a newline character is
+ inserted. The function returns the number of output characters written.
+ If I<s> is C<NULL> the function just calculates the number of required
+ output characters.
+ 
+ Alternatively, if I<mode> is C<STR_BASE64_DECODE> the string I<s> (or
+ the first I<n> characters only if I<n> is not C<0>) is decoded and the
+ output bytes written at I<ucp>. Again, if I<ucp> is C<NULL> only the
+ number of required output bytes are calculated.
+ 
+ =back
+ 
+ =head1 GORY DETAILS
+ 
+ In this part of the documentation more complex topics are documented in
+ detail.
+ 
+ =head2 Perl Regular Expressions
+ 
+ The regular expressions used in B<Str> are more or less Perl compatible
+ (they are provided by a stripped down and built-in version of the
+ I<PCRE> library). So the syntax description in perlre(1) applies
+ and don't has to be repeated here again. For a deeper understanding
+ and details you should have a look at the book `I<Mastering Regular
+ Expressions>' (see also the perlbook(1) manpage) by I<Jeffrey Friedl>.
+ For convinience reasons we give you only a brief summary of Perl
+ compatible regular expressions:
+ 
+ The following metacharacters have their standard egrep(1) meanings:
+ 
+   \      Quote the next metacharacter
+   ^      Match the beginning of the line
+   .      Match any character (except newline)
+   $      Match the end of the line (or before newline at the end)
+   |      Alternation
+   ()     Grouping
+   []     Character class
+ 
+ The following standard quantifiers are recognized:
+ 
+   *      Match 0 or more times (greedy)
+   *?     Match 0 or more times (non greedy)
+   +      Match 1 or more times (greedy)
+   +?     Match 1 or more times (non greedy)
+   ?      Match 1 or 0 times (greedy)
+   ??     Match 1 or 0 times (non greedy)
+   {n}    Match exactly n times (greedy)
+   {n}?   Match exactly n times (non greedy)
+   {n,}   Match at least n times (greedy)
+   {n,}?  Match at least n times (non greedy)
+   {n,m}  Match at least n but not more than m times (greedy)
+   {n,m}? Match at least n but not more than m times (non greedy)
+ 
+ The following backslash sequences are recognized:
+ 
+   \t     Tab                   (HT, TAB)
+   \n     Newline               (LF, NL)
+   \r     Return                (CR)
+   \f     Form feed             (FF)
+   \a     Alarm (bell)          (BEL)
+   \e     Escape (think troff)  (ESC)
+   \033   Octal char
+   \x1B   Hex char
+   \c[    Control char
+   \l     Lowercase next char
+   \u     Uppercase next char
+   \L     Lowercase till \E
+   \U     Uppercase till \E
+   \E     End case modification
+   \Q     Quote (disable) pattern metacharacters till \E
+ 
+ The following non zero-width assertions are recognized:
+ 
+   \w     Match a "word" character (alphanumeric plus "_")
+   \W     Match a non-word character
+   \s     Match a whitespace character
+   \S     Match a non-whitespace character
+   \d     Match a digit character
+   \D     Match a non-digit character
+ 
+ The following zero-width assertions are recognized:
+ 
+   \b     Match a word boundary
+   \B     Match a non-(word boundary)
+   \A     Match only at beginning of string
+   \Z     Match only at end of string, or before newline at the end
+   \z     Match only at end of string
+   \G     Match only where previous m//g left off (works only with /g)
+ 
+ The following regular expression extensions are recognized:
+ 
+   (?#text)              An embedded comment
+   (?:pattern)           This is for clustering, not capturing (simple)
+   (?imsx-imsx:pattern)  This is for clustering, not capturing (full)
+   (?=pattern)           A zero-width positive lookahead assertion
+   (?!pattern)           A zero-width negative lookahead assertion
+   (?<=pattern)          A zero-width positive lookbehind assertion
+   (?<!pattern)          A zero-width negative lookbehind assertion
+   (?>pattern)           An "independent" subexpression
+   (?(cond)yes-re)       Conditional expression (simple)
+   (?(cond)yes-re|no-re) Conditional expression (full)
+   (?imsx-imsx)          One or more embedded pattern-match modifiers
+ 
+ =head2 Parsing Specification
+ 
+ The B<str_parse>(const char *I<s>, const char *I<pop>, ...) function
+ is a very flexible but complex one. The argument I<s> is the string on
+ which the parsing operation specified by argument I<pop> is applied.
+ The parsing semantics are highly influenced by Perl's `B<=~>' matching
+ operator, because one of the main goals of str_parse(3) is to allow one
+ to rewrite typical Perl matching constructs into C.
+ 
+ Now to the gory details. In general, the I<pop> argument of str_parse(3)
+ has one of the following two syntax variants:
+ 
+ =over 4
+ 
+ =item B<Matching:> `B<m> I<delim> I<regex> I<delim> I<flags>*': 
+ 
+ This matches I<s> against the Perl-style regular expression I<regex>
+ under the control of zero or more I<flags> which control the parsing
+ semantics. The stripped down I<pop> syntax `I<regex>' is equivalent to
+ `B<m/>I<regex>B</>'. 
+ 
+ For each grouping pair of parenthesis in I<regex>, the text in I<s>
+ which was grouped by the parenthesis is extracted into new strings.
+ These per default are allocated as seperate strings and returned to the
+ caller through following `B<char **>' arguments. The caller is required
+ to free(3) them later.
+ 
+ =item B<Substitution:> `B<s> I<delim> I<regex> I<delim> I<subst> I<delim> I<flags>*': 
+ 
+ This matches I<s> against the Perl-style regular expression I<regex>
+ under the control of zero or more I<flags> which control the parsing
+ semantics. As a result of the operation, a new string formed which
+ consists of I<s> but with the part which matched I<regex> replaced by
+ I<subst>. The result string is returned to the caller through a `B<char
+ **>' argument. The caller is required to free(3) this later.
+ 
+ For each grouping pair of parenthesis in I<regex>, the text in I<s>
+ which was grouped by the parenthesis is extracted into new strings
+ and can be referenced for expansion via `B<$n>' (n=1,..) in I<subst>.
+ Additionally any str_format(3) style `B<%>' constructs in I<subst> are
+ expanded through additional caller supplied arguments.
+ 
+ =back
+ 
+ The following I<flags> are supported:
+ 
+ =over 4
+ 
+ =item B<b>
+ 
+ If the I<bundle> flag `B<b>' is specified, the extracted strings are
+ bundled together into a single chunk of memory and its address is
+ returned to the caller with a additional `B<char **>' argument which has
+ to preceed the regular string arguments. The caller then has to free(3)
+ only this chunk of memory in order to free all extracted strings at
+ once.
+ 
+ =item B<i>
+ 
+ If the case-I<insensitive> flag `B<i>' is specified, I<regex>
+ is matched in case-insensitive way.
+ 
+ =item B<o>
+ 
+ If the I<once> flag `B<o>' is specified, this indicates to the B<Str>
+ library that the whole I<pop> string is constant and that its internal
+ pre-processing (it is compiled into a deterministic finite automaton
+ (DFA) internally) has to be done only once (the B<Str> library then
+ caches the DFA which corresponds to the I<pop> argument).
+ 
+ =item B<x>
+ 
+ If the I<extended> flag `B<x>' is specified, the I<regex>'s legibility
+ is extended by permitting embedded whitespace and comments to allow one
+ to write down complex regular expressions more cleary and even in a
+ documented way.
+ 
+ =item B<m>
+ 
+ If the I<multiple> lines flag `B<m>' is specified, the string I<s> is
+ treated as multiple lines. That is, this changes the regular expression
+ meta characters `B<^>' and `B<$>' from matching at only the very start
+ or end of the string I<s> to the start or end of any line anywhere
+ within the string I<s>.
+ 
+ =item B<s>
+ 
+ If the I<single> line flag `B<s>' is specified, the string I<s> is
+ treated as single line. That is, this changes the regular expression
+ meta character `B<.>' to match any character whatsoever, even a newline,
+ which it normally would not match.
+ 
+ =back
+ 
+ 
+ =head1 CONVERSION SPECIFICATION
+ 
+ In the format string of str_format(3) each conversion specification is
+ introduced by the character B<%>. After the B<%>, the following appear
+ in sequence:
+ 
+ =over 4
+ 
+ =item o
+ 
+ An optional field, consisting of a decimal digit string followed by a B<$>,
+ specifying the next argument to access.  If this field is not provided, the
+ argument following the last argument accessed will be used.  Arguments are
+ numbered starting at B<1>. If unaccessed arguments in the format string are
+ interspersed with ones that are accessed the results will be indeterminate.
+ 
+ =item o
+ 
+ Zero or more of the following flags:
+ 
+ A B<#> character specifying that the value should be converted to an
+ ``alternate form''.  For B<c>, B<d>, B<i>, B<n>, B<p>, B<s>, and B<u>,
+ conversions, this option has no effect.  For B<o> conversions, the precision
+ of the number is increased to force the first character of the output string
+ to a zero (except if a zero value is printed with an explicit precision of
+ zero).  For B<x> and B<X> conversions, a non-zero result has the string B<0x>
+ (or B<0X> for B<X> conversions) prepended to it.  For B<e>, B<E>, B<f>, B<g>,
+ and B<G>, conversions, the result will always contain a decimal point, even if
+ no digits follow it (normally, a decimal point appears in the results of those
+ conversions only if a digit follows).  For B<g> and B<G> conversions, trailing
+ zeros are not removed from the result as they would otherwise be.
+ 
+ A zero `B<0>' character specifying zero padding.  For all conversions except
+ B<n>, the converted value is padded on the left with zeros rather than blanks.
+ If a precision is given with a numeric conversion (B<d>, B<i>, B<o>, B<u>,
+ B<i>, B<x>, and B<X>), the `B<0>' flag is ignored.
+ 
+ A negative field width flag `B<->' indicates the converted value is to be left
+ adjusted on the field boundary.  Except for B<n> conversions, the converted
+ value is padded on the right with blanks, rather than on the left with blanks
+ or zeros.  A `B<->' overrides a `B<0>' if both are given.
+ 
+ A space, specifying that a blank should be left before a positive number
+ produced by a signed conversion (B<d>, B<e>, B<E>, B<f>, B<g>, B<G>, or B<i>).
+ 
+ A `B<+>' character specifying that a sign always be placed before a number
+ produced by a signed conversion.  A `B<+>' overrides a space if both are used.
+ 
+ =item o
+ 
+ An optional decimal digit string specifying a minimum field width.
+ If the converted value has fewer characters than the field width, it will
+ be padded with spaces on the left (or right, if the left-adjustment
+ flag has been given) to fill out
+ the field width.
+ 
+ =item o
+ 
+ An optional precision, in the form of a period `B<.>' followed by an
+ optional digit string. If the digit string is omitted, the precision is
+ taken as zero. This gives the minimum number of digits to appear for
+ B<d>, B<i>, B<o>, B<u>, B<x>, and B<X> conversions, the number of digits
+ to appear after the decimal-point for B<e>, B<E>, and B<f> conversions,
+ the maximum number of significant digits for B<g> and B<G> conversions,
+ or the maximum number of characters to be printed from a string for B<s>
+ conversions.
+ 
+ =item o
+ 
+ The optional character B<h>, specifying that a following B<d>, B<i>, B<o>,
+ B<u>, B<x>, or B<X> conversion corresponds to a `C<short int>' or `C<unsigned
+ short int>' argument, or that a following B<n> conversion corresponds to a
+ pointer to a `C<short int> argument.
+ 
+ =item o
+ 
+ The optional character B<l> (ell) specifying that a following B<d>, B<i>,
+ B<o>, B<u>, B<x>, or B<X> conversion applies to a pointer to a `C<long int>'
+ or `C<unsigned long int>' argument, or that a following B<n> conversion
+ corresponds to a pointer to a `C<long int> argument.
+ 
+ =item o
+ 
+ The optional character B<q>, specifying that a following B<d>, B<i>, B<o>,
+ B<u>, B<x>, or B<X> conversion corresponds to a `C<quad int>' or `C<unsigned
+ quad int>' argument, or that a following B<n> conversion corresponds to a
+ pointer to a `C<quad int>' argument.
+ 
+ =item o
+ 
+ The character B<L> specifying that a following B<e>, B<E>, B<f>, B<g>, or B<G>
+ conversion corresponds to a `C<long double>' argument.
+ 
+ =item o
+ 
+ A character that specifies the type of conversion to be applied.
+ 
+ =back
+ 
+ A field width or precision, or both, may be indicated by an asterisk `B<*>' or
+ an asterisk followed by one or more decimal digits and a `B<$>' instead of a
+ digit string.  In this case, an `C<int>' argument supplies the field width or
+ precision.  A negative field width is treated as a left adjustment flag
+ followed by a positive field width; a negative precision is treated as though
+ it were missing.  If a single format directive mixes positional (`B<nn$>') and
+ non-positional arguments, the results are undefined.
+ 
+ The conversion specifiers and their meanings are:
+ 
+ =over 4
+ 
+ =item B<diouxX>
+ 
+ The `C<int>' (or appropriate variant) argument is converted to signed decimal
+ (B<d> and B<i>), unsigned octal (B<o>), unsigned decimal (B<u>), or unsigned
+ hexadecimal (B<x> and B<X>) notation.  The letters B<abcdef> are used for B<x>
+ conversions; the letters B<ABCDEF> are used for B<X> conversions.  The
+ precision, if any, gives the minimum number of digits that must appear; if the
+ converted value requires fewer digits, it is padded on the left with zeros.
+ 
+ =item B<DOU>
+ 
+ The `C<long int> argument is converted to signed decimal, unsigned octal, or
+ unsigned decimal, as if the format had been B<ld>, B<lo>, or B<lu>
+ respectively.  These conversion characters are deprecated, and will eventually
+ disappear.
+ 
+ =item B<eE>
+ 
+ The `C<double>' argument is rounded and converted in the style
+ `[-]d.dddB<e>+-dd' where there is one digit before the decimal-point character
+ and the number of digits after it is equal to the precision; if the precision
+ is missing, it is taken as 6; if the precision is zero, no decimal-point
+ character appears.  An B<E> conversion uses the letter B<E> (rather than B<e>)
+ to introduce the exponent.  The exponent always contains at least two digits;
+ if the value is zero, the exponent is 00.
+ 
+ =item B<f>
+ 
+ The `C<double>' argument is rounded and converted to decimal notation in the
+ style `[-]ddd.ddd>' where the number of digits after the decimal-point
+ character is equal to the precision specification.  If the precision is
+ missing, it is taken as 6; if the precision is explicitly zero, no
+ decimal-point character appears.  If a decimal point appears, at least one
+ digit appears before it.
+ 
+ =item B<g>
+ 
+ The `C<double>' argument is converted in style B<f> or B<e> (or B<E> for B<G>
+ conversions).  The precision specifies the number of significant digits.  If
+ the precision is missing, 6 digits are given; if the precision is zero, it is
+ treated as 1.  Style B<e> is used if the exponent from its conversion is less
+ than -4 or greater than or equal to the precision.  Trailing zeros are removed
+ from the fractional part of the result; a decimal point appears only if it is
+ followed by at least one digit.
+ 
+ =item B<c>
+ 
+ The `C<int>' argument is converted to an `C<unsigned char>, and the resulting
+ character is written.
+ 
+ =item B<s>
+ 
+ The `C<char *>' argument is expected to be a pointer to an array of character
+ type (pointer to a string).  Characters from the array are written up to (but
+ not including) a terminating C<NUL> character; if a precision is specified, no
+ more than the number specified are written.  If a precision is given, no null
+ character need be present; if the precision is not specified, or is greater
+ than the size of the array, the array must contain a terminating C<NUL>
+ character.
+ 
+ =item B<p>
+ 
+ The `C<void *> pointer argument is printed in hexadecimal (as if by `B<%#x>'
+ or `C<%#lx>).
+ 
+ =item B<n>
+ 
+ The number of characters written so far is stored into the integer indicated
+ by the `C<int *>' (or variant) pointer argument.  No argument is converted.
+ 
+ =item B<%>
+ 
+ A `B<%>' is written. No argument is converted. The complete conversion
+ specification is `B<%%>.
+ 
+ =back
+ 
+ In no case does a non-existent or small field width cause truncation of a
+ field; if the result of a conversion is wider than the field width, the field
+ is expanded to contain the conversion result.
+ 
+ =head1 EXAMPLES
+ 
+ In the following a few snippets of selected use cases of B<Str> are
+ presented:
+ 
+ =over 4
+ 
+ =item B<Splice a String into Another>
+ 
+  char *v1 = "foo bar quux";
+  char *v2 = "baz";
+  str_splice(v1, 3, 5, v2, 0):
+  /* now we have v1 = "foobazquux" */
+  ....
+ 
+ =item B<Tokenize a String>
+ 
+  char *var = " foo \t " bar 'baz'" q'uu'x #comment";
+  char *tok, *p;
+  p = var; 
+  while ((tok = str_token(p, ":", "\"'", "#", 0)) != NULL) {
+      /* here we enter three times: 
+         1. tok = "foo"
+         2. tok = " bar 'baz'" 
+         3. tok = "quux" */ 
+      ...
+  }
+ 
+ =item B<Match a String>
+ 
+  char *var = "foo:bar";
+  if (str_parse(var, "^.+?:.+$/) > 0) {
+      /* var matched */
+      ...
+  }
+ 
+ =item B<Match a String and Go Ahead with Details>
+ 
+  char *var = "foo:bar";
+  char *cp, *v1, *v2;
+  if (str_parse(var, "m/^(.+?):(.+)$/b", &cp, &v1, &v2) > 0) {
+      ...
+      /* now we have: 
+         cp = "foo\0bar\0" and v1 and v2 pointing
+         into it, i.e., v1 = "foo", v2 = "bar" */
+      ...
+      free(cp);
+  }
+ 
+ =item B<Substitute Text in a String>
+ 
+  char *var = "foo:bar";
+  char *subst = "quux";
+  char *new;
+  str_parse(var, "s/^(.+?):(.+)$/$1-%s-$2/", &new, subst);
+  ...
+  /* now we have: var = "foo:bar", new = "foo:quux:bar" */
+  ...
+  free(new);
+ 
+ =item B<Format a String>
+ 
+  char *v0 = "abc..."; /* length not guessable */
+  char *v1 = "foo";
+  void *v2 = 0xDEAD;
+  int v3 = 42;
+  char *cp;
+  int n;
+ 
+  n = str_format(NULL, 0, "%s|%5s-%x-%04d", v0, v1, v2, v3); 
+  cp = malloc(n);
+  str_format(cp, n, "%s-%x-%04d", v1, v2, v3); 
+  /* now we have cp = "abc...|  foo-DEAD-0042" */
+  ...
+  free(cp);
+ 
+ =back
+ 
+ =head1 SEE ALSO
+ 
+ string(3), printf(3), perlre(1).
+ 
+ =head1 HISTORY
+ 
+ The B<Str> library was written in November and December 1999 by Ralf
+ S. Engelschall. As building blocks various existing code was used and
+ recycled: for the str_token(3) implementation an anchient strtok(3)
+ flavor from William Deich 1991 was cleaned up and adjusted. As the
+ background parsing engine for str_parse(3) a heavily stripped down
+ version of Philip Hazel's Perl Compatible Regular Expression (PCRE)
+ library (initially version 2.08 and now 3.5) was used. The str_format(3)
+ implementation was based on Panos Tsirigotis' sprintf(3) code as
+ adjusted by the Apache Software Foundation (ASF) 1998. The formatting
+ engine was stripped down and enhanced to support internal extensions
+ which were required by str_format(3) and str_parse(3).
+ 
+ =head1 AUTHOR
+ 
+  Ralf S. Engelschall
+  rse@engelschall.com
+  www.engelschall.com
+ 
+ =cut
+ 

CVSTrac 2.0.1