## ## OSSP var -- Variable Expansion ## Copyright (c) 2001-2002 The OSSP Project (http://www.ossp.org/) ## Copyright (c) 2001-2002 Cable & Wireless Deutschland (http://www.cw.com/de/) ## ## This file is part of OSSP VAR, an extensible data serialization ## library which can be found at http://www.ossp.org/pkg/lib/var/. ## ## Permission to use, copy, modify, and distribute this software for ## any purpose with or without fee is hereby granted, provided that ## the above copyright notice and this permission notice appear in all ## copies. ## ## THIS SOFTWARE IS PROVIDED `AS IS' AND ANY EXPRESSED OR IMPLIED ## WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF ## MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. ## IN NO EVENT SHALL THE AUTHORS AND COPYRIGHT HOLDERS AND THEIR ## CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, ## SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT ## LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF ## USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ## ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, ## OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT ## OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF ## SUCH DAMAGE. ## ## var.pod: Unix manual page source ## =pod =head1 NAME B -- Variable Expansion =head1 SYNOPSIS =over 2 =item Types: B, B, B, B, B, B. =item Functions: B, B, B, B, B, B, B, B. =item Variables: B. =back =head1 DESCRIPTION B is a flexible, full-featured and fast variable construct expansion library. It supports a configurable variable construct syntax very similar to the style found in many scripting languages (like C<@>I, C<${>IC<}>, C<$(>IC<)>, etc.) and provides both simple scalar (C<${>IC<}>) and array (C<${>IC<[>IC<]}>) expansion, plus optionally one or more post-operations on the expanded value (C<${>IC<:>I[C<:>I...]]C<}>). 
The supported post-operations are length determination, case conversion, defaults, positive and negative alternatives, sub-strings, regular expression based substitutions, character translations, and padding. Additionally, a meta-construct plus arithmetic expressions for index and range calculations allow (even nested) iterations over array variable expansions (..C<[>..C<${>IC<[#+1]}>..C<]>..). The actual variable value lookup is performed through a callback function, so B can expand arbitrary values.

=head1 SYNTAX CONSTRUCTS

A string expanded through B can consist of arbitrary text characters plus one or more of the following special syntax constructs which are expanded by B.

=over 4

=item C<\>I

Character with the octal value I (I: C<0>,...,C<7>).

=item C<\x>I, C<\x{>IC<}>

Character with the hexadecimal value I or the characters denoted by grouped hexadecimal numbers I. (I, I: C<0>,...,C<9>,[C],...,[C]).

=item C<\t>, C<\r>, C<\n>

Tabulator, Carriage Return and Newline character.

=item C<\\>, C<\>I

Ordinary character C<\> and I.

=item C<$>I, C<${>IC<}>

Contents of scalar variable I.

=item C<${>IC<[>IC<]>C<}>

Contents of array variable I at position I. For I full arithmetic expressions are allowed.

=item C<${>IC<:#}>

Length of C<$>I.

=item C<${>IC<:l}>, C<${>IC<:u}>

C<$>I, converted to all lower-case or all upper-case.

=item C<${>IC<:->IC<}>

If C<$>I is neither the empty string nor undefined, then C<$>I, else I (default value).

=item C<${>IC<:+>IC<}>

If C<$>I is the empty string, then the empty string, else I (positive alternative).

=item C<${>IC<:*>IC<}>

If C<$>I is not the empty string, then the empty string, else I (negative alternative).

=item C<${>IC<:o>IC<,>[I]C<}>

Substring of C<$>I starting at position I with I characters.

=item C<${>IC<:o>IC<->[I]C<}>

Substring of C<$>I starting at position I and ending at position I (inclusive).

=item C<${>IC<:s/>ICIC[C]C<}>

C<$>I after replacing characters matching I with I.
By default, case-sensitive regular expression matching is performed and only the first occurrence of I is replaced. Flag "C<i>" switches to case insensitive matching; flag "C<t>" switches to plain text pattern; flag "C<g>" switches to replacements of all occurrences; flag "C<m>" switches to multi-line matching (that is, "C<^>" and "C<$>" change from matching the start or end of the string to matching the start or end of any line).

=item C<${>IC<:y/>ICIC

C<$>I after replacing all characters found in the I character class by the corresponding character in the I character class.

=item C<${>IC<:p/>ICIC{C,C,C}C<}>

C<$>I after padding to I with I. The original contents of I are either left justified (flag "C<l>"), centered (flag "C<c>"), or right justified (flag "C<r>").

=item C<${>IC<:%>I[C<(>IC<)>]C<}>

C<$>I after passing it to an application-supplied function I. The optional argument I is passed to the function, too. By default no such functions are defined.

=item C<[>IC<]>, C<[>IC<]>C<{>IC<,>IC<,>IC<}>

Repeat expansion of I as long as at least one array variable does not expand to the empty string (first variant) or exactly (I-I)/I times (second variant). In both cases the character "C<#>" is expanded in C as the current loop index (C<0>,... for first variant and I,...,I with stepping I for second variant). I of array variable lookups. For I, I and I, full arithmetic expressions are allowed. This loop construct can be nested, too. In this case an inner loop is fully repeated for each iteration of the outer loop. Additionally, arithmetic expressions are supported in the I, I, I and I parts of variable constructs in I.

=back

=head1 SYNTAX CONSTRUCTS (GRAMMAR)

All the variable syntax constructs supported by B follow the same grammatical form. For completeness and reference reasons, the corresponding grammar is given in an extended BNF:

 input        ::= ( TEXT
                  | variable
                  | INDEX_OPEN input INDEX_CLOSE (loop_limits)?
                  )*

 variable     ::= DELIM_INIT (name|expression)

 name         ::= (NAME_CHARS)+

 expression   ::= DELIM_OPEN
                  (name|variable)+
                  (INDEX_OPEN num_exp INDEX_CLOSE)?
                  (':' command)*
                  DELIM_CLOSE

 command      ::= '-' (TEXT_EXP|variable)+
                | '+' (TEXT_EXP|variable)+
                | 'o' NUMBER ('-'|',') (NUMBER)?
                | '#'
                | '*' (TEXT_EXP|variable)+
                | 's' '/' (TEXT_PATTERN)+
                      '/' (variable|TEXT_SUBST)*
                      '/' ('m'|'g'|'i'|'t')*
                | 'y' '/' (variable|TEXT_SUBST)+
                      '/' (variable|TEXT_SUBST)*
                      '/'
                | 'p' '/' NUMBER
                      '/' (variable|TEXT_SUBST)*
                      '/' ('r'|'l'|'c')
                | '%' (name|variable)+
                      ('(' (TEXT_ARGS)? ')')?
                | 'l'
                | 'u'

 num_exp      ::= operand
                | operand ('+'|'-'|'*'|'/'|'%') num_exp

 operand      ::= ('+'|'-')? NUMBER
                | INDEX_MARK
                | '(' num_exp ')'
                | variable

 loop_limits  ::= DELIM_OPEN
                  (num_exp)? ',' (num_exp)? (',' (num_exp)?)?
                  DELIM_CLOSE

 NUMBER       ::= ('0'|...|'9')+

 TEXT_PATTERN ::= (^('/'))+
 TEXT_SUBST   ::= (^(DELIM_INIT|'/'))+
 TEXT_ARGS    ::= (^(DELIM_INIT|')'))+
 TEXT_EXP     ::= (^(DELIM_INIT|DELIM_CLOSE|':'))+
 TEXT         ::= (^(DELIM_INIT|INDEX_OPEN|INDEX_CLOSE))+

 DELIM_INIT   ::= '$'
 DELIM_OPEN   ::= '{'
 DELIM_CLOSE  ::= '}'
 INDEX_OPEN   ::= '['
 INDEX_CLOSE  ::= ']'
 INDEX_MARK   ::= '#'
 NAME_CHARS   ::= 'a'|...|'z'|'A'|...|'Z'|'0'|...|'9'

Notice that the grammar definitions of DELIM_INIT, DELIM_OPEN, DELIM_CLOSE, INDEX_OPEN, INDEX_CLOSE, INDEX_MARK and NAME_CHARS correspond to the default syntax configuration only. They can be changed through the API (see B<var_syntax_t>).

=head1 APPLICATION PROGRAMMING INTERFACE (API)

The following is a detailed description of the B B language Application Programming Interface (API):

=head2 TYPES

The B API consists of the following B data types:

=over 4

=item B<var_rc_t>

This is an exported enumerated integer type describing the return code of all API functions. On success, every API function returns C<VAR_OK>. On error, they return C. For a list of all possible return codes see F<var.h>. The corresponding descriptive text can be determined with var_strerror().

=item B<var_t>

This is an opaque data type representing a variable expansion context.
Only pointers to this abstract data type are used in the API.

=item B<var_config_t>

This is an exported enumerated integer type describing configuration parameters for var_config(). Currently C<VAR_CONFIG_SYNTAX> (for configuring the syntax via B<var_syntax_t>) and C<VAR_CONFIG_CB_VALUE> (for configuring the callback for value lookups via B<var_cb_value_t>) are defined.

=item B<var_syntax_t>

This is an exported structural data type describing the variable construct syntax. It is passed to var_config() on C<VAR_CONFIG_SYNTAX> and consists of the following members (directly corresponding to the upper-case non-terminals in the grammar above):

 char  escape;       /* default: '\\'         */
 char  delim_init;   /* default: '$'          */
 char  delim_open;   /* default: '{'          */
 char  delim_close;  /* default: '}'          */
 char  index_open;   /* default: '['          */
 char  index_close;  /* default: ']'          */
 char  index_mark;   /* default: '#'          */
 char *name_chars;   /* default: "a-zA-Z0-9_" */

All members are single character constants, except for I<name_chars> which is a character class listing all valid characters. As an abbreviation the construct "IC<->I" is supported which means all characters from I to I (both included) in the underlying character set.

=item B<var_cb_value_t>

This is an exported function pointer type for variable value lookup functions. Such a callback function B has to be of the following prototype:

var_rc_t *B(var_t *I<var>, void *I<ctx>, const char *I<var_ptr>, size_t I<var_len>, int I<var_idx>, const char **I<val_ptr>, size_t *I<val_len>, size_t *I<val_size>);

This function will be called by var_expand() internally whenever it has to resolve the contents of a variable. Its parameters are:

=over 4

=item var_t *I<var>

This is the passed-through argument as passed to var_expand() as the first argument. This can be used in the callback function to distinguish the expansion context or to resolve return codes, etc.

=item void *I<ctx>

This is the passed-through argument as passed to var_config() on C<VAR_CONFIG_CB_VALUE> as the fourth argument. This can be used to provide an internal context to the callback function through var_expand().

=item const char *I<var_ptr>

This is a pointer to the name of the variable whose contents var_expand() wishes to resolve. Please note that the string is NOT necessarily terminated by a C<NUL> ('C<\0>') character. If the callback function needs it C<NUL>-terminated, it has to copy the string into a temporary buffer of its own and C<NUL>-terminate it there.

=item size_t I<var_len>

This is the length of the variable name at I<var_ptr>.

=item int I<var_idx>

This determines which entry of an array variable to look up. If the variable specification that led to the execution of the lookup function did not contain an index, zero (C<0>) is provided by default as I<var_idx>. If I<var_idx> is less than zero, the callback should return the number of entries in the array variable. If I<var_idx> is greater than or equal to zero, it should return the specified particular entry. It is up to the callback to decide what to return for an index not equal to zero if the underlying variable is a scalar.

=item const char **I<val_ptr>

This is a pointer to the location where the callback function should store the pointer to the resolved value of the variable.

=item size_t *I<val_len>

This is a pointer to the location where the callback function should store the length of the resolved value of the variable.

=item size_t *I<val_size>

This is a pointer to the location where the callback function should store the size of the buffer that has been allocated to hold the value of the resolved variable. If no buffer has been allocated by the callback at all, because the variable uses some other means of storing the contents -- as in the case of getenv(3), where the system provides the buffer for the string --, this should be set to zero (C<0>). In case a buffer size greater than zero is returned by the callback, var_expand() will make use of that buffer internally if possible. It will also free(3) the buffer when it is not needed anymore, so it is important that it was previously allocated with malloc(3) by the callback.
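
As a rough sketch of the buffer ownership rule just described, the hypothetical callback below returns a malloc(3)'ed copy of the value and therefore reports a non-zero buffer size. The parameter names follow the lookup() function of the developer example. The function name C<lookup_owned>, the variable name "greeting" and the stand-in type declarations are assumptions made for illustration only: the stand-ins exist so the fragment compiles without F<var.h>, and their numeric values are placeholders rather than the library's actual values.

```c
#include <stdlib.h>
#include <string.h>

/* Stand-ins so this sketch compiles on its own (assumption: real code
   includes "var.h" instead and drops these declarations; the numeric
   values are placeholders). */
typedef struct var_st var_t;
typedef enum {
    VAR_OK = 0,
    VAR_ERR_OUT_OF_MEMORY = -1,
    VAR_ERR_UNDEFINED_VARIABLE = -2,
    VAR_ERR_ARRAY_LOOKUPS_ARE_UNSUPPORTED = -3
} var_rc_t;

/* Hypothetical lookup callback: knows only the variable "greeting" and
   hands back a freshly malloc'ed copy of its value. */
static var_rc_t lookup_owned(
    var_t *var, void *ctx,
    const char *var_ptr, size_t var_len, int var_idx,
    const char **val_ptr, size_t *val_len, size_t *val_size)
{
    static const char name[]  = "greeting";
    static const char value[] = "hello";
    char *buf;

    (void)var;
    (void)ctx;
    if (var_idx != 0)
        return VAR_ERR_ARRAY_LOOKUPS_ARE_UNSUPPORTED;
    /* the name is not NUL-terminated: compare length and bytes */
    if (var_len != sizeof(name) - 1 || memcmp(var_ptr, name, var_len) != 0)
        return VAR_ERR_UNDEFINED_VARIABLE;
    if ((buf = malloc(sizeof(value))) == NULL)
        return VAR_ERR_OUT_OF_MEMORY;
    memcpy(buf, value, sizeof(value));
    *val_ptr  = buf;
    *val_len  = sizeof(value) - 1;  /* length without the NUL */
    *val_size = sizeof(value);      /* non-zero: buffer came from malloc */
    return VAR_OK;
}
```

Because the reported size is greater than zero, var_expand() would take over the buffer and free(3) it, so a callback written this way must not keep or release the pointer afterwards.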

=back

The return code of the lookup function B is interpreted by var_expand() according to the following convention: C<VAR_OK> means success, that is, the contents of the variable has been resolved successfully and the I<val_ptr>, I<val_len>, and I<val_size> variables have been filled with appropriate values. A return code C means that the resolving failed, such as a system error or lack of resources. In any case other than success, the contents of I<val_ptr>, I<val_len> and I<val_size> are assumed to be undefined. Hence, var_expand() will not free(3) any possibly allocated buffers; the callback must take care of this itself.

If a callback returns the special C<VAR_ERR_UNDEFINED_VARIABLE> return code, the behaviour of var_expand() depends on the setting of its I parameter. If I has been set, var_expand() will pass this error through to the caller. If I has not been set, var_expand() will copy the expression that caused the lookup to fail verbatim into the output buffer so that an additional expanding pass may expand it later.

If the callback returns an C, var_expand() will fail with this return code. If the cause for the error cannot be denoted by an error code defined in F<var.h>, callback implementors should use the error code C (which is currently defined to -64). It is guaranteed that no error code smaller than C is ever used by any B API function, so callback implementors who wish to distinguish between different reasons for failure can subtract their own callback return codes from this value, i.e., return (C - I) (I E<gt>= 0) from the callback function.

=item B

This is an exported function pointer type for variable value operation functions. Such a callback function B has to be of the following prototype:

var_rc_t *B(var_t *I, void *I, const char *I, size_t I, const char *I, size_t I, const char *I, size_t I, const char **I, size_t *I, size_t *I);

This function will be called by var_expand() internally whenever a custom operation is used. Its parameters are:

=over 4

=item var_t *I

This is the passed-through argument as passed to var_expand() as the first argument.
This can be used in the callback function to distinguish the expansion context or to resolve return codes, etc.

=item void *I

This is the passed-through argument as passed to var_config() on C as the fourth argument. This can be used to provide an internal context to the callback function through var_expand().

=item const char *I

This is a pointer to the name of the operation which var_expand() wishes to perform. Please note that the string is NOT necessarily terminated by a C<NUL> ('C<\0>') character. If the callback function needs it C<NUL>-terminated, it has to copy the string into a temporary buffer of its own and C<NUL>-terminate it there.

=item size_t I

This is the length of the operation name at I.

=item const char *I

This is a pointer to the optional argument string to the operation. If no argument string or an empty argument string was supplied, this is C<NULL>.

=item size_t I

This is the length of the I.

=item const char *I

This is a pointer to the value of the variable which the operation wants to adjust.

=item size_t I

This is the length of the I.

=item const char **I

This is a pointer to the location where the callback function should store the pointer to the adjusted value.

=item size_t *I

This is a pointer to the location where the callback function should store the length of the adjusted value of the variable.

=item size_t *I

This is a pointer to the location where the callback function should store the size of the buffer that has been allocated to hold the adjusted value of the variable. If no buffer has been allocated by the callback at all, because the variable uses some other means of storing the contents, this should be set to zero (C<0>). In case a buffer size greater than zero is returned by the callback, var_expand() will make use of that buffer internally if possible. It will also free(3) the buffer when it is not needed anymore, so it is important that it was previously allocated with malloc(3) by the callback.
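
To make the parameter list above more concrete, here is a minimal sketch of an operation callback that reverses the value it is given. Everything named here is hypothetical: the function name C<op_reverse>, the operation name "rev", the parameter names (C<op_ptr>, C<arg_ptr>, C<out_ptr> and so on) and the choice of return code for an unknown operation. The stand-in type declarations exist only so the fragment compiles without F<var.h>; their numeric values are placeholders. How such a callback is registered with a context is not shown.

```c
#include <stdlib.h>
#include <string.h>

/* Stand-ins so this sketch compiles on its own (assumption: real code
   includes "var.h" instead and drops these declarations; the numeric
   values are placeholders). */
typedef struct var_st var_t;
typedef enum {
    VAR_OK = 0,
    VAR_ERR_OUT_OF_MEMORY = -1,
    VAR_ERR_INVALID_ARGUMENT = -2
} var_rc_t;

/* Hypothetical operation callback implementing "rev": the adjusted
   value is the input value with its characters in reverse order. */
static var_rc_t op_reverse(
    var_t *var, void *ctx,
    const char *op_ptr, size_t op_len,
    const char *arg_ptr, size_t arg_len,
    const char *val_ptr, size_t val_len,
    const char **out_ptr, size_t *out_len, size_t *out_size)
{
    char *buf;
    size_t i;

    (void)var;
    (void)ctx;
    (void)arg_ptr;
    (void)arg_len;
    /* the operation name is not NUL-terminated: compare by length */
    if (op_len != 3 || memcmp(op_ptr, "rev", 3) != 0)
        return VAR_ERR_INVALID_ARGUMENT;
    if ((buf = malloc(val_len + 1)) == NULL)
        return VAR_ERR_OUT_OF_MEMORY;
    for (i = 0; i < val_len; i++)
        buf[i] = val_ptr[val_len - 1 - i];
    buf[val_len] = '\0';
    *out_ptr  = buf;
    *out_len  = val_len;      /* length without the NUL */
    *out_size = val_len + 1;  /* non-zero: buffer came from malloc */
    return VAR_OK;
}
```

Following the C<%> construct described under SYNTAX CONSTRUCTS, a template would presumably invoke this operation as C<${foo:%rev}>.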

=back

=back

=head2 FUNCTIONS

The B API consists of the following B functions:

=over 4

=item var_rc_t B<var_create>(var_t **I);

Create a new variable expansion context and store it into I.

=item var_rc_t B<var_destroy>(var_t *I);

Destroy the variable expansion context I.

=item var_rc_t B<var_config>(var_t *I, var_config_t I, ...);

Configure the variable expansion context I. The variable argument list depends on the I identifier:

=over 4

=item C<VAR_CONFIG_SYNTAX>, var_syntax_t *I

This overrides the syntax configuration in I with the one provided in I. The complete structure contents is copied, so the caller is allowed to immediately destroy I after the var_config() call. The default is the contents as shown above under the type description of B<var_syntax_t>.

=item C<VAR_CONFIG_CB_VALUE>, var_cb_value_t I, void *I

This overrides the value lookup callback configuration in I with the one provided. The default is C<NULL> for I and I. At least C<NULL> for I is not valid for proper operation of var_expand(), so the caller has to configure the callback before variable expansions can be successfully performed.

=back

=item var_rc_t B<var_unescape>(var_t *I, const char *I, size_t I, char *I, size_t I, int I);

This expands escape sequences found in the input buffer I/I. The I/I point to an output buffer, into which the expanded data is copied if processing is successful. The size of this buffer must be at least I+1 characters. The reason is that var_unescape() always adds a terminating C<NUL> ('C<\0>') character at the end of the output buffer, so that you can use the result comfortably with other C library routines. The supplied I either has to point to a pre-allocated buffer or is allowed to point to I (because the unescaping operation is guaranteed to either keep the size or reduce the size of the input).

The parameter I is a boolean flag that modifies the behavior of var_unescape(). If I is set to true (any value except zero), var_unescape() will expand B escape sequences it sees, even those that it does not know about.
This means that "C<\1>" will become "C<1>", even though "C<\1>" has no special meaning to var_unescape(). If I is set to false (the value zero), such escape sequences will be copied verbatim to the output buffer.

The quoted pairs supported by var_unescape() are "C<\t>" (tabulator), "C<\r>" (carriage return), "C<\n>" (line feed), "C<\NNN>" (octal value), "C<\xNN>" (hexadecimal value), and "C<\x{NNMM..}>" (grouped hexadecimal values).

=item var_rc_t B<var_expand>(var_t *I, const char *I, size_t I, char **I, size_t *I, int I);

This is the heart of B. It expands all syntax constructs in I/I and stores them in an allocated buffer returned in I/I.

The output buffer I/I is allocated by var_expand() using malloc(3), thus it is the caller's responsibility to free(3) that buffer once it is no longer used.

For convenience, the output buffer is always C<NUL>-terminated by var_expand(), but this C<NUL> character is not counted for I. The I pointer can be specified as C<NULL> if you are not interested in the output buffer length.

The I flag determines how var_expand() deals with undefined variables (indicated by the callback function through the return code C<VAR_ERR_UNDEFINED_VARIABLE>). If it is set to true (any value except zero), var_expand() will fail with error code C<VAR_ERR_UNDEFINED_VARIABLE> whenever an undefined variable is encountered. That is, it just passes through the return code of the callback function. If set to false (value zero), var_expand() will copy the expression it failed to expand verbatim into the output buffer, in order to enable you to go over the buffer with an additional pass. Generally, if you do not plan to use multi-pass expansion, you should set I to true in order to make sure no unexpanded variable constructs are left over in the buffer.

If var_expand() fails with an error, I will point to I and I will contain the number of characters that have been consumed from I before the error occurred.
In other words, if an error occurs, I/I point to the last parsing location in I/I before the error occurred. The only exceptions to these error semantics are: on C<VAR_ERR_INVALID_ARGUMENT> and C<VAR_ERR_OUT_OF_MEMORY> errors, I and I are undefined.

=item var_rc_t B(var_t *I, char **I, int I, const char *I, va_list I);

This is a high-level function on top of B which expands simple printf(3)-style constructs before expanding the complex variable constructs. So, this is something of a combination between sprintf(3) and B. It expands simple "C<%s>" (string, type "C"), "C<%d>" (integer number, type "C") and "C<%c>" (character, type "C") constructs in I. The values are taken from the variable argument vector I. After this expansion the result is passed through B by passing through the I, I and I arguments. The final result is a malloc(3)'ed buffer provided in I which the caller has to free(3) later.

=item var_rc_t B(var_t *I, char **I, int I, const char *I, ...);

This is just a wrapper around B which translates the variable argument list into C.

=item var_rc_t B<var_strerror>(var_t *I, var_rc_t I, char **I);

This can be used to map any B return codes (as returned by all the B API functions) into a clear-text message describing the reason for failure in prose. Please note that errors coming from the callback, such as C and those based on it, cannot be mapped and will yield the message "C".

=back

=head2 VARIABLES

=over 4

=item B

=back

=head1 COMBINING UNESCAPING AND EXPANSION

For maximum power and flexibility, you usually want to combine var_unescape() and var_expand(). That is, you will want to use var_unescape() to turn all escape sequences into their real representation before you call var_expand() for expanding variable constructs. This way the user can safely use specials like "C<\n>" or "C<\t>" throughout the template and achieve the desired effect. These escape sequences are particularly useful if search-and-replace or transpose actions are performed on variables before they are expanded.
Be sure, though, to make the first var_unescape() pass with the I flag set to false, or the routine will also expand escape sequences like "C<\1>", which might have a special meaning (regular expression back-references) in the var_expand() pass to follow.

Once all known escape sequences are expanded, expand the variables with var_expand(). After that, you will want to have a second pass with var_unescape() and the flag I set to true, to make sure all remaining escape sequences are expanded. Also, the var_expand() pass might have introduced new quoted pairs into the output text, which you need to expand to get the desired effect.

=head1 EXAMPLE (DEVELOPER)

The following simple but complete program illustrates the full usage of B. It accepts a single argument on the command line and expands this in three steps (unescaping known escape sequences, expanding variable constructs, unescaping new and unknown escape sequences). The value lookup callback uses the process environment to resolve variables.

 #include <stdio.h>
 #include <stdlib.h>
 #include <string.h>

 #include "var.h"

 /* value lookup callback: resolve scalar variables from the environment */
 static var_rc_t lookup(
     var_t *var, void *ctx,
     const char *var_ptr, size_t var_len, int var_idx,
     const char **val_ptr, size_t *val_len, size_t *val_size)
 {
     char tmp[256];

     if (var_idx != 0)
         return VAR_ERR_ARRAY_LOOKUPS_ARE_UNSUPPORTED;
     if (var_len > sizeof(tmp) - 1)
         return VAR_ERR_OUT_OF_MEMORY;
     /* the name is not NUL-terminated, so copy and terminate it */
     memcpy(tmp, var_ptr, var_len);
     tmp[var_len] = '\0';
     if ((*val_ptr = getenv(tmp)) == NULL)
         return VAR_ERR_UNDEFINED_VARIABLE;
     *val_len = strlen(*val_ptr);
     *val_size = 0; /* buffer is owned by the system, not malloc'ed */
     return VAR_OK;
 }

 /* print an error message for a failed step and terminate */
 static void die(const char *context, var_t *var, var_rc_t rc)
 {
     char *error;

     var_strerror(var, rc, &error);
     fprintf(stderr, "ERROR: %s: %s (%d)\n", context, error, rc);
     exit(1);
 }

 int main(int argc, char *argv[])
 {
     var_t *var;
     var_rc_t rc;
     char *src_ptr;
     char *dst_ptr;
     size_t src_len;
     size_t dst_len;
     var_syntax_t syntax = { '\\', '$', '{', '}', '[', ']', '#', "a-zA-Z0-9_" };

     /* command line handling */
     if (argc != 2)
         die("command line", NULL, VAR_ERR_INVALID_ARGUMENT);
     src_ptr = argv[1];
     src_len = strlen(src_ptr);
     fprintf(stdout, "input: \"%s\"\n", src_ptr);

     /* establish variable expansion context */
     if ((rc = var_create(&var)) != VAR_OK)
         die("create context", NULL, rc);
     if ((rc = var_config(var, VAR_CONFIG_SYNTAX, &syntax)) != VAR_OK)
         die("configure syntax", var, rc);
     if ((rc = var_config(var, VAR_CONFIG_CB_VALUE, lookup, NULL)) != VAR_OK)
         die("configure callback", var, rc);

     /* unescape known escape sequences (in place) */
     if ((rc = var_unescape(var, src_ptr, src_len, src_ptr, src_len+1, 0)) != VAR_OK)
         die("unescape known escape sequences", var, rc);
     src_len = strlen(src_ptr);
     fprintf(stdout, "unescaped: \"%s\"\n", src_ptr);

     /* expand variable constructs (force expansion) */
     if ((rc = var_expand(var, src_ptr, src_len, &dst_ptr, &dst_len, 1)) != VAR_OK) {
         if (rc != VAR_ERR_INVALID_ARGUMENT && rc != VAR_ERR_OUT_OF_MEMORY) {
             /* show where parsing stopped; "%*s" needs an int width */
             fprintf(stdout, "parsing: \"%s\"\n", dst_ptr);
             fprintf(stdout, " %*s\n", (int)dst_len, "^");
         }
         die("variable expansion", var, rc);
     }
     fprintf(stdout, "expanded: \"%s\"\n", dst_ptr);

     /* unescape new and unknown escape sequences (in place) */
     if ((rc = var_unescape(var, dst_ptr, dst_len, dst_ptr, dst_len+1, 1)) != VAR_OK)
         die("unescape new and unknown escape sequences", var, rc);
     fprintf(stdout, "output: \"%s\"\n", dst_ptr);
     free(dst_ptr);

     /* destroy variable expansion context */
     if ((rc = var_destroy(var)) != VAR_OK)
         die("destroy context", var, rc);

     return 0;
 }

Copy & paste the source code into F<var_play.c> (or use the version already shipped with the B source distribution), compile it with

 $ cc `var-config --cflags` -o var_play var_play.c `var-config --ldflags --libs`

and use it to play with the various B variable expansion possibilities.

=head1 EXAMPLE (USER)

The following are a few sample use cases of B variable expansions. They all assume the default syntax configuration and the following variable definitions: C<$foo=foo> (a scalar), C<$bar=E<lt>bar1,bar2,bar3,E<gt>> (an array), C<$baz=E<lt>baz1,baz2,baz3,E<gt>> (another array), C<$quux=quux> (another scalar), C<$name=E<lt>foo,bar,baz,quuxE<gt>> (another array) and C<$empty=""> (another scalar).

 Input                          Output
 ------------------------------ --------------
 $foo                           foo
 ${foo}                         foo
 ${bar[0]}                      bar1
 ${${name[1]}[0]}               bar1
 ${foo:u:y/O/U/:s/(.*)/<\1>/}   <FUU>
 ${empty:-foo}                  foo
 ${foo:+yes}${foo:*no}          yes
 ${empty:+yes}${empty:*no}      no
 ${foo:p/6/./l}                 foo...
 ${foo:p/6/./r}                 ...foo
 [${bar[#]}${bar[#+1]:+,}]      bar1,bar2,bar3
 [${bar[#-1]:+,}${bar[#]}]      bar1,bar2,bar3
 [${bar[#]}]{2,1,3}             bar2bar3
 [${bar[#]}]{1,2,3}             bar1bar3
 [${foo[#]}[${bar[#]}]]{1,,2}   foo1bar1bar2bar3foo2bar1bar2bar3

=head1 SEE ALSO

pcre(3), regex(7), B (Value Access).

=head1 HISTORY

B was written by Peter Simons E<lt>simons@crypt.toE<gt> in November 2001. Its API and internal code structure were revamped in February 2002 by Ralf S. Engelschall E<lt>rse@engelschall.comE<gt> to fully conform to the B library standards.

=cut