## ** VAR - OSSP variable expression library.
##  Copyright (c) 2001 The OSSP Project (http://www.ossp.org/)
##  Copyright (c) 2001 Cable & Wireless Deutschland (http://www.cw.com/de/)
##
##  This file is part of OSSP VAR, an extensible data serialization
##  library which can be found at http://www.ossp.org/pkg/var/.
##
##  Permission to use, copy, modify, and distribute this software for
##  any purpose with or without fee is hereby granted, provided that
##  the above copyright notice and this permission notice appear in all
##  copies.
##
##  THIS SOFTWARE IS PROVIDED `AS IS' AND ANY EXPRESSED OR IMPLIED
##  WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF
##  MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED.
##  IN NO EVENT SHALL THE AUTHORS AND COPYRIGHT HOLDERS AND THEIR
##  CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
##  SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
##  LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF
##  USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND
##  ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
##  OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT
##  OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
##  SUCH DAMAGE.
##
##  var.pod: Unix manual page source
##

=pod

=head1 NAME

var - OSSP variable expression library.

=head1 SYNOPSIS

=head1 DESCRIPTION

The purpose of VAR is ...

=head1 THE LOOKUP CALLBACK

The function var_expand() does not itself know how to look up the
contents of a variable.
Instead, it relies on a caller-supplied callback function, which adheres
to the var_cb_t function interface:

    int lookup(void *context,
               const char *varname, size_t name_len,
               const char **data, size_t *data_len,
               size_t *buffer_size);

This function will be called by var_expand() whenever it has to retrieve
the contents of, say, the variable $name, using the following
parameters:

=over 4

=item void *context

The contents of context are passed through from var_expand()'s
"lookup_context" parameter to the callback. This parameter can be used
by the programmer to provide internal data to the callback function
through var_expand().

=item const char *varname

This is a pointer to the name of the variable whose contents
var_expand() wishes to retrieve. In our example of looking up $name,
varname would point to the string "name". Please note that the string is
NOT necessarily terminated by a '\0' character! If the callback function
needs to pass the string to the standard C library string manipulation
functions during the lookup, it will have to copy the string into a
buffer of its own to ensure it is null-terminated.

=item size_t name_len

The "name_len" parameter contains the length of the variable name that
"varname" points to.

=item const char **data

This is a pointer to the location where the callback function should
store the pointer to the contents of the variable.

=item size_t *data_len

This is a pointer to the location where the callback function should
store the length of the contents of the variable.

=item size_t *buffer_size

This is a pointer to the location where the callback function should
store the size of the buffer that has been allocated to hold the
contents of the looked-up variable. If no buffer has been allocated at
all, because the variable uses some other means of storing the contents
-- as in the case of getenv(3), where the system provides the buffer for
the string -- this should be zero (0).
In case a buffer size greater than zero is returned by the callback
function, var_expand() will make use of that buffer internally if
possible. It will also free(3) the buffer when it is not needed anymore.

=back

The return code of the lookup function is interpreted by var_expand()
according to the following convention: Any return code greater than zero
means success, that is, the contents of the variable have been looked up
successfully and the "data", "data_len", and "buffer_size" locations
have been filled with appropriate values. A return code of zero (0)
means that the variable was undefined and its contents therefore could
not be looked up. A return code of less than zero means that the lookup
failed for some other reason, such as a system error or lack of
resources. In the latter two cases, the contents of "data", "data_len"
and "buffer_size" are assumed to be undefined. Hence, var_expand() will
not free(3) any possibly allocated buffers; the callback must take care
of that itself.

If a callback returns zero -- meaning the variable is undefined -- the
behavior of var_expand() depends on the setting of the "force_expand"
parameter. If force-expand mode has been set, var_expand() will fail
with a VAR_ERR_UNDEFINED_VARIABLE error. If force-expand mode has not
been set, var_expand() will copy the expression that caused the lookup
to fail verbatim into the output buffer so that an additional expanding
pass may expand it later.

If the callback returns an error -- meaning a return code less than zero
-- var_expand() will fail with the return code it got from the
callback. Callback implementors are encouraged to re-use the error codes
defined in var.h wherever possible. An example of an error code a
callback might want to reuse is VAR_ERR_OUT_OF_MEMORY. If the cause of
the error cannot be denoted by an error code defined in var.h, callback
implementors should use the error code VAR_ERR_CALLBACK, which is
currently defined as -64.
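The buffer ownership and return-code conventions described so far can be
sketched with a callback that hands a malloc(3)ed copy of the value to
var_expand(). This is only an illustrative sketch: "struct kv", the
table layout, and the name table_lookup() are hypothetical, and the
VAR_ERR_CALLBACK definition is a stand-in for the one in var.h, using
the value stated above.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Stand-in for the definition in var.h (assumption: value as stated
   in the text above). */
#ifndef VAR_ERR_CALLBACK
#define VAR_ERR_CALLBACK (-64)
#endif

/* Hypothetical lookup context: an array of name/value pairs that is
   terminated by an entry whose name is NULL. */
struct kv {
    const char *name;
    const char *value;
};

int table_lookup(void *context,
                 const char *varname, size_t name_len,
                 const char **data, size_t *data_len,
                 size_t *buffer_size)
{
    const struct kv *kv = (const struct kv *)context;

    assert(varname != NULL && data != NULL);
    assert(data_len != NULL && buffer_size != NULL);

    if (kv == NULL)
        return VAR_ERR_CALLBACK;          /* < 0: lookup failed */

    for (; kv->name != NULL; kv++) {
        /* varname is not necessarily NUL-terminated, so compare the
           length first and then the raw bytes. */
        if (strlen(kv->name) == name_len
            && memcmp(kv->name, varname, name_len) == 0) {
            size_t len = strlen(kv->value);
            char *copy = malloc(len + 1);
            if (copy == NULL)
                return VAR_ERR_CALLBACK;  /* var.h also offers
                                             VAR_ERR_OUT_OF_MEMORY */
            memcpy(copy, kv->value, len + 1);
            *data = copy;
            *data_len = len;
            *buffer_size = len + 1;       /* > 0: the caller may reuse
                                             and free(3) this buffer */
            return 1;                     /* > 0: success */
        }
    }
    return 0;                             /* 0: variable undefined,
                                             outputs left untouched */
}
```

Because "buffer_size" is set to a non-zero value only on success, the
callback never leaves an allocated buffer behind when it returns zero or
a negative code, which matches the rule that var_expand() does not
free(3) anything in those cases.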
It is guaranteed that no error code smaller than VAR_ERR_CALLBACK is
ever used by var_expand() or var_unescape(), so if the callback
implementor wishes to distinguish between different reasons for failure,
he can define his own set of errors

    typedef enum {
        LOOKUP_ERROR_ONE   = -3,
        LOOKUP_ERROR_TWO   = -2,
        LOOKUP_ERROR_THREE = -1
    } lookup_error_t;

and return, say, "(VAR_ERR_CALLBACK + LOOKUP_ERROR_TWO)", which
evaluates to -66 and therefore lies below VAR_ERR_CALLBACK.

To illustrate the implementation of a proper callback, take a look at
the following example that accesses the system environment via getenv(3)
to look up variables and to return them to var_expand():

    int env_lookup(void *context,
                   const char *varname, size_t name_len,
                   const char **data, size_t *data_len,
                   size_t *buffer_size)
    {
        char tmp[256];

        if (name_len > sizeof(tmp) - 1) {
            /* Callback can't expand variable names longer than
               sizeof(tmp) - 1 characters. */
            return VAR_ERR_CALLBACK;
        }
        /* varname is not NUL-terminated; make a terminated copy. */
        memcpy(tmp, varname, name_len);
        tmp[name_len] = '\0';
        *data = getenv(tmp);
        if (*data == NULL)
            return 0;
        *data_len = strlen(*data);
        /* getenv(3) owns the buffer, so report no allocation. */
        *buffer_size = 0;
        return 1;
    }

=head1 SUPPORTED NAMED CHARACTERS

The var_unescape() function knows the following constructs:

=over 4

=item \t, \r, \n

These expressions are replaced by the appropriate binary representation
of a tab, a carriage return and a newline respectively.

=item \abc

This expression is replaced by the value of the octal number "abc".
Valid digits for "a" are in the range of '0' to '3'; for digits "b" and
"c", in the range of '0' to '7'. Please note that an octal expression is
recognized only if the backslash is followed by three digits! The
expression "\1a7", for example, is interpreted as the quoted pair "\1"
followed by the verbatim text "a7".

=item \xAB

This expression is replaced by the value of the hexadecimal number "AB".
Both characters "A" and "B" must be in the range of '0' to '9', 'a' to
'f', or 'A' to 'F'.

=item \x{...}

This expression denotes a set of grouped hexadecimal numbers. The "..."
part may consist of an arbitrary number of hexadecimal pairs, such as in
"\x{}", "\x{ff}", or "\x{55ffab04}". The empty expression "\x{}" is a
no-op; it will not produce any output.

This construct may be useful to specify multi-byte characters (as in
Unicode). Although "\x{0102}" is effectively equivalent to "\x01\x02",
the grouping of values may be useful in other contexts, even though
neither var_unescape() nor var_expand() makes direct use of it.

=back

=head1 SUPPORTED VARIABLE EXPRESSIONS

In addition to the ordinary variable expansion of $name or ${name},
var_expand() supports a number of operations that can be performed on
the contents of "name" before it is copied to the output buffer. Such
operations are always denoted by appending a colon and a command
character to the variable name, for example: ${name:l} or
${name:s/foo/bar/}. You can specify multiple operations, which are
executed from left to right, for example: ${name:l:s/foo/bar/:u}. Also,
you can nest variable expansion and command execution pretty much
anywhere in the construct, for example: ${name:s/$foo/$bar/g}. In that
context it is probably useful to have a look at the formal expression
grammar provided in section "EBNF GRAMMAR OF SUPPORTED EXPRESSIONS".

Generally, none of the operations described below modifies the contents
of any variable -- var_expand() can't set variables, it only reads them.
If the description says that an operation "replaces the contents of
variable $foo", it means that rather than expanding the expression to
the contents of $foo, it will expand to the modified string instead. The
contents of $foo are left untouched in any case.

=over 4

=item ${name:#}

This operation will expand to the length of the contents of $name. If,
for example, "$FOO" is "foobar", then "${FOO:#}" will result in "6".

=item ${name:l}

This operation will turn the contents of $name to all lower-case, using
the system routine tolower(3), thereby possibly using the system's
localization settings.

=item ${name:u}

This operation will turn the contents of $name to all upper-case, using
the system routine toupper(3), thereby possibly using the system's
localization settings.

=item ${name:*<word>}

This operation will replace the contents of $name with the empty string
("") if $name is not empty. Otherwise, it will replace it with "word".

=item ${name:-<word>}

This operation will replace the contents of $name with "word" if $name
is empty. Otherwise, it will expand to the contents of $name.

=item ${name:+<word>}

This operation will replace the contents of $name with "word" if $name
is not empty. Otherwise, it will expand to the contents of $name.

=item ${name:o<start>-<end>}

This operation will cut the range of "start" to "end" out of the
contents of $name and return that. ${name:o3-4} means, for instance, to
return the next 4 characters starting at position 3 in the string.
Please note that start positions begin at zero (0)! If the "end" range
is left out, as in ${name:o3-}, the operation will return the string
starting at position 3 until the end.

=item ${name:o<start>,<end>}

This operation will cut the string starting at position "start" and
ending at position "end" out of the contents of $name and return that.
Please note that the character at position "end" is not included in the
result; "end - 1" is the last character position returned. ${name:o3,4},
for instance, will return the substring from position 3 to position 4 --
that is exactly one character. Also, please note that start positions
begin at zero (0)! If the "end" parameter is left out, as in
${name:o3,}, the operation will return the string starting at position 3
until the end.

=item ${name:s/<pattern>/<replace>/[gti]}

This operation will perform a search-and-replace operation on the
contents of $name and return the result.
The behavior of the search-and-replace is modified by the following
flags parameter: If the 't' flag has been provided, a plain text
search-and-replace is performed; otherwise, the default is to perform a
regular expression search-and-replace as in the system utility sed(1).
If the 'g' flag has been provided, the search-and-replace will replace
all instances of "pattern" by "replace"; otherwise, the default is to
replace only the first instance. If the 'i' flag has been provided, the
search-and-replace will take place case-insensitively; otherwise, the
default is to distinguish character case.

=item ${name:y/<ochars>/<nchars>/}

This operation will translate all characters in the contents of $name
that are found in the "ochars" class to the corresponding character in
the "nchars" class, just like the system utility tr(1) does. Both
"ochars" and "nchars" may contain character range specifications, for
example "a-z0-9". A hyphen as the first or last character of the class
specification is interpreted literally. Both the "ochars" and the
"nchars" class must contain the same number of characters after all
ranges are expanded, or var_expand() will abort with an error.

If, for example, "$FOO" contains "foobar", then "${FOO:y/a-z/A-Z/}"
would yield "FOOBAR". Another use of this operation is to ROT13-encrypt
or decrypt a string with the expression "${FOO:y/a-z/n-za-m/}".

=item ${name:p/<width>/<string>/<align>}

This operation will pad the contents of $name with "string" according to
the "align" parameter, so that the result is at least "width" characters
long. Valid parameters for align are 'l' (left), 'r' (right), or 'c'
(center). The "string" parameter may contain multiple characters, if you
see any use for that.
If, for example, "$FOO" is "foobar", then "${FOO:p/20/./c}" would yield
".......foobar......."; "${FOO:p/20/./l}" would yield
"foobar.............."; and "${FOO:p/20/./r}" would yield
"..............foobar".

=back

=head1 EBNF GRAMMAR OF SUPPORTED EXPRESSIONS

 input       : (TEXT|variable)*

 variable    : '$' (name|expression)

 expression  : START-DELIM (name|variable)+ (':' command)* END-DELIM

 name        : (VARNAME)+

 command     : '-' (EXPTEXT|variable)+
             | '+' (EXPTEXT|variable)+
             | 'o' (NUMBER ('-'|',') (NUMBER)?)
             | '#'
             | '*' (EXPTEXT|variable)+
             | 's' '/' (variable|SUBSTTEXT)+ '/' (variable|SUBSTTEXT)* '/' ('g'|'i'|'t')*
             | 'y' '/' (variable|SUBSTTEXT)+ '/' (variable|SUBSTTEXT)* '/'
             | 'p' '/' NUMBER '/' (variable|SUBSTTEXT)* '/' ('r'|'l'|'c')
             | 'l'
             | 'u'

 START-DELIM : '{'

 END-DELIM   : '}'

 VARNAME     : '[a-zA-Z0-9_]+'

 NUMBER      : '[0-9]+'

 SUBSTTEXT   : '[^$/]'

 EXPTEXT     : '[^$}:]+'

 TEXT        : '[^$]+'

Please note that the descriptions of START-DELIM, END-DELIM, VARNAME,
SUBSTTEXT, and EXPTEXT shown here assume that var_expand() has been
called in the default configuration. In truth, the contents of VARNAME
correspond directly to the setting of "namechars" in the var_config_t
structure. Similarly, the dollar ('$') corresponds directly to the
setting of "varinit", and the '{' and '}' characters to "startdelim" and
"enddelim" respectively.

=head1 CODES RETURNED BY THE LIBRARY

Generally, all routines of this library follow the convention that a
return code of zero or greater denotes success and a return code of less
than zero denotes failure. (This is slightly different for the
callbacks; please see section "THE LOOKUP CALLBACK" for further
details.) In order to distinguish the various causes of failure, the
following set of defines is provided in var.h:

=over 4

=item VAR_OK

No errors; everything went fine.

=item VAR_ERR_INCOMPLETE_QUOTED_PAIR

The configured escape character was found as the last character in the
input buffer.

=item VAR_ERR_INVALID_ARGUMENT

Any of the provided arguments is invalid, for example: the pointer to
the input buffer is NULL.

=item VAR_ERR_SUBMATCH_OUT_OF_RANGE

During execution of a ${name:s/pattern/replace/flags} operation, a
submatch has been referenced in the "replace" part whose number is
greater than the number of submatches encountered in the "pattern" part,
for example: ${name:s/foo(bar)/\2/}.

=item VAR_ERR_UNKNOWN_QUOTED_PAIR_IN_REPLACE

During execution of a ${name:s/pattern/replace/flags} operation, the
parser encountered an unknown quoted pair in the "replace" part. Valid
quoted pairs are "\\", "\0", "\1", ... , "\9" only.

=item VAR_ERR_EMPTY_PADDING_FILL_STRING

The "fill" part in a ${name:p/width/fill/pos/} expression was found to
be empty.

=item VAR_ERR_MISSING_PADDING_WIDTH

The "width" part in a ${name:p/width/fill/pos/} expression was found to
be empty.

=item VAR_ERR_MALFORMATTED_PADDING

Any of the "/" delimiters was missing while parsing a
${name:p/width/fill/pos/} expression.

=item VAR_ERR_INCORRECT_TRANSPOSE_CLASS_SPEC

While parsing a ${name:y/old-class/new-class/} expression, any of the
character class specifications had a start-of-range character that was
greater (in terms of ASCII encoding) than the end-of-range character,
for example: "[z-a]".

=item VAR_ERR_EMPTY_TRANSPOSE_CLASS

While parsing a ${name:y/old-class/new-class/} expression, any of the
character class specifications was found to be empty.

=item VAR_ERR_TRANSPOSE_CLASSES_MISMATCH

While parsing a ${name:y/old-class/new-class/} expression, the number of
characters found in the expanded "old-class" was different from the
number of characters in "new-class".

=item VAR_ERR_MALFORMATTED_TRANSPOSE

Any of the "/" delimiters was missing while parsing a
${name:y/old-class/new-class/} expression.

=item VAR_ERR_OFFSET_LOGIC

The "end" offset in a ${name:o<start>,<end>} expression is smaller than
the "start" offset.

=item VAR_ERR_OFFSET_OUT_OF_BOUNDS

The "start" offset in a ${name:o<start>,<end>} expression is greater
than the number of characters found in $name.

=item VAR_ERR_RANGE_OUT_OF_BOUNDS

The end-of-range in a ${name:o<start>,<end>} or ${name:o<start>-<end>}
expression would be greater than the number of characters found in
$name.

=item VAR_ERR_INVALID_OFFSET_DELIMITER

The two numbers in an offset operation are delimited by a character
different from "," or "-".

=item VAR_ERR_MISSING_START_OFFSET

The "start" offset in a ${name:o<start>,<end>} or ${name:o<start>-<end>}
expression was found to be empty.

=item VAR_ERR_EMPTY_SEARCH_STRING

The "pattern" part of a ${name:s/pattern/replace/flags} expression was
found to be empty.

=item VAR_ERR_MISSING_PARAMETER_IN_COMMAND

In a ${name:+word}, ${name:-word}, or ${name:*word} expression, the
"word" part was missing, that is, empty.

=item VAR_ERR_INVALID_REGEX_IN_REPLACE

While compiling the "pattern" part of a ${name:s/pattern/replace/flags}
expression, regcomp(3) failed with an error.

=item VAR_ERR_UNKNOWN_REPLACE_FLAG

In a ${name:s/pattern/replace/flags} expression, a flag other than "t",
"i", or "g" was found.

=item VAR_ERR_MALFORMATTED_REPLACE

Any of the "/" delimiters was missing while parsing a
${name:s/pattern/replace/flags} expression.

=item VAR_ERR_UNKNOWN_COMMAND_CHAR

In a ${name:<char>} expression, "char" did not specify any of the
supported operations.

=item VAR_ERR_INPUT_ISNT_TEXT_NOR_VARIABLE

At one point during parsing of the input buffer, an expression was found
that was neither verbatim text nor a variable expression. This usually
is the result of an inconsistent configuration of var_expand() via the
var_config_t parameter.

=item VAR_ERR_UNDEFINED_VARIABLE

Looking up a variable's contents failed and var_expand() was running in
"force expand" mode.

=item VAR_ERR_INCOMPLETE_VARIABLE_SPEC

The input buffer ended in the middle of a ${name} expression, or the
configured variable initializer character was found to be the last
character of the input buffer.

=item VAR_ERR_OUT_OF_MEMORY

var_expand() failed while malloc(3)ing internally needed buffers.

=item VAR_ERR_INVALID_CONFIGURATION

Any of the characters configured in the var_config_t structure as a
special ("varinit", "startdelim", "enddelim", and "escape") was found to
be a member of the "namechars" class.

=item VAR_ERR_INCORRECT_CLASS_SPEC

The character class specification "namechars" of the var_config_t
structure provided to var_expand() was syntactically incorrect, that is,
the start-of-range was greater than the end-of-range. (See also
VAR_ERR_INCORRECT_TRANSPOSE_CLASS_SPEC.)

=item VAR_ERR_INCOMPLETE_GROUPED_HEX

var_unescape() encountered the end of the input buffer in the middle of
a grouped-hex "\x{...}" expression.

=item VAR_ERR_INCOMPLETE_OCTAL

var_unescape() encountered the end of the input buffer in the middle of
an octal "\000" expression.

=item VAR_ERR_INVALID_OCTAL

The second or third digit of an octal "\000" expression was found not to
be in the range of '0' to '7'.

=item VAR_ERR_OCTAL_TOO_LARGE

The value specified via an octal "\000" expression was larger than 0377.

=item VAR_ERR_INVALID_HEX

Any of the digits of a hex "\x00" expression was found not to be in the
range of '0' to '9', 'a' to 'f', or 'A' to 'F'.

=item VAR_ERR_INCOMPLETE_HEX

var_unescape() encountered the end of the input buffer in the middle of
a hex "\x00" expression.

=item VAR_ERR_INCOMPLETE_NAMED_CHARACTER

var_unescape() encountered the backslash ('\') as the last character of
the input buffer.

=back

=head1 SEE ALSO

regex(7)

=cut
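=head1 EXAMPLES

The class rules of the "y" operation described in section "SUPPORTED
VARIABLE EXPRESSIONS" (range expansion, a literal hyphen at the start or
end of a class, and the requirement that both classes expand to the same
number of characters) can be illustrated with a small stand-alone
sketch. This is not the library's implementation: expand_class(),
transpose(), and the TR_* codes are hypothetical names used only here,
and unlike var_expand() the sketch works on NUL-terminated strings.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical status codes for this sketch only; they are not the
   VAR_ERR_* values from var.h. */
enum {
    TR_OK = 0,
    TR_ERR_RANGE = -1,     /* start-of-range greater than end-of-range */
    TR_ERR_EMPTY = -2,     /* empty class specification */
    TR_ERR_MISMATCH = -3,  /* classes expand to different sizes */
    TR_ERR_TOO_LONG = -4   /* expansion does not fit the work buffer */
};

/* Expand a class such as "a-z0-9" into its member characters. Returns
   the number of members stored in out, or a negative TR_ERR_* code. */
static int expand_class(const char *spec, unsigned char *out,
                        size_t out_size)
{
    size_t len, i = 0, n = 0;

    assert(spec != NULL && out != NULL);
    len = strlen(spec);
    if (len == 0)
        return TR_ERR_EMPTY;
    while (i < len) {
        unsigned char lo = (unsigned char)spec[i];
        /* "x-y" forms a range only when a character follows the
           hyphen; a leading or trailing hyphen is taken literally. */
        if (i + 2 < len && spec[i + 1] == '-') {
            unsigned char hi = (unsigned char)spec[i + 2];
            unsigned int c;
            if (lo > hi)
                return TR_ERR_RANGE;
            for (c = lo; c <= hi; c++) {
                if (n == out_size)
                    return TR_ERR_TOO_LONG;
                out[n++] = (unsigned char)c;
            }
            i += 3;
        } else {
            if (n == out_size)
                return TR_ERR_TOO_LONG;
            out[n++] = lo;
            i++;
        }
    }
    return (int)n;
}

/* Translate s in place, mapping each member of ochars to the member
   at the same position in nchars. */
static int transpose(char *s, const char *ochars, const char *nchars)
{
    unsigned char from[512], to[512], map[256];
    int nf, nt, k;
    size_t i;

    assert(s != NULL);
    nf = expand_class(ochars, from, sizeof(from));
    if (nf < 0)
        return nf;
    nt = expand_class(nchars, to, sizeof(to));
    if (nt < 0)
        return nt;
    if (nf != nt)
        return TR_ERR_MISMATCH;
    for (k = 0; k < 256; k++)
        map[k] = (unsigned char)k;      /* identity by default */
    for (k = 0; k < nf; k++)
        map[from[k]] = to[k];
    for (i = 0; s[i] != '\0'; i++)
        s[i] = (char)map[(unsigned char)s[i]];
    return TR_OK;
}
```

With the buffer "foobar", transpose(buf, "a-z", "A-Z") leaves "FOOBAR"
in the buffer, and transpose(buf, "a-z", "n-za-m") on "foobar" leaves
the ROT13 form "sbbone"; applying the same call again restores
"foobar".

=cut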