Index: ossp-pkg/var/var.pod
RCS File: /v/ossp/cvs/ossp-pkg/var/var.pod,v
rcsdiff -q -kk '-r1.2' '-r1.3' -u '/v/ossp/cvs/ossp-pkg/var/var.pod,v' 2>/dev/null
--- var.pod 2001/11/13 12:47:59 1.2
+++ var.pod 2001/11/16 15:44:17 1.3
@@ -41,146 +41,549 @@
 =head1 THE LOOKUP CALLBACK

+The function var_expand() does not know how to look up the contents of
+a variable itself. Instead, it relies on a caller-supplied callback
+function, which adheres to the var_cb_t function interface:
+
+    int lookup(void *context,
+               const char *varname, size_t name_len,
+               const char **data, size_t *data_len,
+               size_t *buffer_size);
+
+This function will be called by var_expand() whenever it has to
+retrieve the contents of, say, the variable $name, using the following
+parameters:
+
+=over 4
+
+=item void *context
+
+The contents of context is passed through from var_expand()'s
+"lookup_context" parameter to the callback. This parameter can be used
+by the programmer to provide internal data to the callback function
+through var_expand().
+
+=item const char *varname
+
+This is a pointer to the name of the variable whose contents
+var_expand() wishes to retrieve. In our example of looking up $name,
+varname would point to the string "name". Please note that the string
+is NOT necessarily terminated by a '\0' character! If the callback
+function needs to pass the string to the standard C library string
+manipulation functions during the lookup, it will have to copy the
+string into a buffer of its own to ensure it is null-terminated.
+
+=item size_t name_len
+
+The "name_len" parameter contains the length of the variable name
+"varname" points to.
+
+=item const char **data
+
+This is a pointer to the location where the callback function should
+store the pointer to the contents of the variable.
+
+=item size_t *data_len
+
+This is a pointer to the location where the callback function should
+store the length of the contents of the variable.
+
+=item size_t *buffer_size
+
+This is a pointer to the location where the callback function should
+store the size of the buffer that has been allocated to hold the
+contents of the looked-up variable. If no buffer has been allocated at
+all, because the variable uses some other means of storing the
+contents (as in the case of getenv(3), where the system provides the
+buffer for the string), this should be zero (0).
+
+In case a buffer size greater than zero is returned by the callback
+function, var_expand() will make use of that buffer internally if
+possible. It will also free(3) the buffer when it is not needed
+anymore.
+
+=back
+
+The return code of the lookup function is interpreted by var_expand()
+according to the following convention: Any return code greater than
+zero means success, that is, the contents of the variable has been
+looked up successfully and the "data", "data_len", and "buffer_size"
+locations have been filled with appropriate values. A return code of
+zero (0) means that the variable was undefined and its contents
+therefore could not be looked up. A return code of less than zero
+means that the lookup failed for some other reason, such as a system
+error or lack of resources. In the latter two cases, the contents of
+"data", "data_len" and "buffer_size" is assumed to be undefined.
+Hence, var_expand() will not free(3) any possibly allocated buffers;
+the callback must take care of that itself.
+
+If a callback returns zero (meaning the variable is undefined), the
+behavior of var_expand() depends on the setting of the "force_expand"
+parameter. If force-expand mode has been set, var_expand() will fail
+with a VAR_ERR_UNDEFINED_VARIABLE error. If force-expand mode has not
+been set, var_expand() will copy the expression that caused the lookup
+to fail verbatim into the output buffer so that an additional
+expanding pass may expand it later.
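The return-code and buffer conventions described above can be pictured with a small callback that serves variables from a caller-supplied table. This is only a sketch: the demo_var structure, the DEMO_ERR_NO_MEMORY constant, and the table_lookup name are hypothetical and not part of the library, and the signature is simply copied from the interface shown earlier.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical table entry; not part of the library. */
struct demo_var {
    const char *name;
    const char *value;
};

/* Hypothetical stand-in for a negative error code; a real callback
   would prefer one of the codes defined in var.h. */
#define DEMO_ERR_NO_MEMORY (-1)

/* Lookup callback following the interface described above. "context"
   must point to an array of demo_var terminated by a NULL name. */
int table_lookup(void *context,
                 const char *varname, size_t name_len,
                 const char **data, size_t *data_len,
                 size_t *buffer_size)
{
    const struct demo_var *v;
    char *copy;
    size_t len;

    assert(context != NULL && varname != NULL);
    assert(data != NULL && data_len != NULL && buffer_size != NULL);

    for (v = context; v->name != NULL; v++) {
        /* varname is not NUL-terminated, so compare by length. */
        if (strlen(v->name) == name_len
            && memcmp(v->name, varname, name_len) == 0) {
            len = strlen(v->value);
            copy = malloc(len + 1);
            if (copy == NULL)
                return DEMO_ERR_NO_MEMORY;  /* < 0: lookup failed */
            memcpy(copy, v->value, len + 1);
            *data = copy;
            *data_len = len;
            *buffer_size = len + 1;  /* > 0: caller takes over the buffer */
            return 1;                /* > 0: success */
        }
    }
    return 0;                        /* 0: variable is undefined */
}
```

Because the callback reports a non-zero buffer size only on success, it never leaves an allocated buffer behind when it returns zero or a negative code, which matches the rule that var_expand() does not free anything in those cases.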
+
+If the callback returns an error (meaning a return code less than
+zero), var_expand() will fail with the return code it got from the
+callback. Callback implementors are encouraged to re-use the error
+codes defined in var.h wherever possible. An example of an error code
+a callback might want to reuse is VAR_ERR_OUT_OF_MEMORY. If the cause
+for the error cannot be denoted by an error code defined in var.h,
+callback implementors should use the error code VAR_ERR_CALLBACK,
+which is currently defined to -64. It is guaranteed that no error code
+smaller than VAR_ERR_CALLBACK is ever used by var_expand() or
+var_unescape(), so if the callback implementor wishes to distinguish
+between different reasons for failure, a private set of errors can be
+defined
+
+    typedef enum {
+        LOOKUP_ERROR_ONE   = -3,
+        LOOKUP_ERROR_TWO   = -2,
+        LOOKUP_ERROR_THREE = -1
+    } lookup_error_t;
+
+and the callback can return, say, "(VAR_ERR_CALLBACK +
+LOOKUP_ERROR_TWO)", which evaluates to -66 and therefore stays below
+VAR_ERR_CALLBACK.
+
+To illustrate the implementation of a proper callback, take a look at
+the following example that accesses the system environment via
+getenv(3) to look up variables and to return them to var_expand():
+
+    #include <stdlib.h>   /* getenv(3) */
+    #include <string.h>   /* memcpy(3), strlen(3) */
+    #include "var.h"      /* VAR_ERR_CALLBACK */
+
+    int env_lookup(void *context,
+                   const char *varname, size_t name_len,
+                   const char **data, size_t *data_len,
+                   size_t *buffer_size)
+    {
+        char tmp[256];
+
+        if (name_len > sizeof(tmp) - 1) {
+            /* Callback can't expand variable names longer than
+               sizeof(tmp) - 1 characters. */
+
+            return VAR_ERR_CALLBACK;
+        }
+        memcpy(tmp, varname, name_len);
+        tmp[name_len] = '\0';
+        *data = getenv(tmp);
+        if (*data == NULL)
+            return 0;
+        *data_len = strlen(*data);
+        *buffer_size = 0;
+        return 1;
+    }
+
+=head1 SUPPORTED NAMED CHARACTERS
+
+The var_unescape() function knows the following constructs:
+
+=over 4
+
+=item \t, \r, \n
+
+These expressions are replaced by the appropriate binary
+representation of a tab, a carriage return and a newline respectively.
+
+=item \abc
+
+This expression is replaced by the value of the octal number "abc".
+Valid values for digit "a" are in the range of '0' to '3'; digits "b"
+and "c" must be in the range of '0' to '7'. Please note that an octal
+expression is recognized only if the backslash is followed by three
+digits! The expression "\1a7", for example, is interpreted as the
+quoted pair "\1" followed by the verbatim text "a7".
+
+=item \xAB
+
+This expression is replaced by the value of the hexadecimal number
+"AB". Both characters "A" and "B" must be in the range of '0' to '9',
+'a' to 'f', or 'A' to 'F'.
+
+=item \x{...}
+
+This expression denotes a set of grouped hexadecimal numbers. The
+"..." part may consist of an arbitrary number of hexadecimal pairs,
+such as in "\x{}", "\x{ff}", or "\x{55ffab04}". The empty expression
+"\x{}" is a no-op; it will not produce any output.
+
+This construct may be useful to specify multi-byte characters (as in
+Unicode). Even though "\x{0102}" is effectively equivalent to
+"\x01\x02", the grouping of values may be useful in other contexts,
+even though var_unescape() or var_expand() make no direct use of it.
+
+=back
+
 =head1 SUPPORTED VARIABLE EXPRESSIONS

+In addition to the ordinary variable expansion of $name or ${name},
+var_expand() supports a number of operations that can be performed on
+the contents of "name" before it is copied to the output buffer. Such
+operations are always denoted by appending a colon and a command
+character to the variable name, for example: ${name:l} or
+${name:s/foo/bar/}. You can specify multiple operations, which are
+executed from left to right, for example: ${name:l:s/foo/bar/:u}.
+
+Also, you can nest variable expansion and command execution pretty
+much anywhere in the construct, for example: ${name:s/$foo/$bar/g}. In
+that context it is probably useful to have a look at the formal
+expression grammar provided in section "EBNF GRAMMAR OF SUPPORTED
+EXPRESSIONS".
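The left-to-right evaluation of a chained expression such as ${name:l:s/foo/bar/:u} can be pictured with the following sketch. It does not call the library and is not how var_expand() is implemented; the demo_* helpers are hypothetical, and the substitution step is a plain-text replacement of the first match, which for the literal pattern "foo" gives the same result as the regular-expression default.

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Counterpart of ${name:l} / ${name:u}: convert a string in place. */
static void demo_case(char *s, int upper)
{
    for (; *s != '\0'; s++)
        *s = (char)(upper ? toupper((unsigned char)*s)
                          : tolower((unsigned char)*s));
}

/* Counterpart of a plain-text ${name:s/pat/rep/}: replace the first
   occurrence of pat. Returns a malloc'ed string, or NULL on failure. */
static char *demo_subst_first(const char *s, const char *pat,
                              const char *rep)
{
    size_t slen = strlen(s), plen = strlen(pat), rlen = strlen(rep);
    const char *hit = (plen > 0) ? strstr(s, pat) : NULL;
    size_t head;
    char *out;

    if (hit == NULL) {                 /* no match: return a plain copy */
        out = malloc(slen + 1);
        if (out != NULL)
            memcpy(out, s, slen + 1);
        return out;
    }
    head = (size_t)(hit - s);
    out = malloc(slen - plen + rlen + 1);
    if (out == NULL)
        return NULL;
    memcpy(out, s, head);
    memcpy(out + head, rep, rlen);
    memcpy(out + head + rlen, hit + plen, slen - head - plen + 1);
    return out;
}

/* Applies :l, then :s/foo/bar/, then :u to a copy of "value". */
char *demo_chain(const char *value)
{
    size_t len;
    char *copy, *out;

    assert(value != NULL);
    len = strlen(value);
    copy = malloc(len + 1);
    if (copy == NULL)
        return NULL;
    memcpy(copy, value, len + 1);      /* work on a copy of the input */
    demo_case(copy, 0);                            /* :l          */
    out = demo_subst_first(copy, "foo", "bar");    /* :s/foo/bar/ */
    free(copy);
    if (out != NULL)
        demo_case(out, 1);                         /* :u          */
    return out;
}
```

Each step consumes the output of the previous one, so the order of the operations matters: with the input "FooBaz", the substitution only matches because the lower-casing step has already run.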
+
+Generally, all operations described below do not modify the contents
+of any variable -- var_expand() generally can't set variables, it will
+only read them. If the description says that an operation "replaces
+the contents of variable $foo", it is meant that rather than expanding
+the expression to the contents of $foo, it will expand to the
+modified string instead. The contents of $foo is left untouched in any
+case.
+
 =over 4

 =item ${name:#}

+This operation will expand to the length of the contents of $name. If,
+for example, "$FOO" is "foobar", then "${FOO:#}" will result in "6".
+
 =item ${name:l}

+This operation will turn the contents of $name to all lower-case,
+using the system routine tolower(3), thereby possibly using the
+system's localization settings.
+
 =item ${name:u}

+This operation will turn the contents of $name to all upper-case,
+using the system routine toupper(3), thereby possibly using the
+system's localization settings.
+
 =item ${name:*<word>}

+This operation will replace the contents of $name with the empty
+string ("") if $name is not empty. Otherwise, it will replace it by
+"word".
+
 =item ${name:-<word>}

-=item ${name:?}
+This operation will replace the contents of $name with "word" if $name
+is empty. Otherwise, it will expand to the contents of $name.

 =item ${name:+<word>}

+This operation will replace the contents of $name with "word" if $name
+is not empty. Otherwise, it will expand to the contents of $name.
+
 =item ${name:o<start>-<end>}

+This operation will cut the range of "start" to "end" out of the
+contents of $name and return that. ${name:o3-4} means, for instance,
+to return the next 4 characters starting at position 3 in the string.
+Please note that start positions begin at zero (0)! If the "end" range
+is left out, as in ${name:o3-}, the operation will return the string
+starting at position 3 until the end.
+
 =item ${name:o<start>,<end>}

+This operation will cut the string starting at position "start" to
+ending position "end" out of the contents of $name and return that.
+Please note that the character at position "end" is not included in
+the result; "end - 1" is the last character position returned.
+${name:o3,4}, for instance, will return the substring from position 3
+to position 4 -- that is exactly one character. Also, please note that
+start positions begin at zero (0)! If the "end" parameter is left out,
+as in ${name:o3,}, the operation will return the string starting at
+position 3 until the end.
+
 =item ${name:s/<pattern>/<replace>/[gti]}

+This operation will perform a search-and-replace operation on the
+contents of $name and return the result. The behavior of the
+search-and-replace is modified by the following flags parameter: If a
+'t' flag has been provided, a plain text search-and-replace is
+performed; otherwise, the default is to perform a regular expression
+search-and-replace as in the system utility sed(1). If the 'g' flag
+has been provided, the search-and-replace will replace all instances
+of "pattern" by "replace"; otherwise, the default is to replace only
+the first instance. If the 'i' flag has been provided, the
+search-and-replace will take place case-insensitively; otherwise, the
+default is to distinguish character case.
+
 =item ${name:y/<ochars>/<nchars>/}

+This operation will translate all characters in the contents of $name
+that are found in the "ochars" class to the corresponding character in
+the "nchars" class, just like the system utility tr(1) does. Both
+"ochars" and "nchars" may contain character range specifications, for
+example "a-z0-9". A hyphen as the first or last character of the class
+specification is interpreted literally. Both the "ochars" and the
+"nchars" class must contain the same number of characters after all
+ranges are expanded, or var_expand() will abort with an error.
+
+If, for example, "$FOO" contains "foobar", then "${FOO:y/a-z/A-Z/}"
+would yield "FOOBAR". Another goodie is to use that operation to
+ROT13-encrypt or decrypt a string with the expression
+"${FOO:y/a-z/n-za-m/}".
+
 =item ${name:p/<width>/<string>/<align>}

+This operation will pad the contents of $name with "string" according
+to the "align" parameter, so that the result is at least "width"
+characters long. Valid parameters for align are 'l' (left), 'r'
+(right), or 'c' (center). The "string" parameter may contain multiple
+characters, if you see any use for that.
+
+If, for example, "$FOO" is "foobar", then "${FOO:p/20/./c}" would
+yield ".......foobar......."; "${FOO:p/20/./l}" would yield
+"foobar.............."; and "${FOO:p/20/./r}" would yield
+"..............foobar".
+
 =back

 =head1 EBNF GRAMMAR OF SUPPORTED EXPRESSIONS

-input       : (TEXT|variable)*
-
-variable    : '$' (name|expression)
+ input       : (TEXT|variable)*

-expression  : START-DELIM (name|variable)+ (':' command)* END-DELIM
+ variable    : '$' (name|expression)

-name        : (VARNAME|SPECIAL1|SPECIAL2)+
+ expression  : START-DELIM (name|variable)+ (':' command)* END-DELIM

-command     : '-' (EXPTEXT|variable)+
-            | '+' (EXPTEXT|variable)+
-            | 'o' (NUMBER ('-'|',') (NUMBER)?)
-            | '#'
-            | '*' (EXPTEXT|variable)+
-            | 's' '/' (variable|SUBSTTEXT)+ '/' (variable|SUBSTTEXT)* '/' ('g'|'i'|'t')*
-            | 'y' '/' (variable|SUBSTTEXT)+ '/' (variable|SUBSTTEXT)* '/'
-            | 'p' '/' NUMBER '/' (variable|SUBSTTEXT)* '/' ('r'|'l'|'c')
-            | 'l'
-            | 'u'
+ name        : (VARNAME)+

-START-DELIM: '{'
+ command     : '-' (EXPTEXT|variable)+
+             | '+' (EXPTEXT|variable)+
+             | 'o' (NUMBER ('-'|',') (NUMBER)?)
+             | '#'
+             | '*' (EXPTEXT|variable)+
+             | 's' '/' (variable|SUBSTTEXT)+ '/' (variable|SUBSTTEXT)* '/' ('g'|'i'|'t')*
+             | 'y' '/' (variable|SUBSTTEXT)+ '/' (variable|SUBSTTEXT)* '/'
+             | 'p' '/' NUMBER '/' (variable|SUBSTTEXT)* '/' ('r'|'l'|'c')
+             | 'l'
+             | 'u'

-END-DELIM  : '}'
+ START-DELIM: '{'

-VARNAME    : '[a-zA-Z0-9_]+'
+ END-DELIM  : '}'

-SPECIAL1   : '['
+ VARNAME    : '[a-zA-Z0-9_]+'

-SPECIAL2   : ']'
+ NUMBER     : '[0-9]+'

-NUMBER     : '[0-9]+'
+ SUBSTTEXT  : '[^$/]'

-SUBSTTEXT  : '[^$/]'
+ EXPTEXT    : '[^$}:]+'

-EXPTEXT    : '[^$}:]+'
+ TEXT       : '[^$]+'

-TEXT       : '[^$]+'
+Please note that the descriptions of START-DELIM, END-DELIM, VARNAME,
+SUBSTTEXT, and EXPTEXT shown here assume that var_expand() has been
+called in the default configuration. In truth, the contents of
+VARNAME corresponds directly to the setting of "namechars" in the
+var_config_t structure. Similarly, the dollar ('$') corresponds
+directly to the setting of "varinit", and the '{' and '}' characters
+to "startdelim" and "enddelim" respectively.

 =head1 CODES RETURNED BY THE LIBRARY

+Generally, all routines that are part of this library follow the
+convention that a return code of zero or greater denotes success and a
+return code of less than zero denotes failure. (This is slightly
+different for the callbacks; please see section "THE LOOKUP CALLBACK"
+for further details.) In order to distinguish the various causes of
+failure, the following set of defines is provided in var.h:
+
 =over 4

-=item VAR_CALLBACK_ERROR
+=item VAR_OK

-=item VAR_EMPTY_PADDING_FILL_STRING
+No errors; everything went fine.

-=item VAR_MISSING_PADDING_WIDTH
+=item VAR_ERR_INCOMPLETE_QUOTED_PAIR

-=item VAR_MALFORMATTED_PADDING
+The configured escape character was found as the last character in the
+input buffer.

-=item VAR_INCORRECT_TRANSPOSE_CLASS_SPEC
+=item VAR_ERR_INVALID_ARGUMENT

-=item VAR_EMPTY_TRANSPOSE_CLASS
+One of the provided arguments is invalid, for example: the pointer to
+the input buffer is NULL.

-=item VAR_TRANSPOSE_CLASSES_MISMATCH
+=item VAR_ERR_SUBMATCH_OUT_OF_RANGE

-=item VAR_MALFORMATTED_TRANSPOSE
+During execution of a ${name:s/pattern/replace/flags} operation, a
+submatch has been referenced in the "replace" part whose number is
+greater than the number of submatches encountered in the "pattern"
+part, for example: ${name:s/foo(bar)/\2/}.

-=item VAR_OFFSET_LOGIC_ERROR
+=item VAR_ERR_UNKNOWN_QUOTED_PAIR_IN_REPLACE

-=item VAR_OFFSET_OUT_OF_BOUNDS
+During execution of a ${name:s/pattern/replace/flags} operation, the
+parser encountered an unknown quoted pair in the "replace" part. Valid
+quoted pairs are "\\", "\0", "\1", ... , "\9" only.

-=item VAR_RANGE_OUT_OF_BOUNDS
+=item VAR_ERR_EMPTY_PADDING_FILL_STRING

-=item VAR_INVALID_OFFSET_DELIMITER
+The "fill" part in a ${name:p/width/fill/pos/} expression was found
+to be empty.

-=item VAR_MISSING_START_OFFSET
+=item VAR_ERR_MISSING_PADDING_WIDTH

-=item VAR_EMPTY_SEARCH_STRING
+The "width" part in a ${name:p/width/fill/pos/} expression was found
+to be empty.

-=item VAR_MISSING_PARAMETER_IN_COMMAND
+=item VAR_ERR_MALFORMATTED_PADDING

-=item VAR_INVALID_REGEX_IN_REPLACE
+One of the "/" delimiters was missing while parsing a
+${name:p/width/fill/pos/} expression.

-=item VAR_UNKNOWN_REPLACE_FLAG
+=item VAR_ERR_INCORRECT_TRANSPOSE_CLASS_SPEC

-=item VAR_MALFORMATTED_REPLACE
+While parsing a ${name:y/old-class/new-class/} expression, one of the
+character class specifications had a start-of-range character that was
+greater (in terms of ASCII encoding) than the end-of-range character,
+for example: "[z-a]".

-=item VAR_UNKNOWN_COMMAND_CHAR
+=item VAR_ERR_EMPTY_TRANSPOSE_CLASS

-=item VAR_INPUT_ISNT_TEXT_NOR_VARIABLE
+While parsing a ${name:y/old-class/new-class/} expression, one of the
+character class specifications was found to be empty.

-=item VAR_UNDEFINED_VARIABLE
+=item VAR_ERR_TRANSPOSE_CLASSES_MISMATCH

-=item VAR_INCOMPLETE_VARIABLE_SPEC
+While parsing a ${name:y/old-class/new-class/} expression, the number
+of characters found in the expanded "old-class" was different from the
+number of characters in "new-class".

-=item VAR_OUT_OF_MEMORY
+=item VAR_ERR_MALFORMATTED_TRANSPOSE

-=item VAR_INVALID_CONFIGURATION
+One of the "/" delimiters was missing while parsing a
+${name:y/old-class/new-class/} expression.

-=item VAR_INCORRECT_CLASS_SPEC
+=item VAR_ERR_OFFSET_LOGIC

-=item VAR_INCOMPLETE_GROUPED_HEX
+The "end" offset in a ${name:o<start>,<end>} expression is smaller
+than the "start" offset.

-=item VAR_INCOMPLETE_OCTAL
+=item VAR_ERR_OFFSET_OUT_OF_BOUNDS

-=item VAR_INVALID_OCTAL
+The "start" offset in a ${name:o<start>,<end>} expression is greater
+than the number of characters found in $name.

-=item VAR_OCTAL_TOO_LARGE
+=item VAR_ERR_RANGE_OUT_OF_BOUNDS

-=item VAR_INVALID_HEX
+The end-of-range in a ${name:o<start>,<end>} or ${name:o<start>-<end>}
+expression would be greater than the number of characters found in
+$name.

-=item VAR_INCOMPLETE_HEX
+=item VAR_ERR_INVALID_OFFSET_DELIMITER

-=item VAR_INCOMPLETE_NAMED_CHARACTER
+The two numbers in an offset operation are delimited by a character
+different from "," or "-".

-=item VAR_INCOMPLETE_QUOTED_PAIR
+=item VAR_ERR_MISSING_START_OFFSET

-=item VAR_OK
+The "start" offset in a ${name:o<start>,<end>} or
+${name:o<start>-<end>} expression was found to be empty.
+
+=item VAR_ERR_EMPTY_SEARCH_STRING
+
+The "pattern" part of a ${name:s/pattern/replace/flags} expression was
+found to be empty.
+
+=item VAR_ERR_MISSING_PARAMETER_IN_COMMAND
+
+In a ${name:+word}, ${name:-word}, or ${name:*word} expression, the
+"word" part was missing, that is, empty.
+
+=item VAR_ERR_INVALID_REGEX_IN_REPLACE
+
+While compiling the "pattern" part of a
+${name:s/pattern/replace/flags} expression, regcomp(3) failed with an
+error.
+
+=item VAR_ERR_UNKNOWN_REPLACE_FLAG
+
+In a ${name:s/pattern/replace/flags} expression, a flag other than
+"t", "i", or "g" was found.
+
+=item VAR_ERR_MALFORMATTED_REPLACE
+
+One of the "/" delimiters was missing while parsing a
+${name:s/pattern/replace/flags} expression.
+
+=item VAR_ERR_UNKNOWN_COMMAND_CHAR
+
+In a ${name:<char>} expression, "char" did not specify any of the
+supported operations.
+
+=item VAR_ERR_INPUT_ISNT_TEXT_NOR_VARIABLE
+
+At one point during parsing of the input buffer, an expression was
+found that was neither verbatim text nor a variable expression. This
+usually is the result of an inconsistent configuration of var_expand()
+via the var_config_t parameter.
+
+=item VAR_ERR_UNDEFINED_VARIABLE
+
+Looking up a variable's contents failed and var_expand() was running
+in "force expand" mode.
+
+=item VAR_ERR_INCOMPLETE_VARIABLE_SPEC
+
+The input buffer ended in the middle of a ${name} expression, or the
+configured variable initializer character was found to be the last
+character of the input buffer.
+
+=item VAR_ERR_OUT_OF_MEMORY
+
+var_expand() failed while malloc(3)ing internally needed buffers.
+
+=item VAR_ERR_INVALID_CONFIGURATION
+
+One of the characters configured in the var_config_t structure as a
+special ("varinit", "startdelim", "enddelim", and "escape") was found
+to be a member of the "namechars" class.
+
+=item VAR_ERR_INCORRECT_CLASS_SPEC
+
+The character class specification "namechars" of the var_config_t
+structure provided to var_expand() was syntactically incorrect, that
+is, the start-of-range was greater than end-of-range. (See also
+VAR_ERR_INCORRECT_TRANSPOSE_CLASS_SPEC.)
+
+=item VAR_ERR_INCOMPLETE_GROUPED_HEX
+
+var_unescape() encountered the end of the input buffer in the middle
+of a grouped-hex "\x{...}" expression.
+
+=item VAR_ERR_INCOMPLETE_OCTAL
+
+var_unescape() encountered the end of the input buffer in the middle
+of an octal "\000" expression.
+
+=item VAR_ERR_INVALID_OCTAL
+
+The second or third digit of an octal "\000" expression was found not
+to be in the range of '0' to '7'.
+
+=item VAR_ERR_OCTAL_TOO_LARGE
+
+The value specified via an octal "\000" expression was larger than
+0377.
+
+=item VAR_ERR_INVALID_HEX
+
+One of the digits of a hex "\x00" expression was found not to be in
+the range of '0' to '9', 'a' to 'f', or 'A' to 'F'.
+
+=item VAR_ERR_INCOMPLETE_HEX
+
+var_unescape() encountered the end of the input buffer in the middle
+of a hex "\x00" expression.
+
+=item VAR_ERR_INCOMPLETE_NAMED_CHARACTER
+
+var_unescape() encountered the backslash ('\') as the last character
+of the input buffer.

 =back
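As an illustration of how the three octal-related codes above relate to each other, here is a minimal sketch of a decoder for the digits of a "\abc" expression. It is written only from the error descriptions in this section and is not the code var_unescape() uses; the DEMO_* constants are hypothetical stand-ins whose values need not match var.h, and demo_decode_octal is an invented name. When the real var_unescape() reports an error and when it falls back to treating the input as a quoted pair is not covered here.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the var.h codes; real values may differ. */
enum {
    DEMO_OK                   =  0,
    DEMO_ERR_INCOMPLETE_OCTAL = -1,
    DEMO_ERR_INVALID_OCTAL    = -2,
    DEMO_ERR_OCTAL_TOO_LARGE  = -3
};

/* Decode the three digits that follow the backslash of "\abc".
   "src" points at the first digit (which the caller has already
   identified as a decimal digit); "len" is the number of bytes left
   in the input buffer. On success the byte value is stored in *out. */
int demo_decode_octal(const char *src, size_t len, unsigned char *out)
{
    unsigned int value;

    assert(src != NULL && out != NULL);
    if (len < 3)
        return DEMO_ERR_INCOMPLETE_OCTAL;   /* buffer ends too early */
    assert(src[0] >= '0' && src[0] <= '9');
    if (src[1] < '0' || src[1] > '7' || src[2] < '0' || src[2] > '7')
        return DEMO_ERR_INVALID_OCTAL;      /* 2nd or 3rd digit invalid */
    value = (unsigned int)(src[0] - '0') * 64
          + (unsigned int)(src[1] - '0') * 8
          + (unsigned int)(src[2] - '0');
    if (value > 0377)
        return DEMO_ERR_OCTAL_TOO_LARGE;    /* does not fit into a byte */
    *out = (unsigned char)value;
    return DEMO_OK;
}
```

The checks run in the same order as a parser would meet the problems: first whether enough input is left, then whether the digits are valid, and only then whether the resulting value fits into a single byte.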