##
##  OSSP xds - Extensible Data Serialization
##  Copyright (c) 2001-2004 Ralf S. Engelschall
##  Copyright (c) 2001-2004 The OSSP Project
##  Copyright (c) 2001-2004 Cable & Wireless
##
##  This file is part of OSSP xds, an extensible data serialization
##  library which can be found at http://www.ossp.org/pkg/lib/xds/.
##
##  Permission to use, copy, modify, and distribute this software for
##  any purpose with or without fee is hereby granted, provided that
##  the above copyright notice and this permission notice appear in all
##  copies.
##
##  THIS SOFTWARE IS PROVIDED `AS IS' AND ANY EXPRESSED OR IMPLIED
##  WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF
##  MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED.
##  IN NO EVENT SHALL THE AUTHORS AND COPYRIGHT HOLDERS AND THEIR
##  CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
##  SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
##  LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF
##  USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND
##  ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
##  OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT
##  OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
##  SUCH DAMAGE.
##
##  xds.pod: Unix manual page source
##

=pod

=head1 NAME

B - eXtensible Data Serialization

=head1 SYNOPSIS

xds_init, xds_destroy, xds_register, xds_unregister, xds_setbuffer,
xds_getbuffer, xds_encode, xds_decode, xds_vencode, xds_vdecode.

=head1 DESCRIPTION

The B library is a generic and extensible encoding and decoding
framework for the serialization of arbitrary ISO C data types. B
consists of three components: the generic encoding and decoding
framework, a set of shipped engines to encode and decode values in
certain existing formats (Sun RPC/XDR and XDS/XML are currently
provided), and a run-time context, which is used to manage buffers,
registered engines, etc.
The library is designed to allow fully recursive and efficient
encoding/decoding of arbitrary nested data.

=head2 INTRODUCTION

In order to use B, the first thing the developer has to do is to
create a valid context by calling xds_init(). The function requires one
parameter that determines whether to operate in encoding or decoding
mode. A context can be used for encoding or decoding only; it is not
possible to use the same context for both operations.

Once a valid context has been obtained, the function xds_register() can
be used to register an arbitrary number of encoding (or decoding)
engines within the context. Two sets of engines are included in the
library; additional ones can easily be written. The included engines
handle any elementary data type defined by the ISO C language, such as
32-bit integers, 64-bit integers, unsigned integers (of both 32 and 64
bits), floating point numbers, strings and octet streams.

Once all required encoding/decoding engines are registered, the
functions xds_encode() or xds_decode() may be used to actually perform
the encoding or decoding process. Any data type for which an engine has
been registered beforehand can be handled by the library.

This means that the developer can write custom engines for any data
type and register them in the context -- as long as these engines
adhere to the C interface defined in F.

In particular, it is possible to register meta engines. A meta engine
is an engine designed to encode or decode data types which consist of
several elementary data types. Such an engine simply re-uses the
existing engines to encode or decode the elements of the structure.
The following example program (without error checking for simplicity)
will encode the unsigned integer 0x1234 into the B format (known from
Sun RPC), decode it back into the native host format, and compare the
result to make sure it is the original value again:

    #include <stdio.h>
    #include <stdlib.h>
    #include "xds.h"

    int main(int argc, char *argv[])
    {
        xds_t *xds;
        xds_uint32_t uint32 = 0x1234;
        xds_uint32_t uint32_new;
        char *buffer;
        size_t buffer_size;

        /* encoding */
        xds = xds_init(XDS_ENCODE);
        xds_register(xds, "uint32", &xdr_encode_uint32, NULL);
        xds_encode(xds, "uint32", uint32);
        xds_getbuffer(xds, XDS_GIFT, (void**)&buffer, &buffer_size);
        xds_destroy(xds);

        /* ...usually buffer is now transferred to a remote system... */

        /* decoding */
        xds = xds_init(XDS_DECODE);
        xds_register(xds, "uint32", &xdr_decode_uint32, NULL);
        xds_setbuffer(xds, XDS_LOAN, buffer, buffer_size);
        xds_decode(xds, "uint32", &uint32_new);
        xds_destroy(xds);

        /* the buffer was handed to us with XDS_GIFT and only loaned
           back to the library, so the application has to free it */
        free(buffer);

        /* comparison */
        if (uint32 == uint32_new)
            printf("OK\n");
        else
            printf("Failure\n");

        return 0;
    }

=head1 THE XDS FRAMEWORK

B provides a generic framework for encoding and decoding. The
corresponding API is described here.

=over 4

=item xds_t *B(xds_mode_t I);

This function creates and initializes a context. The I parameter may be
either C or C, depending on whether you want to encode or decode data.
If successful, xds_init() returns a pointer to the context. In case of
failure, xds_init() returns C and sets errno to C (failed to allocate
internal memory buffers) or C (I parameter was invalid).

A context obtained from xds_init() must be destroyed by xds_destroy()
when it is no longer needed.

=item void B(xds_t *I);

xds_destroy() will destroy the context I, created by xds_init(). Doing
so will release all resources associated with this context -- most
notably the memory used to buffer the results of encoding or decoding
any values. A context must not be used after it has been destroyed.


=item int B(xds_t *I, const char *I, xds_engine_t I, void *I);

This function will register an engine in the provided context. An I is
potentially any function that fulfills the following interface:

int B(xds_t *I, void *I, void *I, size_t I, size_t *I, va_list *I);

By calling xds_register(), the I will be registered under the name I in
the context I. The last parameter I may be used as the user sees fit:
It will be passed when the engine is actually called and may be used to
implement an engine-specific context. Most engines will not need a
context of their own, in which case C should be specified here.

Please note that until the user calls xds_register() for a context
obtained from xds_init(), no engines are registered for that context.
Even the engines included in the B source distribution are not
registered automatically.

For engine names, any combination of the characters a-z, A-Z, 0-9, "-",
and "_" may be used; anything else is not a legal engine name
component.

xds_register() may return the following return codes: C (everything
went fine; the engine is registered now), C (either I, I, or I are C or
I contains illegal characters for an engine name), or C (failed to
allocate internally required buffers).

=item int B(xds_t *I, const char *I);

xds_unregister() will remove the engine I from the context I. The
function will return C in case everything went fine, C in case the
engine I is not registered in I, or C if either I or I are C or I
contains illegal characters for an engine name.

=item int B(xds_t *I, xds_scope_t I, void *I, size_t I);

This function allows the user to control the buffer handling: Calling
it will replace the buffer currently used in I. The address and size of
that buffer are passed to xds_setbuffer() via the I and I parameters.
The I parameter determines for which context the new buffer will be
set. Furthermore, you can set I to either C or C.
C will tell B that the provided buffer is now owned by the library and
that it may be resized by calling realloc(3). Furthermore, the buffer
is free(3)'ed when I is destroyed. If I is C and I is C,
xds_setbuffer() will simply allocate a buffer of its own to be set in
I. Please note that a buffer given to the library as a gift B<must>
have been allocated using malloc(3) -- it must not live on the stack
because B will try to free or to resize the buffer as it sees fit.

Passing C via I tells xds_setbuffer() that the buffer is owned by the
application and that B must neither free nor resize the buffer under
any circumstances. In this mode, passing a buffer of C will result in
an invalid-argument error.

=item int B(xds_t *I, xds_scope_t I, void **I, size_t *I);

This function is the counterpart to xds_setbuffer(): It will get the
buffer currently used in the context I. The address of that buffer is
stored in the location I points to; the length of the buffer's content
will be stored in the location I points to.

The I argument may be set to either C or C. The first setting means
that the buffer is now owned by the application and that B must not use
it after this xds_getbuffer() call anymore; it will allocate a new
internal buffer instead. Of course, this also means that the buffer
will not be freed by xds_destroy(); the application has to free(3) the
buffer itself when it is not needed anymore.

Setting I to C tells B that the application just wishes to peek into
the buffer and will not modify it. The buffer is still owned (and used)
by B. Please note that the loaned address returned by xds_getbuffer()
may change after any other xds_xxx() function call!

The function will return C (everything went fine) or C (I, I or I are C
or I is invalid), signifying success or failure respectively.

Please note: It is perfectly legal for xds_getbuffer() to return a
buffer of C and a buffer length of C<0>!
This happens when xds_getbuffer() is called for a fresh context before
a buffer has been allocated at all.

=item int B(xds_t *I, const char *I, va_list I);

This function will encode one or several values using the appropriate
encoding engines registered in the context I. The parameter I contains
a sprintf(3)-like description of the values to be encoded; the actual
values are provided in the variadic parameter I.

The format for I is simple: Just provide the names of the engines to be
used for encoding the appropriate value(s) in I. Any character that is
not legal in an engine name may be used as a delimiter. In order to
encode two 32-bit integers followed by a 64-bit integer, the format
string

    int32 int32 int64

could be used. In case you don't like the blank, use the colon instead:

    int32:int32:int64

Of course, the names used here have to match the names used to register
the engines in I earlier.

Every time xds_vencode() is called, it will append the encoded data at
the end of the internal buffer stored in I. Thus, you can call
xds_vencode() several times in order to encode several values, but
you'll still get all encoded values stored in one buffer.

Calling xds_setbuffer() or xds_getbuffer() with gift semantics at any
point during encoding will reset the buffer to the beginning. All
values that have been encoded into that buffer already will eventually
be overwritten when xds_encode() is called again. Hence: Don't call
xds_setbuffer() or xds_getbuffer() unless you actually want to access
the data stored in the buffer.

Also, it should be noted that the data you have to provide for I
depends entirely on what the deployed engines expect to find on the
stack -- there is no "standard" on what should be put on the stack
here. The B and B engines included in the distribution will simply
expect the value to be encoded to be found on the stack, but other
engines may act differently.
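The delimiter rule for format strings can be illustrated with a small
stand-alone tokenizer. This is only a sketch and not the code used
inside the library: the helper names is_name_char() and
next_engine_name() are hypothetical, and the only fact taken from this
manual is that a-z, A-Z, 0-9, "-" and "_" are the legal engine-name
characters while everything else separates names.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: nonzero if c may appear in an engine name
 * (a-z, A-Z, 0-9, '-' and '_'). The range checks assume an
 * ASCII-compatible character set. */
static int is_name_char(char c)
{
    return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z') ||
           (c >= '0' && c <= '9') || c == '-' || c == '_';
}

/* Hypothetical helper: copy the next engine name found at *pos into
 * name[] and advance *pos past it. Returns 1 if a name was found,
 * 0 at the end of the format string, -1 if the name does not fit. */
static int next_engine_name(const char **pos, char *name, size_t name_size)
{
    const char *p;
    size_t len = 0;

    assert(pos != NULL && *pos != NULL && name != NULL && name_size > 0);
    p = *pos;

    /* skip delimiters: every character that is not legal in a name */
    while (*p != '\0' && !is_name_char(*p))
        p++;
    if (*p == '\0') {
        *pos = p;
        return 0;
    }

    /* collect the run of legal name characters */
    while (is_name_char(p[len]))
        len++;
    if (len >= name_size)
        return -1;
    memcpy(name, p, len);
    name[len] = '\0';
    *pos = p + len;
    return 1;
}
```

Called repeatedly on "int32:int32 int64", this sketch yields the names
int32, int32 and int64 and then reports the end of the string, which
is why blanks and colons can be mixed freely as delimiters.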
xds_vencode() will return any of the following return codes: C
(everything worked fine), C (failed to allocate or to resize the
internal buffer), C (the internal buffer is too small but is not owned
by us), C (I or I are C), C (an engine name specified in I is not
registered in I), C (I is initialized in decode mode), or C (the engine
returned an unspecified error).

=item int B(xds_t *I, const char *I, ...);

This function is basically identical to xds_vencode(), except that it
takes the values as a variable argument list instead of a va_list.

=item int B(xds_t *I, const char *I, va_list I);

This function is almost identical to xds_vencode(): It expects a
context, a format string and a set of parameters for the engines, but
xds_vdecode() does not encode any data; it decodes the data back into
the native format.

The format string determines which engines are to be called by the
framework in order to decode the values contained in the buffer. The
values will then be stored at the locations found in the corresponding
I entry. But please note that the exact behavior of the decoding
engines is not specified! The B and B engines included in this
distribution expect a pointer to a location where to store the decoded
value, but other engines may vary.

xds_vdecode() may return any of the following return codes: C
(everything went fine), C (I or I are C), C (the format string says the
next value is of a particular type, but that's not what we found in the
buffer), C (an engine name specified in I is not registered in I), C (I
has been initialized in encode mode), C (an engine tried to read more
bytes from the buffer than there is data left), or C (an engine
returned an unspecified error).

=item int B(xds_t *I, const char *I, ...);

This function is basically identical to xds_vdecode(), except that it
takes the locations as a variable argument list instead of a va_list.

=back

=head1 THE XDR ENGINES

The B distribution ships with a set of engine functions which implement
the encoding and decoding for the B encoding known from Sun RPC.

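For orientation, the sketch below shows the byte layout that XDR
(RFC 1832) prescribes for unsigned 32-bit and 64-bit integers:
big-endian, 4 and 8 bytes respectively. It is a stand-alone
illustration and not the code of the shipped engines. The function
names sketch_put_be32(), sketch_get_be32() and sketch_put_be64() are
hypothetical, and the standard uint32_t/uint64_t types are assumed in
place of the library's own typedefs so that the example compiles
without xds.h.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: store v in out[0..3], most significant
 * byte first (the XDR layout of an unsigned integer). */
static void sketch_put_be32(unsigned char *out, uint32_t v)
{
    assert(out != NULL);
    out[0] = (unsigned char)((v >> 24) & 0xff);
    out[1] = (unsigned char)((v >> 16) & 0xff);
    out[2] = (unsigned char)((v >> 8) & 0xff);
    out[3] = (unsigned char)(v & 0xff);
}

/* Hypothetical helper: read back a value stored by sketch_put_be32(). */
static uint32_t sketch_get_be32(const unsigned char *in)
{
    assert(in != NULL);
    return ((uint32_t)in[0] << 24) | ((uint32_t)in[1] << 16) |
           ((uint32_t)in[2] << 8) | (uint32_t)in[3];
}

/* Hypothetical helper: a 64-bit value occupies 8 bytes, written
 * here as the high 32-bit half followed by the low half. */
static void sketch_put_be64(unsigned char *out, uint64_t v)
{
    sketch_put_be32(out, (uint32_t)(v >> 32));
    sketch_put_be32(out + 4, (uint32_t)(v & 0xffffffffu));
}
```

Encoding the value 0x1234 with sketch_put_be32() produces the four
bytes 00 00 12 34 regardless of the byte order of the host, which is
what makes the format suitable for exchange between different
machines.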
    Function Name             Expected `args'   Input      Output
    ---------------------------------------------------------------
    xdr_encode_uint32()       xds_uint32_t      4 bytes    4 bytes
    xdr_decode_uint32()       xds_uint32_t*     4 bytes    4 bytes
    xdr_encode_int32()        xds_int32_t       4 bytes    4 bytes
    xdr_decode_int32()        xds_int32_t*      4 bytes    4 bytes
    xdr_encode_uint64()       xds_uint64_t      8 bytes    8 bytes
    xdr_decode_uint64()       xds_uint64_t*     8 bytes    8 bytes
    xdr_encode_int64()        xds_int64_t       8 bytes    8 bytes
    xdr_decode_int64()        xds_int64_t*      8 bytes    8 bytes
    xdr_encode_float()        xds_float_t       4 bytes    4 bytes
    xdr_decode_float()        xds_float_t*      4 bytes    4 bytes
    xdr_encode_double()       xds_double_t      8 bytes    8 bytes
    xdr_decode_double()       xds_double_t*     8 bytes    8 bytes
    xdr_encode_octetstream()  void*, size_t     variable   variable
    xdr_decode_octetstream()  void**, size_t*   variable   variable
    xdr_encode_string()       char*             variable   variable
    xdr_decode_string()       char**            variable   variable

Please note that the functions xdr_decode_octetstream() and
xdr_decode_string() return a pointer to a buffer holding the decoded
data. This buffer has been allocated with malloc(3) and must be
free(3)'ed by the application when it is not required anymore. All
other callbacks write the decoded value into the location found on the
stack, but these behave differently because the length of the decoded
data is not known in advance and the application cannot provide a
buffer that's guaranteed to suffice.

=head1 THE XML ENGINES

The B distribution ships with a set of engine functions which implement
the encoding and decoding for an B based format specified by the
included B DTD.

    Function Name             Expected `args'   Input        Output
    -------------------------------------------------------------------
    xml_encode_uint32()       xds_uint32_t      4 bytes      18-27 bytes
    xml_decode_uint32()       xds_uint32_t*     18-27 bytes  4 bytes
    xml_encode_int32()        xds_int32_t       4 bytes      16-26 bytes
    xml_decode_int32()        xds_int32_t*      16-26 bytes  4 bytes
    xml_encode_uint64()       xds_uint64_t      8 bytes      18-37 bytes
    xml_decode_uint64()       xds_uint64_t*     18-37 bytes  8 bytes
    xml_encode_int64()        xds_int64_t       8 bytes      16-36 bytes
    xml_decode_int64()        xds_int64_t*      16-36 bytes  8 bytes
    xml_encode_float()        xds_float_t       4 bytes      variable
    xml_decode_float()        xds_float_t*      variable     4 bytes
    xml_encode_double()       xds_double_t      8 bytes      variable
    xml_decode_double()       xds_double_t*     variable     8 bytes
    xml_encode_octetstream()  void*, size_t     variable     variable
    xml_decode_octetstream()  void**, size_t*   variable     variable
    xml_encode_string()       char*             variable     variable
    xml_decode_string()       char**            variable     variable

Please note that the functions xml_decode_octetstream() and
xml_decode_string() return a pointer to a buffer holding the decoded
data. This buffer has been allocated with malloc(3) and must be
free(3)'ed by the application when it is not required anymore. All
other callbacks write the decoded value into the location found on the
stack, but these behave differently because the length of the decoded
data is not known in advance and the application cannot provide a
buffer that's guaranteed to suffice.

=head1 EXTENDING THE LIBRARY

This section demonstrates how to write a "meta engine" for the B
framework. The example engine will encode a complex data structure
consisting of four elementary members. The structure is defined as
follows:

    struct mystruct {
        xds_int32_t  small;
        xds_int64_t  big;
        xds_uint32_t positive;
        char         text[16];
    };

Some readers might wonder why the structure is defined using these
weird data types rather than the familiar ones like C, C, etc. The
reason is that these data types have an undefined size.
An C variable will have, say, 32 bits when compiled on the average Unix
machine, but when the same program is compiled on a 64-bit machine like
Tru64 Unix, it will have a size of 64 bits. This is a problem when
those structures have to be exchanged between entirely different
systems, because the structures are binary incompatible -- something
even B cannot remedy, because it is impossible to construct a
bidirectional and lossless mapping in this case.

In order to encode an instance of this structure, we write an encoding
engine:

    static int encode_mystruct(xds_t *xds, void *engine_context,
                               void *buffer, size_t buffer_size,
                               size_t *used_buffer_size, va_list *args)
    {
        struct mystruct *ms;

        /* the caller passes the address of the structure to encode */
        ms = va_arg(*args, struct mystruct*);
        return xds_encode(xds, "int32 int64 uint32 octetstream",
                          ms->small, ms->big, ms->positive,
                          ms->text, sizeof(ms->text));
    }

This engine takes the address of the I structure from the stack and
then uses xds_encode() to handle all elements of I separately -- which
is fine, because these data types are supported by B already (both by
the shipped B and B engines). It is worth noting, though, that we refer
to the other engines by name, meaning that these engines must have been
registered in I under those names beforehand!

What is very nice, though, is the fact that this encoding engine does
not even need to know which particular engines are used to encode the
actual values! If the user registers the B engines under the
appropriate names, I will be encoded in B. If the user registers the B
engines under the appropriate names, I will be encoded in B. Because of
that property, we call such an engine a "meta engine".

Of course, you need not necessarily implement an engine as a "meta
engine": Rather than going through xds_encode(), it would be possible
to execute the appropriate encoding engines directly.
This would have the advantage of not depending on those engines being
registered at all, but it would make the custom engine depend on the
elementary engines -- which is an unnecessary limitation.

One more word about the engine syntax and semantics: As has been
mentioned earlier, any function that adheres to the interface shown
above is potentially an engine. The parameters of that interface have
the following meaning:

=over 4

=item I

This is the B context that was originally provided to the xds_encode()
call, which in turn executed the engine. It may be used, for example,
for executing xds_encode() again like we did in our example engines.

=item I

The engine context can be used by the engine to store any type of
internal information. The value the engine will receive must have been
provided when the engine was registered by xds_register(). Engines may
ignore this parameter if they don't need a context of their own -- all
engines included in the distribution do so.

=item I

In encoding mode, this parameter points to the buffer the encoded data
should be written to. In decoding mode, I points to the encoded data
which should be decoded; the location where the results should be
stored can then be found on the stack.

=item I

The number of bytes available in I. In encoding mode, this means "free
space"; in decoding mode, I determines how many bytes of encoded data
are available in I for consumption.

=item I

This parameter points to a variable which the callback must set before
returning in order to let the framework know how many bytes it used in
I. A callback encoding, say, an int32 number into an 8-byte text
representation would set the used_buffer_size to 8:

    *used_buffer_size = 8;

In encoding mode, this variable determines how many bytes the engine
has written into I; in decoding mode, the variable determines how many
bytes the engine has read from I.

=item I

This pointer points to an initialized variadic argument list. Use the
standard C macro va_arg(3) to fetch the actual data.


=back

A callback may return any of the following return codes, as defined in
F:

=over 4

=item C

No error.

=item C

Failed to allocate required memory.

=item C

The buffer is too small to hold all encoded data. The callback may set
*I to the number of bytes it needs in I, thereby giving the framework a
hint by how many bytes it should enlarge the buffer before trying the
engine again. Leaving *I alone will work fine too; it may just be a bit
less efficient in some cases.

Obviously, this return code does not make much sense in decoding mode.

=item C

Unexpected or incorrect parameters.

=item C

This return code will be returned in decoding mode in case the decoding
engine realizes that the data it is decoding does not fit what it is
expecting. Not all encoding formats allow this to be detected at all.
B, for example, does not.

=item C

In decode mode, this error is returned when an engine needs, say, 4
bytes of data in order to decode a value but I/I provides less.

=item C

Any other reason to fail than those listed before. This is the
catch-all code.

=back

Let's take a look at the corresponding decoding "meta engine" now:

    static int decode_mystruct(xds_t *xds, void *engine_context,
                               void *buffer, size_t buffer_size,
                               size_t *used_buffer_size, va_list *args)
    {
        struct mystruct *ms;
        size_t i;
        char *tmp;
        int rc;

        /* the caller passes the address of the structure to fill */
        ms = (struct mystruct *)va_arg(*args, void *);
        rc = xds_decode(xds, "int32 int64 uint32 octetstream",
                        &(ms->small), &(ms->big), &(ms->positive),
                        &tmp, &i);
        if (rc == XDS_OK) {
            /* the octet stream comes back in a malloc(3)'ed buffer */
            if (i == sizeof(ms->text))
                memmove(ms->text, tmp, i);
            else
                rc = XDS_ERR_TYPE_MISMATCH;
            free(tmp);
        }
        return rc;
    }

The engine simply calls xds_decode() to handle the separate data types.
The only complication is that the octet stream decoding engines return
a pointer to a malloc(3)'ed buffer, which is not what we need. Thus we
have to manually copy the contents of that buffer into the right place
in the structure and free the (now unused) buffer again.

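The interplay between the "buffer too small" return code and the
used-size hint described above can be sketched without the library.
Everything in the following block is a stand-in chosen for
illustration and is not the B implementation: the SK_* return codes,
the sketch_engine_t type (which omits the context parameter),
sketch_encode_hex32() and sketch_run_engine() are hypothetical names.
The sketch assumes a 32-bit unsigned int and shows one way a framework
could grow its buffer by the engine's hint and then call the engine
again.

```c
#include <assert.h>
#include <stdarg.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Stand-in return codes, not the constants from the library header. */
enum { SK_OK = 0, SK_ERR_NO_MEM, SK_ERR_OVERFLOW };

typedef int (*sketch_engine_t)(void *engine_context, void *buffer,
                               size_t buffer_size,
                               size_t *used_buffer_size, va_list *args);

/* Hypothetical engine: writes an unsigned int as 8 hex digits. */
static int sketch_encode_hex32(void *engine_context, void *buffer,
                               size_t buffer_size,
                               size_t *used_buffer_size, va_list *args)
{
    static const char digits[] = "0123456789abcdef";
    unsigned int value;
    char tmp[8];
    int i;

    (void)engine_context;                /* no private context needed */
    assert(used_buffer_size != NULL && args != NULL);
    value = va_arg(*args, unsigned int);
    *used_buffer_size = sizeof(tmp);     /* bytes written, or the hint */
    if (buffer_size < sizeof(tmp))
        return SK_ERR_OVERFLOW;
    for (i = 7; i >= 0; i--) {
        tmp[i] = digits[value & 0xf];
        value >>= 4;
    }
    memcpy(buffer, tmp, sizeof(tmp));
    return SK_OK;
}

/* Hypothetical driver: append one engine's output to a growable
 * buffer, enlarging it by the engine's hint on overflow. */
static int sketch_run_engine(sketch_engine_t engine, char **buffer,
                             size_t *capacity, size_t *length, ...)
{
    va_list args;
    int rc;

    va_start(args, length);
    for (;;) {
        va_list attempt;                 /* fresh copy for every try */
        size_t used = 0;
        size_t need;
        char *dst = (*buffer != NULL) ? *buffer + *length : NULL;
        char *grown;

        va_copy(attempt, args);
        rc = engine(NULL, dst, *capacity - *length, &used, &attempt);
        va_end(attempt);
        if (rc == SK_OK) {
            *length += used;
            break;
        }
        if (rc != SK_ERR_OVERFLOW)
            break;
        need = *length + used;           /* use the hint if there is one */
        if (need <= *capacity)
            need = *capacity + 32;       /* no usable hint: fixed step */
        grown = realloc(*buffer, need);
        if (grown == NULL) {
            rc = SK_ERR_NO_MEM;
            break;
        }
        *buffer = grown;
        *capacity = need;
    }
    va_end(args);
    return rc;
}
```

Note that the argument list is copied with va_copy() before every
attempt, because an engine that fails with an overflow may already
have consumed its arguments with va_arg(3).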
A complete example program encoding and decoding C can be found as F in
the B source distribution.

=head1 SEE ALSO

RFC 1832: `XDR: External Data Representation Standard', R. Srinivasan,
August 1995

XML-RPC Home Page: http://www.xmlrpc.org/

=head1 HISTORY

B was initially written by Peter Simons E<lt>simons@crypt.toE<gt> in
August 2001 under contract with the B sponsor B.

=cut