% -*- mode: LaTeX; fill-column: 75; -*-
%
% $Id: libxds.tex,v 1.18 2001/08/30 15:02:53 simons Exp $
%

\documentclass[a4paper,10pt,pointlessnumbers,bibtotoc]{scrartcl}
\usepackage[dvips,xdvi]{graphicx}
\usepackage{fancyvrb}
\typearea[2cm]{12}
\fussy

\begin{document}

\titlehead{Cable \& Wireless Deutschland GmbH\\Application Services\\Development Team}
\title{OSSP XDS ---\\Extensible Data Serialization}
\author{Peter Simons $<$simons@computer.org$>$}
\date{2001-08-01}
\maketitle

\section{Introduction}

In today's networked world, computer systems of all brands and flavours
communicate with each other. Unfortunately, these systems are far from
being compatible: Many systems use different internal representations for
the same thing. Look at the (hexadecimal) number \$1234 for instance: On a
big endian machine, this number is stored in memory the way you'd
intuitively expect: \$12~\$34 --- the more significant byte precedes the
less significant one. On a little endian machine, though, the number
\$1234 is stored like this: \$34~\$12 --- exactly the other way round. As a
result, you cannot just write the number \$1234 to a socket and expect the
other end to understand it correctly, because if the byte orders of the two
machines differ, the reader will read a different number than the writer
sent. Things get even more complicated when you start exchanging floating
point numbers, for which about a dozen different encodings exist!

\begin{figure}[tbh]
\begin{center}
\includegraphics[width=\textwidth]{data-exchange.eps}
\caption{Data exchange using XDS}
\label{data exchange}
\end{center}
\end{figure}

Solving these problems is the domain of XDS; its purpose is to encode data
in a way that allows this data to be exchanged between different computer
systems. Assume you want to transfer the value \$1234 from host A to host
B. Then you would encode it using XDS, transfer the encoded data over the
network, and decode the value again at the other end.
Every program that follows this process will read the correct value no
matter what native representation is used internally.

There is a rich variety of applications for such functionality: XDS may be
used to encode data before it is written to disk or read from the disk, it
may be used to encode data to be exchanged between processes over the
network, etc. Because of this variety, special attention has been paid to
the library design.

\paragraph{The library has been designed to be extensible.} The
functionality is split into a generic encoding and decoding framework and a
set of encoding and decoding engines. These engines can be plugged into the
framework at run-time to actually do the encoding and decoding of data.
Because of this architecture, XDS can be customized to deploy any data
format the developer sees fit. Included in the distribution are engines for
the XDR format specified in \cite{xdr} and for the XML format specified in
\cite{xml}.

\paragraph{The library is convenient to use.} An arbitrary number of
variables can be encoded or decoded with one single function call. All
memory management is done by XDS; the developer need not bother to allocate
or manage buffers for the encoded or decoded data. Automatic buffer
management can be switched off at run-time, though, for maximum
performance.

\paragraph{Performance.} Since all transferred data has to pass through
XDS, the library has been written to encode and decode with maximum
performance. The generic encoding framework adds almost no run-time
overhead to the actual encoding process: If non-automatic buffer management
has been selected, hardly anything but the actual encoding/decoding engine
is executed.

\paragraph{Robustness.} In order to verify that the library works
correctly, a set of regression tests is included in the distribution. These
test suites will --- among other things --- encode known values and compare
the result with the expected (correct) values.
Running them on a new platform shows whether XDS works correctly there.

\paragraph{Use of standard formats.} The supported XDR and XML formats are
widely known and accepted, meaning that XDS is interoperable with other
marshaling implementations: It is possible to encode data with XDS and to
decode it with an entirely different XDR implementation or vice versa.

\paragraph{Portability.} XDS has been written with portability in mind.
Development took place on FreeBSD, Linux and Solaris; other platforms were
used to test the results. It is expected that XDS will compile and function
on virtually any POSIX.1-compliant system with a moderately modern ISO-C
compiler. GNU's C~Compiler~(gcc) is known to compile the library just fine.
For maximum portability, GNU Autoconf has been used to determine the target
system's properties.

\section{Architecture of XDS}

\begin{figure}[htb]
\begin{center}
\includegraphics[width=\textwidth]{architecture.eps}
\caption{Components of XDS}
\label{XDS components}
\end{center}
\end{figure}

The architecture of XDS is illustrated in figure~\ref{XDS components}. XDS
consists of three components: The generic encoding and decoding framework,
a set of engines to encode and decode values in a certain format, and a
run-time context, which is used to manage buffers, registered engines, etc.

In order to use the library, the first thing the developer has to do is to
create a valid XDS context by calling \textsf{xds\_init()}. The routine
requires one parameter that determines whether to operate in encoding or
decoding mode. A context can be used for encoding or decoding only; it is
not possible to use the same context for both operations.

Once a valid XDS context has been obtained, the routine
\textsf{xds\_register()} can be used to register an arbitrary number of
encoding or decoding engines within the context. Two sets of engines are
included in the library.
These routines will handle any elementary datatype defined by the ISO-C
language, such as 32-bit integers, 64-bit integers, unsigned integers (both
32- and 64-bit), floating point numbers, strings and octet streams.

Once all required encoding/decoding engines are registered, the routines
\textsf{xds\_encode()} or \textsf{xds\_\-decode()} may be used to actually
perform the encoding or decoding process. Any data type for which an engine
has been registered can be handled by the library.

This means that developers can write custom engines for any data type they
wish to use and register them in the context --- as long as these engines
adhere to the \textsf{xds\_engine\_t} interface defined in \textsf{xds.h}.

In particular, it is possible to register meta engines. A meta engine is an
engine designed to encode or decode data types that consist of several
elementary data types. Such an engine will simply re-use the existing
engines to encode or decode the elements of the structure.

\section{Using the XDS library}

\subsection{Encoding}

The following example program will encode three variables using the XDR
engines. The encoded results will be written to the standard output stream,
which can be redirected to a file or piped into the decoding program
described in the next section. Take a look at the source code for a moment;
we will then go on to discuss all relevant sections line by line.

\begin{Verbatim}[numbers=left,fontsize=\small,frame=lines]
#include <stdio.h>
#include <unistd.h>
#include <string.h>
#include <errno.h>
#include <xds.h>

static void error_exit(int rc, const char* msg, ...)
{
    va_list args;
    va_start(args, msg);
    vfprintf(stderr, msg, args);
    va_end(args);
    exit(rc);
}

int main()
{
    xds_t* xds;
    char*  buffer;
    size_t buffer_size;

    xds_int32_t  int32  = -42;
    xds_uint32_t uint32 = 0x12345678;
    const char*  string = "This is a test.";

    xds = xds_init(XDS_ENCODE);
    if (xds == NULL)
        error_exit(1, "Failed to initialize XDS context: %s\n", strerror(errno));

    if (xds_register(xds, "int32", &xdr_encode_int32, NULL) != XDS_OK ||
        xds_register(xds, "uint32", &xdr_encode_uint32, NULL) != XDS_OK ||
        xds_register(xds, "string", &xdr_encode_string, NULL) != XDS_OK)
        error_exit(1, "Failed to register my encoding engines!\n");

    if (xds_encode(xds, "int32 uint32 string", int32, uint32, string) != XDS_OK)
        error_exit(1, "xds_encode() failed!\n");

    if (xds_getbuffer(xds, XDS_GIFT, (void**)&buffer, &buffer_size) != XDS_OK)
        error_exit(1, "getbuffer() failed.\n");

    xds_destroy(xds);

    write(STDOUT_FILENO, buffer, buffer_size);

    free(buffer);

    fprintf(stderr, "Encoded data:\n");
    fprintf(stderr, "\tint32 = %d\n", int32);
    fprintf(stderr, "\tuint32 = 0x%x\n", uint32);
    fprintf(stderr, "\tstring = \"%s\"\n", string);

    return 0;
}
\end{Verbatim}

\paragraph{Lines 1--5.} The program starts by including several system
headers, which define the prototypes for the routines we use. The most
interesting header in our case is of course \textsf{xds.h} --- the header
of XDS. Please note that all declarations required to use XDS are included
in that file.

\paragraph{Lines 7--14.} The \textsf{error\_exit()} routine is not relevant
for the example; we just define it to make the rest of the source code
shorter and easier to read.

\paragraph{Lines 16--53.} This is the interesting part: The
\textsf{main()} routine. This function will create the variables to be
encoded on the stack, assign values to them, initialize the XDS library,
use it to encode the values, and write the result of the encoding process
to the standard output stream. Read on for further details.

\paragraph{Lines 26--28.} First of all we have to obtain an XDS context.
This is done by calling \textsf{xds\_init()}. Since we intend to
\emph{encode} data, we initialize the context in encoding mode. The only
other mode of operation is decoding mode, which is demonstrated in the
next section.

All other routines in XDS return a code from a small list of return codes
defined in \textsf{xds.h}, but \textsf{xds\_init()} is different: It
returns a pointer to an \textsf{xds\_t} in case of success and
\textsf{NULL} in case of failure. One reason why \textsf{xds\_init()} would
fail is that it cannot allocate the memory required to initialize the
context. In this case, the system variable \textsf{errno} is set to
\textsf{ENOMEM}. The other reason why \textsf{xds\_init()} would fail is
that the mode parameter is invalid, in which case \textsf{errno} would be
set to \textsf{EINVAL}. If XDS has been compiled with assertions enabled,
such an error would result in an assertion error, terminating the program
with a diagnostic message immediately.

\paragraph{Lines 30--33.} Once we have obtained a valid XDS context, we
register the engines we need. In this example, we'll encode a signed 32-bit
integer, an unsigned 32-bit integer, and a string. We'll be using XDR
encoding in this case, so the engines to register are
\textsf{xdr\_encode\_int32()}, \textsf{xdr\_encode\_uint32()}, and
\textsf{xdr\_encode\_string()}. (A complete list of available engines can
be found in \textsf{xds.h}, in sections~\ref{xdr} and~\ref{xml}, or in the
manual pages for the library.)

Please note that we could switch the encoding format simply by using the
corresponding \textsf{xml\_encode\_XXX()} engines here.
We could even mix XDR and XML encoding as we see fit, but it's hard to
think of a case where this would make sense~\dots{}

The developer is free to choose the name under which an engine is
registered, but the name may only contain alphanumerical characters,
hyphens (``\verb#-#'') or underscores (``\verb#_#''). It is recommended,
however, to follow the naming scheme of the corresponding engine. Why this
is recommended will become clear in section~\ref{meta engines}.

\paragraph{Lines 35--36.} This is where the actual encoding takes place. As
parameters, \textsf{xds\_encode()} requires a valid encoding context plus a
format string that describes how the following parameters are to be
interpreted. While the concept is similar to that of \textsf{sprintf(3)},
the syntax is different: The format string may contain an arbitrary number
of names, which are delimited by an arbitrary number of any character that
is not a legal character for engine names. We recommend delimiting the
names by colons or blanks.

For each valid engine name in the format string, a corresponding parameter
must follow in the variadic argument list. How these parameters are
interpreted depends on the engine you're using. The engines provided with
the XDS library will expect the value to encode, but theoretically
developers are free to write encoding and decoding engines that expect
virtually any kind of information here. More about this will be explained
in section~\ref{meta engines}.

\paragraph{Lines 38--39.} We have encoded all values we wanted to encode,
now we can get the result from the library. This happens by calling
\textsf{xds\_getbuffer()}. The routine will store the buffer's address and
length at the locations we provided as parameters. Please note that we can
choose whether we want the buffer as a ``gift'' (\textsf{XDS\_GIFT}) or as
a ``loan'' (\textsf{XDS\_LOAN}).
The buffer being a ``loan'' means that the buffer is still owned by the
library --- we're only allowed to peek at it. Any call to an XDS routine
may modify the buffer or even change the buffer's location. Hence the
result of an \textsf{xds\_getbuffer()} call with loan semantics is only
valid until the next XDS routine is called. If we use ``gift'' semantics,
the buffer we receive will be owned by us; the library will not touch the
buffer again. This means, of course, that we're responsible for
\textsf{free(3)}ing the buffer when we don't need it anymore.

\paragraph{Line 41.} Destroy the XDS context and all data associated with
it. This is possible because we requested the buffer as ``gift''; the
buffer is not associated with XDS anymore.

\paragraph{Line 43.} Write the buffer with the encoded data to the standard
output stream.

\paragraph{Line 45.} Now that we don't need the buffer anymore, we can
return the memory it uses to the system.

\paragraph{Lines 47--50.} Write a short report of what we have done to the
standard error channel.

\bigskip

Finally, let us compile and execute the example program shown above. For
convenience, it is included in the distribution as \textsf{docs/encode.c}.
You can compile and execute the program as follows:

\begin{quote}
\begin{verbatim}
$ gcc -I.. encode.c -o encode -L.. -lxds
$ ./encode >output
Encoded data:
        int32 = -42
        uint32 = 0x12345678
        string = "This is a test."
$ ls -l output
-rw-r--r--  1 simons  simons  28 Aug  2 15:21 output
\end{verbatim}
\end{quote}

The result of executing the program --- the file \textsf{output} --- can be
displayed with \textsf{hexdump(1)} or \textsf{od(1)} and should look like
this:

\begin{quote}
\begin{Verbatim}[fontsize=\small]
$ hexdump -C output
00000000  ff ff ff d6 12 34 56 78  00 00 00 0f 54 68 69 73  |.....4Vx....This|
00000010  20 69 73 20 61 20 74 65  73 74 2e 00              | is a test..|
0000001c
\end{Verbatim}
\end{quote}

\noindent
We will also re-use this file in the next section, where we'll read it and
decode those values again.

\subsection{Decoding}

The following example program will read the result of the encoding example
shown in the previous section and decode the values back into the native
representation. Then it will print those values to the standard error
stream so that the user can see the values are correct. Please take a look
at the source now; we'll discuss all relevant details in the following
paragraphs.

\begin{Verbatim}[numbers=left,fontsize=\small,frame=lines]
#include <stdio.h>
#include <unistd.h>
#include <string.h>
#include <errno.h>
#include <xds.h>

static void error_exit(int rc, const char* msg, ...)
{
    va_list args;
    va_start(args, msg);
    vfprintf(stderr, msg, args);
    va_end(args);
    exit(rc);
}

int main()
{
    xds_t* xds;
    char   buffer[1024];
    size_t buffer_len;
    int    rc;

    xds_int32_t  int32;
    xds_uint32_t uint32;
    char*        string;

    buffer_len = 0;
    do
    {
        rc = read(STDIN_FILENO, buffer + buffer_len,
                  sizeof(buffer) - buffer_len);
        if (rc < 0)
            error_exit(1, "read() failed: %s\n", strerror(errno));
        else if (rc > 0)
            buffer_len += rc;
    }
    while (rc > 0 && buffer_len < sizeof(buffer));
    if (buffer_len >= sizeof(buffer))
        error_exit(1, "Too much input data for our buffer.\n");

    xds = xds_init(XDS_DECODE);
    if (xds == NULL)
        error_exit(1, "Failed to initialize XDS context: %s\n", strerror(errno));

    if (xds_register(xds, "int32", &xdr_decode_int32, NULL) != XDS_OK ||
        xds_register(xds, "uint32", &xdr_decode_uint32, NULL) != XDS_OK ||
        xds_register(xds, "string", &xdr_decode_string, NULL) != XDS_OK)
        error_exit(1, "Failed to register my decoding engines!\n");

    if (xds_setbuffer(xds, XDS_LOAN, buffer, buffer_len) != XDS_OK)
        error_exit(1, "setbuffer() failed.\n");

    if (xds_decode(xds, "int32 uint32 string", &int32, &uint32, &string) != XDS_OK)
        error_exit(1, "xds_decode() failed!\n");

    xds_destroy(xds);

    fprintf(stderr, "Decoded data:\n");
    fprintf(stderr, "\tint32 = %d\n", int32);
    fprintf(stderr, "\tuint32 = 0x%x\n", uint32);
    fprintf(stderr, "\tstring = \"%s\"\n", string);

    free(string);

    return 0;
}
\end{Verbatim}

\paragraph{Lines 1--25.} Include the required header files, define the
\textsf{error\_exit()} helper function, and create the required variables
on the stack.

\paragraph{Lines 27--39.} These instructions will read any number of bytes
from the standard input stream --- as long as the input fits into the
\textsf{buffer} variable. In order to provide the program with the
appropriate input, redirect the standard input stream to the file
\textsf{output} created in the previous section or connect the encoding and
decoding programs directly by a pipe.
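
To see where the 28 bytes of this input come from, the layout can be
reproduced by hand. The following is a minimal sketch and not code from the
XDS distribution: the helpers \textsf{put\_be32()},
\textsf{put\_xdr\_string()}, and \textsf{build\_example()} are hypothetical
names introduced here for illustration only. The sketch assumes the XDR
conventions visible in the hexdump of the previous section, namely that
32-bit quantities are stored most significant byte first and that a string
is stored as a length word followed by its bytes, zero-padded to a multiple
of four.

\begin{quote}
\begin{verbatim}
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, not an XDS routine: store a 32-bit value
   most significant byte first. Returns the bytes written. */
static size_t put_be32(unsigned char *out, unsigned long value)
{
    out[0] = (unsigned char)((value >> 24) & 0xff);
    out[1] = (unsigned char)((value >> 16) & 0xff);
    out[2] = (unsigned char)((value >>  8) & 0xff);
    out[3] = (unsigned char)( value        & 0xff);
    return 4;
}

/* Hypothetical helper: length word, then the characters, then
   zero padding up to the next multiple of four bytes. */
static size_t put_xdr_string(unsigned char *out, const char *s)
{
    size_t len    = strlen(s);
    size_t padded = (len + 3) & ~(size_t)3;
    size_t used   = put_be32(out, (unsigned long)len);

    memset(out + used, 0, padded);
    memcpy(out + used, s, len);
    return used + padded;
}

/* Lay out the three example values; returns the total size.
   The caller must provide at least 28 bytes. */
static size_t build_example(unsigned char *out)
{
    size_t n = 0;

    n += put_be32(out + n, 0xffffffd6UL); /* -42, two's complement */
    n += put_be32(out + n, 0x12345678UL);
    n += put_xdr_string(out + n, "This is a test.");
    return n;
}
\end{verbatim}
\end{quote}

Under these assumptions, \textsf{build\_example()} produces three 4-byte
words (the two integers and the string length 15), followed by the 15
characters of the string and one padding byte, which adds up to the 28
bytes reported by \textsf{ls(1)}.
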

\paragraph{Lines 41--43.} Create a context for decoding the values. The
semantics are identical to those described in the encoding example.

\paragraph{Lines 45--48.} Register the decoding engines in the context.
Please note that the decoding engines must correspond to the encoding
engines used to create the data we're about to process. Using, say, an XML
engine to decode XDR data will at best return an error --- in the worst
case, it will return incorrect results!

\paragraph{Lines 50--51.} Here we do not \emph{get} a buffer from the
library, we \emph{set} the buffer we've read earlier in the context for
decoding. Please note that we use loan semantics in this case, not gift
semantics. This is necessary because \textsf{buffer} has not been allocated
by \textsf{malloc(3)} --- the variable is located on the stack. This means
that we cannot give it to XDS, because XDS expects to be able to
\textsf{free(3)} the buffer when the context is destroyed.

Loan semantics are fine, though; all we have to do is take care that we
don't erase or modify the contents of \textsf{buffer} while XDS operates on
it. The library itself will never touch the buffer in decode mode, no
matter whether loan or gift semantics have been chosen.

\paragraph{Lines 53--54.} Here comes the actual decoding of the buffer's
content using \textsf{xds\_decode()}. The syntax is identical to that of
\textsf{xds\_encode()}; the only difference is that the decoding engines do
not expect values like the encoding engines did, but the location where to
store the value. Thus, we pass the addresses of the appropriate variables
here. If the routine returns \textsf{XDS\_OK}, the decoded values will have
been stored in those locations.

It should be noted that the decoded string cannot trivially be returned
this way. Instead, \textsf{xds\_decode()} will use \textsf{malloc(3)} to
allocate a buffer large enough to hold the string. The address of that
buffer is then stored in the pointer \textsf{string}.
Of course, this means that the application has to \textsf{free(3)} the
string once it's not required anymore.

\paragraph{Line 56.} We don't need the context anymore, so we destroy it
and thereby free all used resources. This does not affect \textsf{buffer}
in any way because we used loan semantics.

\paragraph{Lines 58--61.} Print the decoded values to the standard error
stream for the user to take a look at them.

\paragraph{Line 63.} Now that we don't need the contents of
\textsf{string} anymore, we must return the buffer allocated in
\textsf{xds\_decode()} to the system.

\bigskip

Like the encoding program described earlier, the source code to this
program is included in the library distribution as \textsf{docs/decode.c}.
You can compile and execute the program like this:

\begin{quote}
\begin{verbatim}
$ gcc -I.. decode.c -o decode -L.. -lxds
$ ./decode <output
Decoded data:
        int32 = -42
        uint32 = 0x12345678
        string = "This is a test."
\end{verbatim}
\end{quote}

We assume that the \textsf{output} file has been created as described in
the previous section. Otherwise, you can execute both programs like this:

\begin{quote}
\begin{Verbatim}[fontsize=\small]
$ ./encode | ./decode
Encoded data:
        int32 = -42
        uint32 = 0x12345678
        string = "This is a test."
Decoded data:
        int32 = -42
        uint32 = 0x12345678
        string = "This is a test."
\end{Verbatim}
\end{quote}

\noindent
This will encode and decode the values without the need for a temporary
file.

\section{Extending the XDS library}
\label{meta engines}

Now that we know how primitive data types can be encoded and decoded, let's
write a ``meta engine'' that will handle complex data structures.
For the example, we'll use the structure ``mystruct'', which is defined as
follows:

\begin{quote}
\begin{verbatim}
struct mystruct
{
    xds_int32_t   small;
    xds_int64_t   big;
    xds_uint32_t  positive;
    char          text[16];
};
\end{verbatim}
\end{quote}

Some readers might wonder why the structure is defined using these weird
data types rather than the familiar ones like \textsf{int}, \textsf{long},
etc. The reason is that the size of the familiar types is not fixed by the
language. A \textsf{long} variable will have, say, 32 bits when compiled on
the average 32-bit Unix machine, but when the same program is compiled on a
64-bit system like Tru64 UNIX, it will have a size of 64 bits. That is a
problem when those structures have to be exchanged between entirely
different systems, because the structures are binary incompatible ---
something even XDS cannot remedy.

In order to encode an instance of this structure, we write an encoding
engine:

\begin{quote}
\begin{verbatim}
static int encode_mystruct(xds_t* xds, void* engine_context,
                           void* buffer, size_t buffer_size,
                           size_t* used_buffer_size,
                           va_list* args)
{
    struct mystruct* ms;
    ms = va_arg(*args, struct mystruct*);
    return xds_encode(xds, "int32 int64 uint32 octetstream",
                      ms->small, ms->big, ms->positive,
                      ms->text, sizeof(ms->text));
}
\end{verbatim}
\end{quote}

This engine fetches the address of the ``mystruct'' structure from the
variadic argument list and then uses \textsf{xds\_encode()} to handle all
elements of ``mystruct'' separately --- which is fine, because these data
types are supported by XDS already. It is worth noting, though, that we
refer to the other engines by name, meaning that these engines must be
registered in ``xds'' by that name!

What is very nice, though, is the fact that this encoding engine does not
even need to know which engines are used to encode the actual values! If
the user registers the XDR engines under the appropriate names,
``mystruct'' will be encoded in XDR.
If the user registers the XML engines under the appropriate names,
``mystruct'' will be encoded in XML. Because of that property, we call such
an engine a ``meta engine''.

Of course you need not necessarily implement an engine as a \emph{meta}
engine: Rather than going through \textsf{xds\_encode()}, it would be
possible to execute the appropriate encoding engines directly. This would
have the advantage of not depending on those engines being registered at
all, but it would tie the custom engine to one particular set of elementary
engines --- which is an unnecessary limitation.

One more word about the engine syntax and semantics: As has been mentioned
earlier, any function that adheres to the interface shown above is
potentially an engine. Its parameters have the following meaning:

\begin{itemize}
\item xds --- This is the XDS context that was originally provided to the
  \textsf{xds\_encode()} call, which in turn executed the engine. It may be
  used, for example, for executing \textsf{xds\_en\-code()} again like we
  did in our example engine.

\item engine\_context --- The engine context can be used by the engine to
  store any type of internal information. The value the engine receives is
  the one that was provided when the engine was registered by
  \textsf{xds\_register()}. Engines may ignore this parameter if they don't
  need a context of their own --- all engines included in the distribution
  do so.

\item buffer --- This parameter points to the buffer the encoded data
  should be written to. In decoding mode, ``buffer'' points to the encoded
  data, which should be decoded; the locations where the results should be
  stored are then found in the argument list.

\item buffer\_size --- The number of bytes available in ``buffer''. In
  encoding mode, this means ``free space''; in decoding mode,
  ``buffer\_size'' determines how many bytes of encoded data are available
  in ``buffer'' for consumption.

\item used\_buffer\_size --- This parameter points to a variable, which the
  callback must set before returning in order to let the framework know how
  many bytes it consumed from ``buffer''. A callback encoding, say, an
  int32 number into an 8-byte text representation would set the
  used\_buffer\_size to 8:

  \begin{quote}
\begin{verbatim}
*used_buffer_size = 8;
\end{verbatim}
  \end{quote}

  In encoding mode, this variable determines how many bytes the engine has
  written into ``buffer''; in decoding mode the variable determines how
  many bytes the engine has read from ``buffer''.

\item args --- This pointer points to an initialized variadic argument
  list. Use the standard C macro \textsf{va\_arg(3)} to fetch the actual
  data.
\end{itemize}

A callback may return any of the following return codes, as defined in
\textsf{xds.h}:

\begin{itemize}
\item XDS\_OK --- No error.

\item XDS\_ERR\_NO\_MEM --- Failed to allocate required memory.

\item XDS\_ERR\_OVERFLOW --- The buffer is too small to hold all encoded
  data. The callback may set ``*used\_buffer\_size'' to the number of bytes
  it needs in ``buffer'', thereby giving the framework a hint by how many
  bytes it should enlarge the buffer before trying the engine again.
  Leaving ``*used\_buffer\_size'' alone will work fine too; it may just be
  a bit less efficient in some cases. This return code does not make much
  sense in decoding mode.

\item XDS\_ERR\_INVALID\_ARG --- Unexpected or incorrect parameters.

\item XDS\_ERR\_TYPE\_MISMATCH --- This return code will be returned in
  decoding mode in case the decoding engine realizes that the data it is
  decoding does not fit what it is expecting. Not all encoding formats make
  it possible to detect this; XDR, for example, does not.

\item XDS\_ERR\_UNDERFLOW --- In decode mode, this error is returned when
  an engine needs, say, 4 bytes of data in order to decode a value but
  ``buffer''/``buffer\_size'' provide fewer.

\item XDS\_ERR\_UNKNOWN --- Any other reason to fail than those listed
  before. Catch all~\dots{}
\end{itemize}

Let's take a look at the corresponding decoding engine now:

\begin{quote}
\begin{verbatim}
static int decode_mystruct(xds_t* xds, void* engine_context,
                           void* buffer, size_t buffer_size,
                           size_t* used_buffer_size,
                           va_list* args)
{
    struct mystruct* ms;
    size_t i;
    char*  tmp;
    int    rc;

    ms = va_arg(*args, struct mystruct*);
    rc = xds_decode(xds, "int32 int64 uint32 octetstream",
                    &(ms->small), &(ms->big), &(ms->positive),
                    &tmp, &i);
    if (rc == XDS_OK)
    {
        if (i == sizeof(ms->text))
            memmove(ms->text, tmp, i);
        else
            rc = XDS_ERR_TYPE_MISMATCH;
        free(tmp);
    }
    return rc;
}
\end{verbatim}
\end{quote}

The engine simply calls \textsf{xds\_decode()} to handle the separate data
types. The only complication is that the octet stream decoding engines
return a pointer to a \textsf{malloc(3)}ed buffer --- which is not what we
need. Thus we have to manually copy the contents of that buffer into the
right place in the structure and free the (now unused) buffer again.

A complete example program encoding and decoding ``mystruct'' can be found
at \textsf{docs/\-extended.c} in the distribution.

\section{The XDS Framework}
\label{xds}

\subsection{xds\_t* xds\_init(xds\_mode\_t~\underline{mode});}

This routine creates and initializes a context for use with the XDS
library. The ``mode'' parameter may be either \textsf{XDS\_ENCODE} or
\textsf{XDS\_DECODE}, depending on whether you want to encode or to decode
data.

If successful, \textsf{xds\_init()} returns a pointer to the XDS context
structure. In case of failure, \textsf{xds\_init()} returns \textsf{NULL}
and sets \textsf{errno} to \textsf{ENOMEM} (failed to allocate internal
memory buffers) or \textsf{EINVAL} (``mode'' parameter was invalid).

A context obtained from \textsf{xds\_init()} must be destroyed by
\textsf{xds\_destroy()} when it is not needed any more.

\subsection{void xds\_destroy(xds\_t*~\underline{xds});}

\textsf{xds\_destroy()} will destroy an XDS context created by
\textsf{xds\_init()}. Doing so will return all resources associated with
this context --- most notably the memory used to buffer the results of
encoding or decoding any values. A context may not be used after it has
been destroyed.

\subsection{int xds\_register(xds\_t*~\underline{xds}, const~char*~\underline{name}, xds\_engine\_t~\underline{engine}, void*~\underline{engine\_context});}

This routine will register an engine in the provided XDS context. An
``engine'' is potentially any function that fulfills the following
interface:

\begin{quote}
\begin{verbatim}
int engine(xds_t* xds, void* engine_context,
           void* buffer, size_t buffer_size,
           size_t* used_buffer_size,
           va_list* args);
\end{verbatim}
\end{quote}

By calling \textsf{xds\_register()}, the engine ``engine'' will be
registered under the name ``name'' in the XDS context ``xds''. The last
parameter ``engine\_context'' may be used as the user sees fit: It will be
passed when the engine is actually called and may be used to implement an
engine-specific context. Most engines will not need a context of their own,
in which case \textsf{NULL} should be specified here.

Please note that until the user calls \textsf{xds\_register()} for an XDS
context obtained from \textsf{xds\_init()}, no engines are registered for
that context. Even the engines included in the library distribution are not
registered automatically.

For engine names, any combination of the characters ``a--z'', ``A--Z'',
``0--9'', ``-'', and ``\_'' may be used; anything else is not a legal
engine name component.
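
The naming rule is easy to check before an engine is registered. The
function below is a sketch under stated assumptions and not part of the XDS
API: \textsf{is\_legal\_engine\_name()} is a hypothetical helper introduced
here for illustration, and rejecting the empty string is an assumption of
this sketch, since the text above does not say how XDS treats empty names.
The check relies on \textsf{isalnum(3)} and is therefore only exact in the
default ``C'' locale.

\begin{quote}
\begin{verbatim}
#include <ctype.h>
#include <stddef.h>

/* Hypothetical helper, not an XDS routine: returns 1 if every
   character of "name" is a letter, a digit, '-' or '_', and 0
   otherwise. NULL and "" are rejected (assumption). */
static int is_legal_engine_name(const char *name)
{
    const char *p;

    if (name == NULL || *name == '\0')
        return 0;
    for (p = name; *p != '\0'; ++p)
    {
        if (!isalnum((unsigned char)*p) && *p != '-' && *p != '_')
            return 0;
    }
    return 1;
}
\end{verbatim}
\end{quote}

With this helper, ``int32'' and ``my-struct\_2'' are accepted, whereas
``int32 uint32'' and ``a:b'' are rejected because the blank and the colon
are not legal name characters.
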

\textsf{xds\_register()} may return the following return codes:
\textsf{XDS\_OK} (everything went fine; the engine is registered now),
\textsf{XDS\_ERR\_INVALID\_ARG} (one of ``xds'', ``name'', or ``engine'' is
\textsf{NULL}, or ``name'' contains illegal characters for an engine name),
or \textsf{XDS\_ERR\_NO\_MEM} (failed to allocate internally required
buffers).

\subsection{int xds\_unregister(xds\_t*~\underline{xds}, const~char*~\underline{name});}

\textsf{xds\_unregister()} will remove the engine ``name'' from XDS context
``xds''. The function will return \textsf{XDS\_OK} in case everything went
fine, \textsf{XDS\_ERR\_UNKNOWN\_ENGINE} in case the engine ``name'' is not
registered in ``xds'', or \textsf{XDS\_ERR\_INVALID\_ARG} if either ``xds''
or ``name'' is \textsf{NULL} or ``name'' contains illegal characters for an
engine name.

\subsection{int xds\_setbuffer(xds\_t*~\underline{xds}, xds\_scope\_t~\underline{flag}, void*~\underline{buffer}, size\_t~\underline{buffer\_len});}

\begin{figure}[tbh]
\begin{center}
\includegraphics[width=\textwidth]{setbuffer-logic.eps}
\caption{xds\_setbuffer() modes of operation}
\label{setbuffer logic}
\end{center}
\end{figure}

This routine allows the user to control XDS' buffer handling: Calling it
will replace the buffer currently used in ``xds''. The address and size of
the new buffer are passed to \textsf{xds\_setbuffer()} via the ``buffer''
and ``buffer\_len'' parameters. The ``xds'' parameter determines for which
XDS context the new buffer will be set. Furthermore, you can set ``flag''
to either \textsf{XDS\_GIFT} or \textsf{XDS\_LOAN}.

\textsf{XDS\_GIFT} will tell XDS that the provided buffer is now owned by
the library and that it may be resized by calling \textsf{realloc(3)}.
Furthermore, the buffer is \textsf{free(3)}ed when ``xds'' is destroyed. If
``flag'' is \textsf{XDS\_GIFT} and ``buffer'' is \textsf{NULL},
\textsf{xds\_setbuffer()} will simply allocate a buffer of its own to be
set in ``xds''.
Please note that a buffer given to XDS as gift \emph{must} have been allocated using \textsf{malloc(3)} --- it may not live on the stack because XDS will try to free or to resize the buffer as it sees fit. Passing \textsf{XDS\_LOAN} via ``flag'' tells \textsf{xds\_setbuffer()} that the buffer is owned by the application and that XDS should not free nor resize the buffer in any case. In this mode, passing a buffer of \textsf{NULL} will result in an invalid-argument error. \subsection{int xds\_getbuffer(xds\_t*~\underline{xds}, xds\_scope\_t~\underline{flag}, void**~\underline{buffer}, size\_t*~\underline{buffer\_len});} This routine is the counterpart to \textsf{xds\_setbuffer()}: It will get the buffer currently used in the XDS context ``xds''. The address of that buffer is stored in the location ``buffer'' points to; the length of the buffer's content will be stored in the location ``buffer\_len'' points to. The ``flag'' argument may be set to either \textsf{XDS\_GIFT} or \textsf{XDS\_LOAN}. The first setting means that the buffer is now owned by the application and that XDS must not use it after this \textsf{xds\_getbuffer()} call anymore; the library will allocate a new internal buffer instead. Of course, this also means that the buffer will not be freed by \textsf{xds\_destroy()}; the application has to \textsf{free(3)} the buffer itself when it is not needed anymore. Setting ``flag'' to \textsf{XDS\_LOAN} tells XDS that the application just wishes to peek into the buffer and will not modify it. The buffer is still owned (and used) by XDS. Please note that the loaned address returned by \textsf{xds\_getbuffer()} may change after any other \textsf{xds\_xxx()} function call! The routine will return \textsf{XDS\_OK} (everything went fine) or \textsf{XDS\_ERR\_INVALID\_ARG} (``xds'', ``buffer'' or ``buffer\_len'' are \textsf{NULL} or ``flag'' is invalid) signifying success or failure respectively. 
Please note: It is perfectly legal for \textsf{xds\_getbuffer()} to return a buffer of \textsf{NULL} and a buffer length of 0! This happens when \textsf{xds\_getbuffer()} is called for an XDS context before a buffer has been allocated.

\subsection{int xds\_vencode(xds\_t*~\underline{xds}, const~char*~\underline{fmt}, va\_list~\underline{args});}

This routine will encode one or several values using the appropriate encoding engines registered in XDS context ``xds''. The parameter ``fmt'' contains a \textsf{sprintf(3)}-like description of the values to be encoded; the actual values are provided in the variadic parameter ``args''.

The format for ``fmt'' is simple: Just provide the names of the engines to be used for encoding the appropriate value(s) in ``args''. Any character that is not legal in an engine name may be used as a delimiter. In order to encode two 32-bit integers followed by a 64-bit integer, the format string
\begin{quote}
\begin{verbatim}
int32 int32 int64
\end{verbatim}
\end{quote}
could be used. In case you don't like the blank, use the colon instead:
\begin{quote}
\begin{verbatim*}
int32:int32:int64
\end{verbatim*}
\end{quote}
Of course, the names used here have to match the names under which the engines were registered in ``xds'' earlier.

Every time \textsf{xds\_vencode()} is called, it will append the encoded data at the end of the internal buffer stored in ``xds''. Thus, you can call \textsf{xds\_vencode()} several times in order to encode several values, but you'll still get all encoded values stored in one buffer.

Calling \textsf{xds\_setbuffer()} or \textsf{xds\_getbuffer()} with gift semantics at any point during encoding will reset the buffer to the beginning. All values that have been encoded into that buffer already will eventually be overwritten when \textsf{xds\_encode()} is called again. Hence: Don't call \textsf{xds\_setbuffer()} or \textsf{xds\_getbuffer()} unless you actually want to access the data stored in the buffer.
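
The delimiter rule for format strings can be sketched in a few lines of C. The helper \textsf{next\_engine\_name()} below is hypothetical and is not claimed to be how the library parses ``fmt''; it merely shows that treating every character outside the engine-name alphabet as a separator is enough to split a format string into names.

\begin{quote}
\begin{verbatim}
#include <stddef.h>
#include <string.h>

/* Characters that may appear in an engine name. */
static const char NAME_CHARS[] =
    "abcdefghijklmnopqrstuvwxyz"
    "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    "0123456789-_";

/* Hypothetical helper: find the next engine name in "fmt", starting
   at offset *pos. On success, store the start and length of the
   name, advance *pos past it, and return 1. Return 0 when no
   further name is left. */
int next_engine_name(const char *fmt, size_t *pos,
                     const char **name, size_t *len)
{
    const char *p = fmt + *pos;

    p += strcspn(p, NAME_CHARS);        /* skip any delimiters */
    if (*p == '\0') {
        *pos = (size_t)(p - fmt);
        return 0;
    }
    *name = p;
    *len  = strspn(p, NAME_CHARS);      /* extent of the name */
    *pos  = (size_t)(p - fmt) + *len;
    return 1;
}
\end{verbatim}
\end{quote}

Called repeatedly on the string \textsf{int32:int32 int64}, this sketch yields the three names in order, regardless of whether a colon or a blank separates them.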
Also, it should be noted that the data you have to provide for ``args'' depends entirely on what the deployed engines expect to find on the stack --- there is no ``standard'' on what should be put on the stack here. The XML and XDR engines included in the distribution will simply expect the value to be encoded to be found on the stack, but other engines may act differently. See section~\ref{meta engines} for an example of such an engine. \textsf{xds\_vencode()} will return any of the following return codes: \textsf{XDS\_OK} (everything worked fine), \textsf{XDS\_ERR\_NO\_MEM} (failed to allocate or to resize the internal buffer), \textsf{XDS\_ERR\_OVER\-FLOW} (the internal buffer is too small but is not owned by us), \textsf{XDS\_ERR\_INVALID\_ARG} (``xds'' or ``fmt'' are \textsf{NULL}), \textsf{XDS\_ERR\_UNKNOWN\_ENGINE} (an engine name specified in ``fmt'' is not registered in ``xds''), \textsf{XDS\_ERR\_INVALID\_MODE} (``xds'' is initialized in decode mode), or \textsf{XDS\_ERR\_UNKNOWN} (the engine returned an unspecified error). \subsection{int xds\_encode(xds\_t*~\underline{xds}, const~char*~\underline{fmt}, \dots{});} This routine is basically identical to \textsf{xds\_vencode()}, only that it uses a different prototype syntax. \subsection{int xds\_vdecode(xds\_t*~\underline{xds}, const~char*~\underline{fmt}, va\_list~\underline{args});} This routine is almost identical to \textsf{xds\_vencode()}: It expects an XDS context, a format string and a set of parameters for the engines, but \textsf{xds\_vdecode()} does not encode any data, it decodes the data back into the native format. The format string determines which engines are to be called by the framework in order to decode the values contained in the buffer. The values will then be stored at the locations found in the corresponding ``args'' entry. But please note that the exact behavior of the decoding engines is not specified! 
The XML and XDR engines included in this distribution expect a pointer to a location where to store the decoded value, but other engines may vary. \textsf{xds\_vdecode()} may return any of the following return codes: \textsf{XDS\_OK} (everything went fine), \textsf{XDS\_ERR\_INVALID\_ARG} (``xds'' or ``fmt'' are \textsf{NULL}), \textsf{XDS\_ERR\_TYPE\_MISMATCH} (the format string says the next value is of type $A$, but that's not what we found in the buffer), \textsf{XDS\_ERR\_UNKNOWN\_ENGINE} (an engine name specified in ``fmt'' is not registered in ``xds''), \textsf{XDS\_ERR\_INVALID\_MODE} (``xds'' has been initialized in encode mode), \textsf{XDS\_ERR\_UNDER\-FLOW} (an engine tried to read $n$ bytes from the buffer, but we don't have that much data left), or \textsf{XDS\_ERR\_UNKNOWN} (an engine returned an unspecified error). \subsection{int xds\_decode(xds\_t*~\underline{xds}, const~char*~\underline{fmt}, \dots{});} This routine is basically identical to \textsf{xds\_vdecode()}, only that it uses a different prototype syntax. 
\section{The XDR Engines}
\label{xdr}

\begin{tabular}{|c|c|c|c|}
\hline
\bf Function Name & \bf Expected ``args'' Datatype & \bf Input & \bf Output \\
\hline
xdr\_encode\_uint32() & xds\_uint32\_t & 4 bytes & 4 bytes \\
xdr\_decode\_uint32() & xds\_uint32\_t* & 4 bytes & 4 bytes \\[1ex]
xdr\_encode\_int32() & xds\_int32\_t & 4 bytes & 4 bytes \\
xdr\_decode\_int32() & xds\_int32\_t* & 4 bytes & 4 bytes \\[1ex]
xdr\_encode\_uint64() & xds\_uint64\_t & 8 bytes & 8 bytes \\
xdr\_decode\_uint64() & xds\_uint64\_t* & 8 bytes & 8 bytes \\[1ex]
xdr\_encode\_int64() & xds\_int64\_t & 8 bytes & 8 bytes \\
xdr\_decode\_int64() & xds\_int64\_t* & 8 bytes & 8 bytes \\[1ex]
xdr\_encode\_float() & xds\_float\_t & 4 bytes & 4 bytes \\
xdr\_decode\_float() & xds\_float\_t* & 4 bytes & 4 bytes \\[1ex]
xdr\_encode\_double() & xds\_double\_t & 8 bytes & 8 bytes \\
xdr\_decode\_double() & xds\_double\_t* & 8 bytes & 8 bytes \\[1ex]
xdr\_encode\_octetstream() & void*, size\_t & variable & variable \\
xdr\_decode\_octetstream() & void**, size\_t* & variable & variable \\[1ex]
xdr\_encode\_string() & char* & variable & variable \\
xdr\_decode\_string() & char** & variable & variable \\
\hline
\end{tabular}

\medskip

Please note that the routines \textsf{xdr\_decode\_octetstream()} and \textsf{xdr\_decode\_string()} return a pointer to a buffer holding the decoded data. This buffer has been allocated with \textsf{malloc(3)} and must be \textsf{free(3)}ed by the application when it is not required anymore. All other callbacks write the decoded value into the location found on the stack, but these two behave differently because the length of the decoded data is not known in advance and the application cannot provide a buffer that's guaranteed to suffice.
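
To make the fixed sizes in the table concrete, here is a minimal sketch of how a 32-bit unsigned integer is laid out in XDR: four bytes, most significant byte first. The helpers \textsf{put\_uint32\_be()} and \textsf{get\_uint32\_be()} are hypothetical names chosen for this illustration; they are not the engines shipped with the library, and they use \textsf{unsigned long} merely because ISO-C guarantees it to hold at least 32 bits.

\begin{quote}
\begin{verbatim}
/* Hypothetical helper, not one of the library's engines: store the
   low 32 bits of "value" as four bytes, most significant first. */
void put_uint32_be(unsigned char out[4], unsigned long value)
{
    out[0] = (unsigned char)((value >> 24) & 0xFF);
    out[1] = (unsigned char)((value >> 16) & 0xFF);
    out[2] = (unsigned char)((value >>  8) & 0xFF);
    out[3] = (unsigned char)( value        & 0xFF);
}

/* Hypothetical counterpart: rebuild the value from four bytes. */
unsigned long get_uint32_be(const unsigned char in[4])
{
    return ((unsigned long)in[0] << 24) |
           ((unsigned long)in[1] << 16) |
           ((unsigned long)in[2] <<  8) |
            (unsigned long)in[3];
}
\end{verbatim}
\end{quote}

Because the byte order is fixed by shifting rather than by copying memory, the value \$1234 comes out as \$00~\$00~\$12~\$34 on big endian and little endian hosts alike.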
\section{The XML Engines}
\label{xml}

\begin{tabular}{|c|c|c|c|}
\hline
\bf Function Name & \bf Expected ``args'' Datatype & \bf Input & \bf Output \\
\hline
xml\_encode\_uint32() & xds\_uint32\_t & 4 bytes & 18--27 bytes \\
xml\_decode\_uint32() & xds\_uint32\_t* & 18--27 bytes & 4 bytes \\[1ex]
xml\_encode\_int32() & xds\_int32\_t & 4 bytes & 16--26 bytes \\
xml\_decode\_int32() & xds\_int32\_t* & 16--26 bytes & 4 bytes \\[1ex]
xml\_encode\_uint64() & xds\_uint64\_t & 8 bytes & 18--37 bytes \\
xml\_decode\_uint64() & xds\_uint64\_t* & 18--37 bytes & 8 bytes \\[1ex]
xml\_encode\_int64() & xds\_int64\_t & 8 bytes & 16--36 bytes \\
xml\_decode\_int64() & xds\_int64\_t* & 16--36 bytes & 8 bytes \\[1ex]
xml\_encode\_float() & xds\_float\_t & 4 bytes & variable \\
xml\_decode\_float() & xds\_float\_t* & variable & 4 bytes \\[1ex]
xml\_encode\_double() & xds\_double\_t & 8 bytes & variable \\
xml\_decode\_double() & xds\_double\_t* & variable & 8 bytes \\[1ex]
xml\_encode\_octetstream() & void*, size\_t & variable & variable \\
xml\_decode\_octetstream() & void**, size\_t* & variable & variable \\[1ex]
xml\_encode\_string() & char* & variable & variable \\
xml\_decode\_string() & char** & variable & variable \\
\hline
\end{tabular}

\medskip

Please note that the routines \textsf{xml\_decode\_octetstream()} and \textsf{xml\_decode\_string()} return a pointer to a buffer holding the decoded data. This buffer has been allocated with \textsf{malloc(3)} and must be \textsf{free(3)}ed by the application when it is not required anymore. All other callbacks write the decoded value into the location found on the stack, but these two behave differently because the length of the decoded data is not known in advance and the application cannot provide a buffer that's guaranteed to suffice.

\section{Frequently Asked Questions}

\subsection{Why do we have separate encoding and decoding modes?}

Some users complained about having to maintain separate XDS contexts for encoding and decoding.
They wondered why it is not possible to encode and decode with a single XDS context.

The reason is that this limitation makes the XDS context structure and the programmer's API for XDS much simpler. If we were able to use a single context for encoding and decoding, we would have to maintain \emph{two} lists of registered engines per XDS context: one set of encoding engines and one set of decoding engines. Consequently, the \textsf{xds\_register()} function would need to take an additional parameter, which determines whether you're registering an encoding or a decoding engine. All this is not necessary in the current design, because one list of registered engines suffices.

Another important topic is buffer management. The buffer handling in encoding mode is subtly different from that in decoding mode: The XDS context contains a buffer, the size of that buffer, and a kind of ``current position'' pointer. When an engine stores, say, 8 bytes of encoded data in the buffer, \textsf{xds\_vencode()} will increase the current position by 8 bytes --- the next encoding engine will append its encoded data at the end of the buffer. If the current position reaches the end of the buffer, the buffer is reallocated with an appropriately bigger size.

In decoding mode, the same variables in the XDS context have a different meaning: Since the buffer is never going to be resized, the buffer size does not correspond to the size of the memory chunk that constitutes the buffer; it says how many bytes of information the buffer contains, that is, the length of the contents. The current position is initialized to the beginning of the buffer, and every time an engine claims to have decoded, say, 8 bytes from the buffer, the current position is increased by 8 bytes towards the end of the buffer. If the current position reaches the end of the buffer's contents, an \textsf{XDS\_ERR\_UNDER\-FLOW} error is returned.
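
The decoding-mode bookkeeping described above can be modelled with a small stand-alone structure. This is a sketch under assumptions: the structure \textsf{decode\_cursor}, its field names, and the helper \textsf{cursor\_take()} are invented for illustration and are not taken from the XDS sources. The helper reports failure when fewer bytes are left than requested, which is the situation the library signals as an underflow.

\begin{quote}
\begin{verbatim}
#include <stddef.h>

/* Hypothetical model of a decode-mode buffer; the names are
   illustrative only. */
struct decode_cursor {
    const unsigned char *data;  /* start of the encoded content   */
    size_t content_len;         /* number of valid bytes in data  */
    size_t pos;                 /* current position, 0 at start   */
};

/* Consume n bytes: return a pointer to them and advance the current
   position, or return NULL and leave the position untouched if
   fewer than n bytes of content are left. */
const unsigned char *cursor_take(struct decode_cursor *c, size_t n)
{
    const unsigned char *p;

    if (n > c->content_len - c->pos)
        return NULL;
    p = c->data + c->pos;
    c->pos += n;
    return p;
}
\end{verbatim}
\end{quote}

Note that the buffer is never resized here; only the position moves, from the beginning of the content towards its end.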
Buffer handling differs between the two modes in that, in encoding mode, the initial buffer is empty and the current position moves with the end of the content, determining where new data should be appended. In decoding mode, the initial buffer is filled and the current position wanders from the beginning to the end of the content.

Thus, if an XDS context were to be used for both encoding and decoding, the library would have to manage two different buffers because the encoding and decoding buffers have different semantics. The \textsf{xds\_setbuffer()} and \textsf{xds\_getbuffer()} routines would then need an additional parameter in order to set the two buffers independently.

Considering all that, we found that the current design greatly reduces the complexity of the implementation and of the API while putting the user only through minimal inconvenience.

\subsection{What are those xds\_int-something types good for?}
\label{xds int stuff}

The XDS library uses the data types \textsf{xds\_int32\_t}, etc.\ rather than \textsf{int}. This is necessary because we need to have a definitive size for each data type. In ISO-C, though, the actual size of an \textsf{int} is not fixed by the standard. In theory, the system header \textsf{sys/types.h} defines types with fixed sizes, but unfortunately the names of these data types vary from vendor to vendor.

To solve that, we defined our own data types. The application programmer might want to take a look at the first few lines of the \textsf{xds.h} include file to see how the actual data types are mapped to the \textsf{xds\_xxx\_t} variants on the system at hand.

\subsection{Why do I have to register all the engines manually?}

One idea that came up during the design of the API was to provide a way to register all elementary XML or XDR engines with a single function call, something like this:
%
\begin{quote}
\begin{verbatim}
xds = xds_init(XDS_ENCODE, XDS_XML); /* Use the XML engines. */
xds = xds_init(XDS_ENCODE, XDS_XDR); /* Use the XDR engines. */
\end{verbatim}
\end{quote}
%
The advantage of this approach is that the application developer does not need to bother about registering some obscure functions like \textsf{xdr\_encode\_octetstream()}. We dismissed the idea nonetheless for the following reasons:

\begin{itemize}

\item Since the library is meant to be extensible, \textsf{xds\_init()} has no (good) way of knowing which engines actually exist for an encoding scheme. Suppose someone writes a whole set of engines that implement the CORBA format; then he would not be able to register his engines without re-writing \textsf{xds\_init()}.

\item On a similar note, \textsf{xds\_init()} would not know about the meta engines required by the application developer. The call outlined above would only register the engines for the elementary data types; the meta engines do not even ``exist'' when the XDS library is compiled.

\item This approach would make it hard to mix engines from different formats. If all engines are registered manually, the application programmer may choose to use the XDR format for encoding all kinds of integers, but to use the XML format for encoding strings, octet streams, or floating point numbers.

\item If one routine referenced all engines of an encoding format, all engines of that format would be linked into the binary as soon as the application accessed that routine. It would not be possible to, say, register the engines dealing with integers without pulling the floating point engines into the program too --- even though nobody uses them.

The author of this document wishes to remark, though, that this property of the library was later uh~\dots{} removed by the decision of the team leader to merge all engines into one source module per format. Sorry.
\end{itemize} \begin{thebibliography}{xxx} \bibitem{xdr} RFC 1832: ``XDR: External Data Representation Standard'', R.~Srinivasan, August~1995 \bibitem{xml} XML-RPC Home Page: \textsf{http://www.xmlrpc.org/} \end{thebibliography} \end{document}