Index: ossp-pkg/xds/docs/libxds.tex
RCS File: /v/ossp/cvs/ossp-pkg/xds/docs/libxds.tex,v
co -q -kk -p'1.1' '/v/ossp/cvs/ossp-pkg/xds/docs/libxds.tex,v' | diff -u /dev/null - -L'ossp-pkg/xds/docs/libxds.tex' 2>/dev/null
--- ossp-pkg/xds/docs/libxds.tex
+++ - 2025-04-04 22:44:19.842681799 +0200
@@ -0,0 +1,571 @@
+% -*- mode: LaTeX; fill-column: 75; -*-
+%
+% $Id$
+%
+\documentclass[a4paper,10pt,pointlessnumbers,bibtotoc]{scrartcl}
+\usepackage[dvips,xdvi]{graphicx}
+\usepackage{fancyvrb}
+\typearea[2cm]{12}
+\fussy
+
+\begin{document}
+
+\subject{Cable \& Wireless Application Development}
+\title{XDS -- eXtensible Data Serialization}
+\author{Peter Simons $<$simons@computer.org$>$}
+\date{2001-08-01}
+\maketitle
+
+\section{Introduction}
+
+In today's networked world, computer systems of all brands and flavours
+communicate with each other. Unfortunately, these systems are far from
+being identical: Many systems use different internal representations for
+the same thing. Look at the (hexadecimal) number \$1234 for instance: On a
+big endian machine, this number will be stored in memory the way you'd
+intuitively expect: \$12~\$34 --- the more significant byte precedes the
+less significant one. On a little endian machine, though, the number \$1234
+will be stored like this: \$34~\$12 --- exactly the other way round.
+
+As a result, you cannot just write the number \$1234 to a socket and expect
+the other end to understand it correctly, because if the byte orders differ,
+the reader will read a different number than the writer sent. Things will
+get even more complicated when you start exchanging floating point numbers,
+for which about a dozen different encodings exist!
+
+Solving these problems is the domain of libxds; its purpose is to encode
+data in a way that allows this data to be exchanged between computer
+systems of different types. Assume you want to reliably transfer the
+value \$1234 from host A to host B.
+Then you would encode the value using
+libxds, transfer the encoded data via the network, and decode the value
+again at the other end. Every application that follows this process will
+read the correct value no matter what native representation its hosting
+platform uses internally.
+
+\begin{figure}[tbh]
+  \begin{center}
+    \includegraphics[width=\textwidth]{data-exchange.eps}
+    \caption{Data exchange using libxds}
+    \label{data exchange}
+  \end{center}
+\end{figure}
+
+There is a rich variety of applications for such functionality: libxds
+may be used to encode data before it is written to disk and to decode it
+after it has been read back, it may be used to encode data to be exchanged
+between processes over the network, etc. Because of this variety, special
+attention has been paid to the library design.
+
+\paragraph{The library has been designed to be extensible.}
+The functionality is split into a generic encoding and decoding framework
+and a set of formatting engines. These engines can be plugged into the
+framework at run-time to actually encode and decode data. Because of this
+architecture, libxds can be customized to use any data format the
+developer sees fit. Included in the distribution are formatting engines for
+the XDR format specified in \cite{xdr} and for the XML format specified in
+\cite{xml}.
+
+\paragraph{The library is convenient to use.}
+An arbitrary number of variables can be encoded or decoded with one single
+function call. All memory management is done by libxds; the developer
+doesn't need to allocate or manage buffers for the encoded or
+decoded data. Automatic buffer management can be switched off at run-time,
+though, for maximum performance.
+
+\paragraph{Performance.}
+Since all transferred data has to pass through libxds, the library has
+been written to encode and decode with maximum performance. The generic
+encoding framework adds almost no run-time overhead to the encoding
+process.
+If non-automatic buffer management has been selected, hardly
+anything but the actual formatting engines is executed.
+
+\paragraph{Robustness.}
+In order to verify that the library is working correctly, a set of
+regression tests is included in the distribution. The test suites will ---
+among other things --- encode known values and compare the result with the
+expected (correct) values. Running them on a new platform helps to verify
+that libxds works correctly there.
+
+\paragraph{Use standard formats.}
+The supported formats XDR and XML are widely known and accepted formats,
+which are most likely interoperable with other marshaling implementations.
+For XDR for example, it would be possible to encode data with libxds and to
+decode it with an entirely different XDR implementation or vice versa.
+
+\paragraph{Portability.}
+libxds has been written with portability in mind. Development took place on
+FreeBSD, Linux and Solaris; other platforms have been used to test the
+results. It is expected that libxds will compile and function on virtually
+any POSIX.1-compliant system with a moderately modern ISO-C compiler. The
+GNU C compiler (gcc) is known to compile the library just fine. For maximum
+portability, GNU autoconf has been used to determine the target system's
+properties.
+
+\section{Architecture of libxds}
+
+\begin{figure}[htb]
+  \begin{center}
+    \includegraphics[width=\textwidth]{architecture.eps}
+    \caption{Components of libxds}
+    \label{libxds components}
+  \end{center}
+\end{figure}
+
+The architecture of libxds is illustrated in
+figure~\ref{libxds components}. libxds consists of three components: The
+generic encoding and decoding framework, a set of formatting engines to
+encode and decode values in a certain format, and a run-time context, which
+is used to manage buffers, registered engines, etc.
+
+In order to use the library, the first thing the developer has to do is to
+create a valid XDS context by calling {\sf xds\_init()}.
+The routine
+requires one parameter that determines whether to operate in encoding or
+decoding mode. A context can be used for encoding or decoding only; it is
+not possible to use the same context for both operations. Once a valid XDS
+context has been obtained, the routine {\sf xds\_register()} can be used to
+register an arbitrary number of formatting engines within the context.
+
+A set of formatting engines has been included in the library. These
+routines will handle any elementary data type included in the ISO-C language
+such as 32-bit integers, 64-bit integers, unsigned integers (both 32-bit
+and 64-bit), floating point numbers, strings and octet streams.
+
+Once all required formatting engines are registered, the routines {\sf
+xds\_encode()} or {\sf xds\_\-decode()} may be used to actually perform the
+encoding or decoding process. Any data type for which a formatting engine
+has been registered can be handled by the library.
+
+This means that the developer can write custom formatting engines for any
+data type and register them in the context, as long as these engines
+adhere to the {\sf xds\_engine\_t} interface defined in {\sf xds.h}.
+
+In particular it is possible to register meta formatting engines. A meta
+engine is a formatting engine designed to encode or decode structures ---
+data types which consist of several elementary data types. The formatting
+engine for such a structure will simply re-use the existing engines in
+order to encode or decode the whole structure. The key point here is that
+the meta engine doesn't even need to know \emph{which} low-level formatting
+engines are registered in order to use them. Hence, a meta engine may
+format the whole structure in XDR, XML, or any other format without needing
+to know anything about the details.
+
+This topic is addressed in great detail in section~\ref{meta engines} of
+this document, but before we come to that rather advanced topic, let us
+start by studying two simple examples of how data is encoded and decoded
+using libxds.
+
+\section{Using the XDS library}
+
+\subsection{Encoding}
+
+The following example program will encode three variables using the XDR
+formatting engines. The result of the process will then be written to the
+standard output stream, which can be redirected to a file or piped into the
+decoding program described in the next section. Take a look at the
+source code for a moment; we will then go on to discuss all relevant
+sections line by line.
+
+\begin{Verbatim}[numbers=left,fontsize=\small,frame=lines]
+#include <stdio.h>
+#include <string.h>
+#include <errno.h>
+#include <unistd.h>
+#include <xds.h>
+
+static void error_exit(int rc, const char* msg, ...)
+    {
+    va_list args;
+    va_start(args, msg);
+    vfprintf(stderr, msg, args);
+    va_end(args);
+    exit(rc);
+    }
+
+int main()
+    {
+    xds_t* xds;
+    char*  buffer;
+    size_t buffer_size;
+
+    xds_int32_t  int32  = -42;
+    xds_uint32_t uint32 = 0x12345678;
+    const char*  string = "This is a test.";
+
+    xds = xds_init(XDS_ENCODE);
+    if (xds == NULL)
+        error_exit(1, "Failed to initialize XDS context: %s\n", strerror(errno));
+
+    if (xds_register(xds, "int32", &xdr_encode_int32, NULL) != XDS_OK ||
+        xds_register(xds, "uint32", &xdr_encode_uint32, NULL) != XDS_OK ||
+        xds_register(xds, "string", &xdr_encode_string, NULL) != XDS_OK)
+        error_exit(1, "Failed to register my encoding engines!\n");
+
+    if (xds_encode(xds, "int32 uint32 string", int32, uint32, string) != XDS_OK)
+        error_exit(1, "xds_encode() failed!\n");
+
+    if (xds_getbuffer(xds, XDS_GIFT, (void**)&buffer, &buffer_size) != XDS_OK)
+        error_exit(1, "getbuffer() failed.\n");
+
+    xds_destroy(xds);
+
+    write(STDOUT_FILENO, buffer, buffer_size);
+
+    free(buffer);
+
+    fprintf(stderr, "Encoded data:\n");
+    fprintf(stderr, "\tint32  = %d\n", int32);
+    fprintf(stderr, "\tuint32 = 0x%x\n",
+            uint32);
+    fprintf(stderr, "\tstring = \"%s\"\n", string);
+
+    return 0;
+    }
+\end{Verbatim}
+
+\paragraph{Lines 1--5.}
+The program starts by including several system headers, which define the
+prototypes for some routines we use. The most interesting header in our
+case is of course {\sf xds.h} --- the header of libxds. Please note that
+all declarations required to use libxds are included in that file.
+
+\paragraph{Lines 7--13.}
+The {\sf error\_exit()} routine is not relevant for the example; we just
+define it to make the rest of the source code shorter and easier to read.
+
+\paragraph{Lines 16--53.}
+This is the interesting part: The {\sf main()} routine. This function will
+create the variables to be encoded on the stack, assign values to them,
+initialize the XDS library, use it to encode the values, and write the
+result of the encoding process to the standard output stream. Read on for
+further details.
+
+\paragraph{Lines 26--28.}
+First of all we have to obtain an XDS context for all further operations.
+This is done by calling {\sf xds\_init()}. Since we intend to \emph{encode}
+data, we initialize the context in encoding mode. The only other mode of
+operation is decoding mode, which is demonstrated in the next
+section.
+
+All routines in libxds return a code from a small list of return codes
+defined in {\sf xds.h}, but {\sf xds\_init()} is different: It will return
+a pointer to an {\sf xds\_t} in case of success and {\sf NULL} in case of
+failure. One reason why {\sf xds\_init()} may fail is that it can't
+allocate the memory required to initialize the context. In this case, the
+system variable {\sf errno} is set to {\sf ENOMEM}. Another reason why {\sf
+xds\_init()} may fail is that the mode parameter is invalid, in which
+case {\sf errno} would be set to {\sf EINVAL}.
+If libxds has been compiled
+with assertions enabled, such an error would result in an assertion failure,
+terminating the program with a diagnostic message immediately.
+
+\paragraph{Lines 30--33.}
+Once we have obtained a valid XDS context, we register the formatting
+engines we need. In this example, we'll encode a signed 32-bit integer, an
+unsigned 32-bit integer, and a string. We'll be using XDR encoding in this
+case, so the engines to register are {\sf xdr\_encode\_int32()}, {\sf
+xdr\_encode\_uint32()}, and {\sf xdr\_encode\_string()}. (A complete list
+of available formatting engines can be found in {\sf xds.h} or in
+sections~\ref{xdr} and~\ref{xml}.) Please note that we could switch the
+deployed encoding format simply by using the corresponding {\sf
+xml\_encode\_XXX()} engines here. We could even mix XDR and XML encoding as
+we see fit, but it's hard to think of a case where this would make sense.
+
+As you can see in the code, the developer is free to choose the name under
+which an engine is registered. These names may only contain
+alphanumerical characters plus the hyphen (``\verb#-#'') and the underscore
+(``\verb#_#''). You can choose any name you want, but it is recommended to
+follow the naming scheme of the corresponding formatting engine. Why this
+is recommended will be seen in section~\ref{meta engines}.
+
+\paragraph{Lines 35--36.}
+This is the place where the actual encoding takes place. As parameters,
+{\sf xds\_encode()} requires a valid encoding context plus a format string
+that describes how the following parameters are to be interpreted. While
+the concept is similar to that of {\sf sprintf()}, the syntax is
+different. The format string may contain an arbitrary number of names,
+which are delimited by an arbitrary number of any character that is not a
+legal character for engine names. Thus you can delimit the names by colons,
+blanks, or whatever you like.
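The delimiter rule for format strings can be illustrated with a small stand-alone tokenizer. This is only a sketch of the rule as stated above, not the parser libxds actually uses; the helpers {\sf is\_name\_char()}, {\sf next\_engine\_name()} and {\sf count\_engine\_names()} are hypothetical names introduced for this illustration.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Non-zero if c is legal in an engine name: alphanumeric, '-' or '_'. */
int is_name_char(int c)
{
    return c == '-' || c == '_' || isalnum((unsigned char)c);
}

/* Hypothetical helper, not part of libxds: skips any delimiters at *cursor,
   copies the next engine name into name (truncated to name_size - 1
   characters) and advances *cursor past it. Returns 1 if a name was found
   and 0 once the end of the format string has been reached. */
int next_engine_name(const char **cursor, char *name, size_t name_size)
{
    const char *p;
    size_t seen = 0;    /* characters in the name before truncation */
    size_t stored = 0;  /* characters actually copied into name */

    assert(cursor != NULL && *cursor != NULL);
    assert(name != NULL && name_size > 0);

    p = *cursor;
    while (*p != '\0' && !is_name_char(*p))   /* skip delimiters */
        ++p;
    while (is_name_char(*p)) {                /* collect one name */
        if (stored + 1 < name_size)
            name[stored++] = *p;
        ++seen;
        ++p;
    }
    name[stored] = '\0';
    *cursor = p;
    return seen > 0;
}

/* Counts the engine names in a format string, i.e. the number of
   parameters that would have to follow it. */
size_t count_engine_names(const char *format)
{
    char name[64];
    size_t count = 0;

    while (next_engine_name(&format, name, sizeof(name)))
        ++count;
    return count;
}
```

Under these assumptions, calling {\sf count\_engine\_names()} on the format string "int32 uint32 string" from the example yields three names, and it yields the same result for "int32:uint32,string", because colons and commas are not legal name characters.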
+
+For each valid engine name in the format string, a corresponding parameter
+must follow. What these parameters mean depends on the engine you're using.
+The engines provided with the libxds library will expect the value to
+encode, but theoretically developers are free to write formatting engines
+that expect virtually any kind of information here. More about this will be
+explained in section~\ref{meta engines}.
+
+\paragraph{Lines 38--39.}
+We have encoded all values we wanted to encode, now we can get the result
+from the library. This happens by calling {\sf xds\_getbuffer()}. The
+routine will store the buffer's address and length at the locations we
+provided as parameters. Please note that we can choose whether we want the
+buffer as a ``gift'' ({\sf XDS\_GIFT}) or as a ``loan'' ({\sf XDS\_LOAN}).
+
+The buffer being a ``loan'' means that the buffer is still owned by the
+library -- we're only allowed to peek at it. But any call to a libxds
+routine may potentially modify the buffer or even change the buffer's
+location. Hence the result of a {\sf xds\_getbuffer()} call with loan
+semantics is only valid until the next libxds routine is called. After
+that, it is invalid.
+
+If we choose the gift semantics, the buffer we receive will be owned by us;
+the library will not touch the buffer again. This means, of course, that
+we're responsible for {\sf free()}ing the buffer when we don't need it
+anymore.
+
+\paragraph{Line 41.}
+Destroy the XDS context and all data associated with it. This is possible
+because we requested the buffer as ``gift''; the buffer is not associated
+with libxds anymore.
+
+\paragraph{Line 43.}
+Write the buffer with the encoded data to the standard output stream.
+
+\paragraph{Line 45.}
+Now that we don't need the buffer anymore, we have to return the memory it
+uses to the system. libxds won't do that for us.
+
+\paragraph{Lines 47--50.}
+Write a short report of what we have done to the standard error channel.
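To make the XDR wire format more concrete, the following sketch encodes the same three values by hand in plain C, without libxds. It assumes the usual XDR rules: integers occupy four bytes in big endian order, and a string is a four-byte length followed by its bytes, zero-padded to a multiple of four. The function names are hypothetical, and nothing here claims to mirror how the libxds engines are implemented.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: store v as four bytes, most significant first. */
size_t put_uint32(unsigned char *out, uint32_t v)
{
    out[0] = (unsigned char)(v >> 24);
    out[1] = (unsigned char)(v >> 16);
    out[2] = (unsigned char)(v >> 8);
    out[3] = (unsigned char)v;
    return 4;
}

/* Signed integers use the same layout in two's complement. */
size_t put_int32(unsigned char *out, int32_t v)
{
    return put_uint32(out, (uint32_t)v);
}

/* A string is its length, its bytes, and zero padding up to a
   multiple of four bytes. */
size_t put_string(unsigned char *out, const char *s)
{
    size_t len = strlen(s);
    size_t padded = (len + 3) & ~(size_t)3;

    put_uint32(out, (uint32_t)len);
    memcpy(out + 4, s, len);
    memset(out + 4 + len, 0, padded - len);
    return 4 + padded;
}

/* Encodes the three values of the example program into out and
   returns the number of bytes written. */
size_t encode_example(unsigned char *out, size_t out_size)
{
    const char *string = "This is a test.";
    size_t needed = 4 + 4 + 4 + ((strlen(string) + 3) & ~(size_t)3);
    size_t used = 0;

    assert(out != NULL && out_size >= needed);

    used += put_int32(out + used, -42);
    used += put_uint32(out + used, 0x12345678u);
    used += put_string(out + used, string);
    return used;
}
```

Under these assumptions the three values occupy 4 + 4 + (4 + 16) = 28 bytes, which is the size of the {\sf output} file produced by the example program.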
+
+\bigskip
+Finally, let us compile and execute the example program shown above. For
+convenience, it is included in the distribution under the name {\sf
+docs/encode.c}. You can compile and execute the program as follows:
+
+\begin{quote}
+\begin{verbatim}
+simons@dev13:~/libxds$ cd docs
+simons@dev13:~/libxds/docs$ gcc -I.. encode.c -o encode -L.. -lxds
+simons@dev13:~/libxds/docs$ ./encode >output
+Encoded data:
+        int32  = -42
+        uint32 = 0x12345678
+        string = "This is a test."
+simons@dev13:~/libxds/docs$ ls -l output
+-rw-r--r--  1 simons  simons  28 Aug  2 15:21 output
+\end{verbatim}
+\end{quote}
+
+The result of executing the program --- the file {\sf output} --- can be
+displayed with {\sf hexdump(1)} or {\sf od(1)} and should look like this:
+
+\begin{quote}
+\begin{Verbatim}[fontsize=\small]
+simons@dev13:~/libxds/docs$ hexdump -C output
+00000000  ff ff ff d6 12 34 56 78  00 00 00 0f 54 68 69 73  |.....4Vx....This|
+00000010  20 69 73 20 61 20 74 65  73 74 2e 00              | is a test..|
+0000001c
+\end{Verbatim}
+\end{quote}
+
+\noindent
+We will also re-use this file in the next section, where we'll read it and
+decode those values again.
+
+
+\subsection{Decoding}
+
+The following example program will read the result of the encoding example
+shown in the previous section and decode the values back into the native
+representation. Then it will print those values to the standard error
+stream so that the user can see the values are correct. Please take a look
+at the source now; we'll discuss all relevant details in the following
+paragraphs.
+
+\begin{Verbatim}[numbers=left,fontsize=\small,frame=lines]
+#include <stdio.h>
+#include <string.h>
+#include <errno.h>
+#include <unistd.h>
+#include <xds.h>
+
+static void error_exit(int rc, const char* msg, ...)
+    {
+    va_list args;
+    va_start(args, msg);
+    vfprintf(stderr, msg, args);
+    va_end(args);
+    exit(rc);
+    }
+
+int main()
+    {
+    xds_t* xds;
+    char   buffer[1024];
+    size_t buffer_len;
+    int    rc;
+
+    xds_int32_t  int32;
+    xds_uint32_t uint32;
+    char*        string;
+
+    buffer_len = 0;
+    do
+        {
+        rc = read(STDIN_FILENO, buffer + buffer_len, sizeof(buffer) - buffer_len);
+        if (rc < 0)
+            error_exit(1, "read() failed: %s\n", strerror(errno));
+        else if (rc > 0)
+            buffer_len += rc;
+        }
+    while (rc > 0 && buffer_len < sizeof(buffer));
+
+    if (buffer_len >= sizeof(buffer))
+        error_exit(1, "Too much input data for our buffer.\n");
+
+    xds = xds_init(XDS_DECODE);
+    if (xds == NULL)
+        error_exit(1, "Failed to initialize XDS context: %s\n", strerror(errno));
+
+    if (xds_register(xds, "int32", &xdr_decode_int32, NULL) != XDS_OK ||
+        xds_register(xds, "uint32", &xdr_decode_uint32, NULL) != XDS_OK ||
+        xds_register(xds, "string", &xdr_decode_string, NULL) != XDS_OK)
+        error_exit(1, "Failed to register my decoding engines!\n");
+
+    if (xds_setbuffer(xds, XDS_LOAN, buffer, buffer_len) != XDS_OK)
+        error_exit(1, "setbuffer() failed.\n");
+
+    if (xds_decode(xds, "int32 uint32 string", &int32, &uint32, &string) != XDS_OK)
+        error_exit(1, "xds_decode() failed!\n");
+
+    xds_destroy(xds);
+
+    fprintf(stderr, "Decoded data:\n");
+    fprintf(stderr, "\tint32  = %d\n", int32);
+    fprintf(stderr, "\tuint32 = 0x%x\n", uint32);
+    fprintf(stderr, "\tstring = \"%s\"\n", string);
+
+    free(string);
+
+    return 0;
+    }
+\end{Verbatim}
+
+\paragraph{Lines 1--25.}
+Include the required header files, define the {\sf error\_exit()} helper
+function, and create the required variables on the stack.
+
+\paragraph{Lines 27--39.}
+These instructions will read an unspecified number of bytes from the
+standard input stream --- as long as the input does not exceed the size of
+the {\sf buffer} variable.
+In order to provide the program with the
+appropriate input, redirect the standard input stream to the file {\sf
+output} created in the previous section or connect the encoding and
+decoding programs directly by a pipe.
+
+\paragraph{Lines 41--43.}
+Create a context for decoding the values. The semantics are identical to
+those described in the previous section.
+
+\paragraph{Lines 45--48.}
+Register the decoding engines in the context. Please note that
+the decoding engines must correspond to the encoding engines used to create
+the data we're about to process. Using, say, an XML engine to decode XDR
+data will at best return with an error --- in the worst case, it will
+return incorrect results!
+
+\paragraph{Lines 50--51.}
+Here we do not get a buffer from the library, we \emph{set} the buffer
+we've read earlier in the context for decoding. Please note that we use
+loan semantics in this case, not gift semantics. This is necessary because
+{\sf buffer} has not been allocated by {\sf malloc()} --- the variable
+lives on the stack. This means that we cannot give it to libxds because
+libxds expects to be able to {\sf free()} the buffer when the context is
+destroyed.
+
+Loan semantics are fine, though: all we have to do is to take care that we
+don't erase or modify the contents of {\sf buffer} while libxds operates on
+it. The library itself will never touch the buffer in decode mode, no
+matter whether loan or gift semantics have been chosen.
+
+\paragraph{Lines 53--54.}
+Here comes the actual decoding of the buffer's contents using {\sf
+xds\_decode()}. The syntax is identical to {\sf xds\_encode()}'s; the only
+difference is that the decoding engines do not expect the values --- like
+the encoding engines did --- but the locations where to store the values.
+Thus we pass the addresses of the appropriate variables here. If the routine
+returns with {\sf XDS\_OK}, the decoded values will have been stored in
+those locations.
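The idea of decoding into caller-supplied locations can be sketched in plain C without libxds. The following helpers are hypothetical and only assume the XDR rule that integers are stored as four bytes in big endian order; they are not the decoding engines shipped with the library.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: reads four big endian bytes from in and stores
   the result at the location the caller passed in. Returns the number
   of bytes consumed. */
size_t get_uint32(const unsigned char *in, uint32_t *value)
{
    assert(in != NULL && value != NULL);

    *value = ((uint32_t)in[0] << 24) | ((uint32_t)in[1] << 16) |
             ((uint32_t)in[2] << 8)  |  (uint32_t)in[3];
    return 4;
}

/* Same layout, interpreted as a two's complement signed integer. The
   conversion avoids an out-of-range cast so that it does not depend on
   implementation-defined behaviour. */
size_t get_int32(const unsigned char *in, int32_t *value)
{
    uint32_t raw;
    size_t used;

    assert(value != NULL);

    used = get_uint32(in, &raw);
    if (raw <= (uint32_t)INT32_MAX)
        *value = (int32_t)raw;
    else
        *value = (int32_t)(raw - 0x80000000u) + INT32_MIN;
    return used;
}
```

Applied to the first eight bytes of the {\sf output} file shown in the previous section, these helpers would store -42 and 0x12345678 in the two variables whose addresses were passed.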
+
+It should be noted that the decoded string cannot trivially be returned
+this way. Instead, {\sf xds\_decode()} will use {\sf malloc()} to allocate
+a buffer just large enough to hold the string. The address of that buffer
+is then stored in the pointer {\sf string}. Of course this means that the
+application has to {\sf free()} the string once it's not required anymore.
+
+\paragraph{Line 56.}
+We don't need the context anymore, so we destroy it and free all used
+resources. This does not affect {\sf buffer} in any way because we used
+loan semantics.
+
+\paragraph{Lines 58--61.}
+Print the decoded values to the standard error stream for the user to take
+a look at them.
+
+\paragraph{Line 63.}
+Now that we don't need the contents of {\sf string} anymore, we must return
+the buffer allocated in {\sf xds\_decode()} to the system.
+
+\bigskip
+Like the encoding program described earlier, the source code to this
+program is included in the library distribution as {\sf docs/decode.c}. You
+can compile and execute the program like this:
+
+\begin{quote}
+\begin{verbatim}
+simons@dev13:~/libxds$ cd docs
+simons@dev13:~/libxds/docs$ gcc -I.. decode.c -o decode -L.. -lxds
+simons@dev13:~/libxds/docs$ ./decode