% -*- mode: LaTeX; fill-column: 75; -*-
%
% $Id: libxds.tex,v 1.18 2001/08/30 15:02:53 simons Exp $
%
\documentclass[a4paper,10pt,pointlessnumbers,bibtotoc]{scrartcl}
\usepackage[dvips,xdvi]{graphicx}
\usepackage{fancyvrb}
\typearea[2cm]{12}
\fussy

\begin{document}

\titlehead{Cable \& Wireless Deutschland GmbH\\Application Services\\Development Team}
\title{OSSP XDS ---\\Extensible Data Serialization}
\author{Peter Simons $<$simons@computer.org$>$}
\date{2001-08-01}
\maketitle

\section{Introduction}

In today's networked world, computer systems of all brands and flavours
communicate with each other. Unfortunately, these systems are far from
being compatible: Many systems use different internal representations for
the same thing. Look at the (hexadecimal) number \$1234, for instance: On a
big endian machine, this number is stored in memory the way you'd
intuitively expect: \$12~\$34 --- the more significant byte precedes the
less significant one. On a little endian machine, though, the number \$1234
is stored like this: \$34~\$12 --- exactly the other way round.

As a result, you cannot just write the number \$1234 to a socket and expect
the other end to understand it correctly: If the byte orders of the two
machines differ, the reader will see a different number than the writer
sent. Things get even more complicated when you start exchanging floating
point numbers, for which about a dozen different encodings exist!
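
The effect is easy to observe on any machine. The following C sketch is
not part of XDS; \textsf{native\_bytes16()} and
\textsf{host\_is\_big\_endian()} are hypothetical helpers written for this
illustration. The first copies the in-memory representation of a 16-bit
value into a byte array, the second uses it to determine the host's byte
order.

\begin{Verbatim}[fontsize=\small,frame=lines]
#include <stdint.h>
#include <string.h>

/* Copy the native in-memory representation of value into out[0..1],
 * byte for byte, exactly as the host stores it. */
static void native_bytes16(uint16_t value, unsigned char out[2])
    {
    memcpy(out, &value, sizeof(value));
    }

/* Return 1 on a big endian host, 0 on a little endian host. */
static int host_is_big_endian(void)
    {
    unsigned char b[2];
    native_bytes16(0x1234, b);
    return b[0] == 0x12;
    }
\end{Verbatim}

\noindent
On an x86 PC, which is little endian, \textsf{native\_bytes16()} yields
\$34~\$12; on a big endian SPARC machine, it yields \$12~\$34.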

\begin{figure}[tbh]
    \begin{center}
        \includegraphics[width=\textwidth]{data-exchange.eps}
        \caption{Data exchange using XDS}
        \label{data exchange}
    \end{center}
\end{figure}

Solving these problems is the domain of XDS; its purpose is to encode data
in a way that allows this data to be exchanged between different computer
systems. Assume you want to transfer the value \$1234 from host A to host
B. Then you would encode it using XDS, transfer the encoded data over the
network, and decode the value again at the other end. Every program that
follows this process will read the correct value, no matter which native
representation it uses internally.
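
Stripped down to a single 32-bit value, the encode, transfer, and decode
cycle can be sketched as follows. The two functions are hypothetical
helpers, not XDS routines; they assume a big endian wire format and use
shifts instead of copying memory, so the result does not depend on the
host's native representation.

\begin{Verbatim}[fontsize=\small,frame=lines]
#include <stdint.h>

/* Write value to out[0..3], most significant byte first,
 * regardless of the host's native byte order. */
static void put_uint32_be(unsigned char out[4], uint32_t value)
    {
    out[0] = (unsigned char)(value >> 24);
    out[1] = (unsigned char)(value >> 16);
    out[2] = (unsigned char)(value >> 8);
    out[3] = (unsigned char)value;
    }

/* Reassemble a value written by put_uint32_be(). */
static uint32_t get_uint32_be(const unsigned char in[4])
    {
    return ((uint32_t)in[0] << 24) | ((uint32_t)in[1] << 16)
         | ((uint32_t)in[2] << 8)  |  (uint32_t)in[3];
    }
\end{Verbatim}

\noindent
Host A would call \textsf{put\_uint32\_be()} before sending the four bytes,
and host B would call \textsf{get\_uint32\_be()} after receiving them; both
see \$1234, whatever their native byte order.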

There is a rich variety of applications for such functionality: XDS
may be used to encode data before it is written to disk and to decode it
after it has been read back, to encode data exchanged between processes
over the network, and so on. Because of this variety, special attention
has been paid to the library design.

\paragraph{The library has been designed to be extensible.}
The functionality is split into a generic encoding and decoding framework
and a set of encoding and decoding engines. These engines can be plugged
into the framework at run-time to actually do the encoding and decoding of
data. Because of this architecture, XDS can be customized to support any
data format the developer sees fit. Included in the distribution are
engines for the XDR format specified in \cite{xdr} and for the XML format
specified in \cite{xml}.

\paragraph{The library is convenient to use.}
An arbitrary number of variables can be encoded or decoded with one single
function call. All memory management is done by XDS; the developer
need not bother to allocate or manage buffers for the encoded or
decoded data. Automatic buffer management can be switched off at run-time,
though, for maximum performance.

\paragraph{Performance.}
Since all transferred data has to pass through XDS, the library has been
written to encode and decode with maximum performance. The generic encoding
framework adds almost no run-time overhead to the actual encoding process:
If non-automatic buffer management has been selected, hardly anything but
the actual encoding/decoding engine is executed.

\paragraph{Robustness.}
In order to verify that the library works correctly, a set of regression
tests is included in the distribution. These test suites will --- among
other things --- encode known values and compare the result with the
expected (correct) values. Running them is an easy way to verify that XDS
works correctly on a given platform.

\paragraph{Use of standard formats.}
The supported XDR and XML formats are widely known and accepted, meaning
that XDS is interoperable with other marshaling implementations: It is
possible to encode data with XDS and to decode it with an entirely
different XDR implementation or vice versa.

\paragraph{Portability.}
XDS has been written with portability in mind. Development took place on
FreeBSD, Linux and Solaris; other platforms were used to test the results.
It is expected that XDS will compile and function on virtually any
POSIX.1-compliant system with a moderately modern ISO-C compiler. GNU's
C~Compiler~(gcc) is known to compile the library just fine. For maximum
portability, GNU Autoconf has been used to determine the target system's
properties.

\section{Architecture of XDS}

\begin{figure}[htb]
    \begin{center}
        \includegraphics[width=\textwidth]{architecture.eps}
        \caption{Components of XDS}
        \label{XDS components}
    \end{center}
\end{figure}

The architecture of XDS is illustrated in figure~\ref{XDS components}. XDS
consists of three components: The generic encoding and decoding framework,
a set of engines to encode and decode values in a certain format, and a
run-time context, which is used to manage buffers, registered engines, etc.

In order to use the library, the first thing the developer has to do is to
create a valid XDS context by calling \textsf{xds\_init()}. The routine
requires one parameter that determines whether to operate in encoding or
decoding mode. A context can be used for encoding or decoding only; it is
not possible to use the same context for both operations. Once a valid XDS
context has been obtained, the routine \textsf{xds\_register()} can be used to
register an arbitrary number of encoding or decoding engines within the
context.

Two sets of engines are included in the library, one for the XDR and one
for the XML format. These routines handle the elementary data types of the
ISO-C language, such as 32-bit integers, 64-bit integers, unsigned integers
(of both 32 and 64 bits), floating point numbers, strings, and octet streams.

Once all required encoding/decoding engines are registered, the routines
\textsf{xds\_encode()} or \textsf{xds\_\-decode()} may be used to actually
perform the encoding or decoding process. Any data type for which an engine
has been registered can be handled by the library.

This means that it is possible for the developer to write custom engines
for any data type he desires to use and to register them in the context,
as long as these engines adhere to the \textsf{xds\_engine\_t} interface
defined in \textsf{xds.h}.

In particular, it is possible to register meta engines: engines designed
to encode or decode data types that consist of several elementary data
types. Such an engine will simply re-use the existing engines to encode or
decode the elements of the structure.
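
To make the interplay between framework and engines more concrete, here is
a heavily simplified sketch of the idea. It is \emph{not} the XDS
implementation: the type \textsf{toy\_engine\_t}, the fixed-size registry,
and the functions \textsf{toy\_register()} and \textsf{toy\_encode()} are
hypothetical. What it does illustrate is why an engine receives a
\emph{pointer} to a \textsf{va\_list}: each engine consumes exactly the
arguments it needs, and the framework continues with the remaining ones
for the next name in the format string.

\begin{Verbatim}[fontsize=\small,frame=lines]
#include <stdarg.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical engine type: encode one value taken from *args into buf,
 * store the number of bytes written in *used, return 0 on success. */
typedef int (*toy_engine_t)(unsigned char* buf, size_t size,
                            size_t* used, va_list* args);

struct toy_entry
    {
    const char*  name;
    toy_engine_t engine;
    };

static struct toy_entry registry[8];
static size_t registry_len = 0;

static int toy_register(const char* name, toy_engine_t engine)
    {
    if (registry_len == sizeof(registry) / sizeof(registry[0]))
        return -1;                         /* registry full */
    registry[registry_len].name   = name;
    registry[registry_len].engine = engine;
    registry_len++;
    return 0;
    }

/* Engine "byte": one char argument (promoted to int), one byte. */
static int encode_byte(unsigned char* buf, size_t size,
                       size_t* used, va_list* args)
    {
    int value = va_arg(*args, int);
    if (size < 1)
        return -1;
    buf[0] = (unsigned char)value;
    *used = 1;
    return 0;
    }

/* Engine "u16": one unsigned int argument, two bytes, big endian. */
static int encode_u16be(unsigned char* buf, size_t size,
                        size_t* used, va_list* args)
    {
    unsigned int value = va_arg(*args, unsigned int);
    if (size < 2)
        return -1;
    buf[0] = (unsigned char)(value >> 8);
    buf[1] = (unsigned char)value;
    *used = 2;
    return 0;
    }

/* Look up each blank-separated name in fmt and let the matching engine
 * consume its own argument(s) from the shared argument list. */
static int toy_encode(unsigned char* buf, size_t size, size_t* total,
                      const char* fmt, ...)
    {
    va_list args;
    int rc = 0;
    *total = 0;
    va_start(args, fmt);
    while (rc == 0 && *fmt != '\0')
        {
        size_t len = strcspn(fmt, " ");    /* length of the next name */
        size_t i = 0, used = 0;
        while (i < registry_len
               && (strlen(registry[i].name) != len
                   || strncmp(registry[i].name, fmt, len) != 0))
            i++;
        if (len > 0 && i == registry_len)
            rc = -1;                       /* unknown engine name */
        else if (len > 0)
            {
            rc = registry[i].engine(buf + *total, size - *total,
                                    &used, &args);
            *total += used;
            }
        fmt += len;
        while (*fmt == ' ')
            fmt++;
        }
    va_end(args);
    return rc;
    }
\end{Verbatim}

\noindent
After registering both engines, \verb|toy_encode(buf, sizeof(buf), &n, "u16 byte", 0x1234u, 'A')|
produces the three bytes \$12~\$34~\$41. The real \textsf{xds\_engine\_t}
interface additionally passes the XDS context and an engine-specific
context pointer to every engine.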

\section{Using the XDS library}

\subsection{Encoding}

The following example program will encode three variables using the XDR
engines. The encoded results will be written to the standard output stream,
which can be redirected to a file or piped into the decoding program
described in the next section. Take a look at the source code for a
moment; we will then discuss all relevant sections line by line.

\begin{Verbatim}[numbers=left,fontsize=\small,frame=lines]
#include <stdio.h>
#include <unistd.h>
#include <string.h>
#include <errno.h>
#include <stdlib.h>
#include <xds.h>
static void error_exit(int rc, const char* msg, ...)
    {
    va_list args;
    va_start(args, msg);
    vfprintf(stderr, msg, args);
    va_end(args);
    exit(rc);
    }

int main(void)
    {
    xds_t* xds;
    char*  buffer;
    size_t buffer_size;

    xds_int32_t  int32  = -42;
    xds_uint32_t uint32 = 0x12345678;
    const char*  string = "This is a test.";

    xds = xds_init(XDS_ENCODE);
    if (xds == NULL)
        error_exit(1, "Failed to initialize XDS context: %s\n", strerror(errno));

    if (xds_register(xds, "int32",  &xdr_encode_int32, NULL) != XDS_OK ||
        xds_register(xds, "uint32", &xdr_encode_uint32, NULL) != XDS_OK ||
        xds_register(xds, "string", &xdr_encode_string, NULL) != XDS_OK)
        error_exit(1, "Failed to register my encoding engines!\n");

    if (xds_encode(xds, "int32 uint32 string", int32, uint32, string) != XDS_OK)
        error_exit(1, "xds_encode() failed!\n");

    if (xds_getbuffer(xds, XDS_GIFT, (void**)&buffer, &buffer_size) != XDS_OK)
        error_exit(1, "xds_getbuffer() failed!\n");

    xds_destroy(xds);

    write(STDOUT_FILENO, buffer, buffer_size);

    free(buffer);

    fprintf(stderr, "Encoded data:\n");
    fprintf(stderr, "\tint32   = %d\n", int32);
    fprintf(stderr, "\tuint32 = 0x%x\n", uint32);
    fprintf(stderr, "\tstring = \"%s\"\n", string);

    return 0;
    }
\end{Verbatim}

\paragraph{Lines 1--6.}
The program starts by including several system headers, which declare the
prototypes of the routines we use. The most interesting header in our case
is, of course, \textsf{xds.h} --- the header of XDS. Please note that all
declarations required to use XDS are included in that file.

\paragraph{Lines 7--14.}
The \textsf{error\_exit()} routine is not relevant for the example; we just
define it to make the rest of the source code shorter and easier to read.

\paragraph{Lines 16--53.}
This is the interesting part: The \textsf{main()} routine. This function will
create the variables to be encoded on the stack, assign values to them,
initialize the XDS library, use it to encode the values, and write the
result of the encoding process to the standard output stream. Read on for
further details.

\paragraph{Lines 26--28.}
First of all we have to obtain an XDS context. This is done by calling
\textsf{xds\_init()}. Since we intend to \emph{encode} data, we initialize
the context in encoding mode. The only other mode of operation would be
decoding mode, but this is demonstrated in the next section.

All other routines in XDS return a code from a small list of return codes
defined in \textsf{xds.h}, but \textsf{xds\_init()} is different: It
returns a pointer to an \textsf{xds\_t} in case of success and
\textsf{NULL} in case of failure. One reason for \textsf{xds\_init()} to
fail is that it cannot allocate the memory required to initialize the
context. In this case, the system variable \textsf{errno} is set to
\textsf{ENOMEM}. The other reason is an invalid mode parameter, in which
case \textsf{errno} is set to \textsf{EINVAL}. If XDS has been compiled
with assertions enabled, an invalid mode results in an assertion error
instead, terminating the program immediately with a diagnostic message.

\paragraph{Lines 30--33.}
Once we have obtained a valid XDS context, we register the engines we need.
In this example, we'll encode a signed 32-bit integer, an unsigned 32-bit
integer, and a string. We'll be using XDR encoding in this case, so the
engines to register are \textsf{xdr\_encode\_int32()},
\textsf{xdr\_encode\_uint32()}, and \textsf{xdr\_encode\_string()}. (A
complete list of available engines can be found in \textsf{xds.h}, in
sections~\ref{xdr} and~\ref{xml}, or in the manual pages for the library.)
Please note that we could switch the encoding format simply by using the
corresponding \textsf{xml\_encode\_XXX()} engines here. We could even mix
XDR and XML encoding as we see fit, but it's hard to think of a case where
this would make sense~\dots{}

The developer is free to choose the name under which an engine is
registered, as long as the name contains only alphanumeric characters,
hyphens (``\verb#-#''), or underscores (``\verb#_#''). It is recommended,
though, to follow the naming scheme of the corresponding engine. Why this
is recommended will become clear in section~\ref{meta engines}.
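
The naming rule is easy to check mechanically. The function below is a
hypothetical helper, not an XDS routine; it accepts exactly the characters
listed above and, as an additional assumption of ours, also rejects the
empty string. It tests the explicit ASCII ranges rather than calling
\textsf{isalnum(3)}, whose result depends on the current locale.

\begin{Verbatim}[fontsize=\small,frame=lines]
#include <stddef.h>

/* Nonzero if c may appear in an engine name: a-z, A-Z, 0-9, '-' or '_'. */
static int is_engine_name_char(char c)
    {
    return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z')
        || (c >= '0' && c <= '9') || c == '-' || c == '_';
    }

/* Nonzero if name is a non-empty string made up of legal characters. */
static int is_valid_engine_name(const char* name)
    {
    if (name == NULL || *name == '\0')
        return 0;
    for (; *name != '\0'; name++)
        if (!is_engine_name_char(*name))
            return 0;
    return 1;
    }
\end{Verbatim}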

\paragraph{Lines 35--36.}
This is where the actual encoding takes place. As parameters,
\textsf{xds\_encode()} requires a valid encoding context plus a format
string that describes how the following parameters are to be interpreted.
While the concept is similar to that of \textsf{sprintf(3)}, the syntax
is different: The format string may contain an arbitrary number of names,
which are delimited by an arbitrary number of characters that are not
legal in engine names. We recommend delimiting the names with colons or
blanks.

For each valid engine name in the format string, a corresponding parameter
must follow in the variable argument list. How these parameters are
interpreted depends on the engine you're using. The engines provided with
the XDS library expect the value to encode, but theoretically developers
are free to write encoding and decoding engines that expect virtually any
kind of information here. This will be explained in more detail in
section~\ref{meta engines}.
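
Splitting such a format string into engine names can be sketched as
follows. This illustrates the rule stated above and is not code taken from
XDS; \textsf{next\_engine\_name()} is a hypothetical helper that skips any
run of delimiter characters and then reports the position and length of
the next name, so \verb|"int32 uint32 string"| and
\verb|"int32:uint32::string"| yield the same three names.

\begin{Verbatim}[fontsize=\small,frame=lines]
#include <stddef.h>

static int is_name_char(char c)
    {
    return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z')
        || (c >= '0' && c <= '9') || c == '-' || c == '_';
    }

/* Find the next engine name in *fmt. On success, store its start in
 * *name and its length in *len, advance *fmt past it, and return 1.
 * Return 0 when only delimiters (or nothing) are left. */
static int next_engine_name(const char** fmt, const char** name,
                            size_t* len)
    {
    const char* p = *fmt;
    while (*p != '\0' && !is_name_char(*p))   /* skip delimiters */
        p++;
    if (*p == '\0')
        return 0;
    *name = p;
    while (is_name_char(*p))
        p++;
    *len = (size_t)(p - *name);
    *fmt = p;
    return 1;
    }
\end{Verbatim}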

\paragraph{Lines 38--39.}
We have encoded all values we wanted to encode; now we can get the result
from the library. This happens by calling \textsf{xds\_getbuffer()}. The
routine will store the buffer's address and length at the locations we
provided as parameters. Please note that we can choose whether we want the
buffer as a ``gift'' (\textsf{XDS\_GIFT}) or as a ``loan'' (\textsf{XDS\_LOAN}).

The buffer being a ``loan'' means that the buffer is still owned by the
library --- we're only allowed to peek at it. Any call to an XDS routine
may modify the buffer or even change the buffer's location. Hence the
result of an \textsf{xds\_getbuffer()} call with loan semantics is only
valid until the next XDS routine is called.

If we choose ``gift'' semantics, the buffer we receive will be owned by us;
the library will not touch the buffer again. This means, of course, that
we're responsible for \textsf{free(3)}ing the buffer when we don't need it
anymore.
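
The difference between the two semantics boils down to who owns the memory
afterwards. The following sketch uses a hypothetical \textsf{toy\_ctx}
structure and \textsf{toy\_getbuffer()} function rather than the real XDS
types; it only models the ownership rule described above: A loan hands out
a pointer the context keeps managing, a gift hands the buffer over and
leaves the context without one.

\begin{Verbatim}[fontsize=\small,frame=lines]
#include <stddef.h>
#include <stdlib.h>

enum toy_scope { TOY_LOAN, TOY_GIFT };

/* Hypothetical context owning a malloc(3)ed output buffer. */
struct toy_ctx
    {
    char*  buffer;
    size_t used;
    };

static void toy_getbuffer(struct toy_ctx* ctx, enum toy_scope scope,
                          char** buf, size_t* len)
    {
    *buf = ctx->buffer;
    *len = ctx->used;
    if (scope == TOY_GIFT)
        {
        /* Ownership moves to the caller, who must free(3) it later;
         * the context forgets the buffer and never touches it again. */
        ctx->buffer = NULL;
        ctx->used   = 0;
        }
    /* With TOY_LOAN the context still owns the buffer; the pointer is
     * only valid until the next operation on ctx. */
    }

static void toy_destroy(struct toy_ctx* ctx)
    {
    free(ctx->buffer);   /* free(NULL) is a no-op after a gift */
    ctx->buffer = NULL;
    ctx->used   = 0;
    }
\end{Verbatim}

\noindent
This is why the encoding example may call \textsf{xds\_destroy()} before
writing the buffer: After a gift, destroying the context no longer affects
the memory we received.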

\paragraph{Line 41.}
Destroy the XDS context and all data associated with it. This is possible
because we requested the buffer as a ``gift''; the buffer is not associated
with XDS anymore.

\paragraph{Line 43.}
Write the buffer with the encoded data to the standard output stream.

\paragraph{Line 45.}
Now that we don't need the buffer anymore, we can return the memory it uses
to the system.

\paragraph{Lines 47--50.}
Write a short report of what we have done to the standard error channel.

\bigskip
Finally, let us compile and execute the example program shown above. For
convenience, it is included in the distribution as \textsf{docs/encode.c}.
You can compile and execute the program as follows:

\begin{quote}
\begin{verbatim}
$ gcc -I.. encode.c -o encode -L.. -lxds
$ ./encode >output
Encoded data:
        int32   = -42
        uint32 = 0x12345678
        string = "This is a test."
$ ls -l output
-rw-r--r--  1 simons  simons  28 Aug  2 15:21 output
\end{verbatim}
\end{quote}

The result of executing the program --- the file \textsf{output} --- can be
displayed with \textsf{hexdump(1)} or \textsf{od(1)} and should look like this:

\begin{quote}
\begin{Verbatim}[fontsize=\small]
$ hexdump -C output
00000000  ff ff ff d6 12 34 56 78  00 00 00 0f 54 68 69 73  |.....4Vx....This|
00000010  20 69 73 20 61 20 74 65  73 74 2e 00              | is a test..|
0000001c
\end{Verbatim}
\end{quote}

\noindent
We will also re-use this file in the next section, where we'll read it and
decode those values again.
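
The 28 bytes can be reproduced by hand, which is a good way to understand
the XDR format defined in \cite{xdr}: Integers occupy four bytes in big
endian order (two's complement for signed values, so $-42$ becomes
\$ffffffd6), and a string is a four-byte length followed by its characters,
padded with zero bytes to a multiple of four. The functions below are a
standalone sketch written for this illustration, not the engines shipped
with XDS, and they leave it to the caller to provide enough room.

\begin{Verbatim}[fontsize=\small,frame=lines]
#include <stdint.h>
#include <string.h>

/* XDR unsigned int: 4 bytes, most significant byte first. */
static size_t xdr_put_uint32(unsigned char* out, uint32_t value)
    {
    out[0] = (unsigned char)(value >> 24);
    out[1] = (unsigned char)(value >> 16);
    out[2] = (unsigned char)(value >> 8);
    out[3] = (unsigned char)value;
    return 4;
    }

/* XDR int: same layout; converting to uint32_t yields the
 * two's complement bit pattern (-42 -> 0xffffffd6). */
static size_t xdr_put_int32(unsigned char* out, int32_t value)
    {
    return xdr_put_uint32(out, (uint32_t)value);
    }

/* XDR string: 4-byte length, the bytes, then zero padding up to a
 * multiple of 4. The caller must provide enough room. */
static size_t xdr_put_string(unsigned char* out, const char* s)
    {
    size_t len = strlen(s);
    size_t padded = (len + 3) / 4 * 4;
    xdr_put_uint32(out, (uint32_t)len);
    memcpy(out + 4, s, len);
    memset(out + 4 + len, 0, padded - len);
    return 4 + padded;
    }
\end{Verbatim}

\noindent
Calling \textsf{xdr\_put\_int32()} with $-42$, \textsf{xdr\_put\_uint32()}
with \$12345678, and \textsf{xdr\_put\_string()} with ``This is a test.''
one after the other yields exactly the 28 bytes shown in the hexdump: 15
characters plus one byte of padding, preceded by three four-byte fields.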


\subsection{Decoding}

The following example program will read the result of the encoding example
shown in the previous section and decode the values back into the native
representation. Then it will print those values to the standard error
stream so that the user can verify that the values are correct. Please take
a look at the source now; we'll discuss all relevant details in the
following paragraphs.

\begin{Verbatim}[numbers=left,fontsize=\small,frame=lines]
#include <stdio.h>
#include <unistd.h>
#include <string.h>
#include <errno.h>
#include <stdlib.h>
#include <xds.h>
static void error_exit(int rc, const char* msg, ...)
    {
    va_list args;
    va_start(args, msg);
    vfprintf(stderr, msg, args);
    va_end(args);
    exit(rc);
    }

int main()
    {
    xds_t* xds;
    char   buffer[1024];
    size_t buffer_len;
    int rc;

    xds_int32_t  int32;
    xds_uint32_t uint32;
    char*        string;

    buffer_len = 0;
    do
        {
        rc = read(STDIN_FILENO, buffer + buffer_len, sizeof(buffer) - buffer_len);
        if (rc < 0)
            error_exit(1, "read() failed: %s\n", strerror(errno));
        else if (rc > 0)
            buffer_len += rc;
        }
    while (rc > 0 && buffer_len < sizeof(buffer));

    if (buffer_len >= sizeof(buffer))
        error_exit(1, "Too much input data for our buffer.\n");

    xds = xds_init(XDS_DECODE);
    if (xds == NULL)
        error_exit(1, "Failed to initialize XDS context: %s\n", strerror(errno));

    if (xds_register(xds, "int32",  &xdr_decode_int32, NULL) != XDS_OK ||
        xds_register(xds, "uint32", &xdr_decode_uint32, NULL) != XDS_OK ||
        xds_register(xds, "string", &xdr_decode_string, NULL) != XDS_OK)
        error_exit(1, "Failed to register my decoding engines!\n");

    if (xds_setbuffer(xds, XDS_LOAN, buffer, buffer_len) != XDS_OK)
        error_exit(1, "setbuffer() failed.\n");

    if (xds_decode(xds, "int32 uint32 string", &int32, &uint32, &string) != XDS_OK)
        error_exit(1, "xds_decode() failed!\n");

    xds_destroy(xds);

    fprintf(stderr, "Decoded data:\n");
    fprintf(stderr, "\tint32   = %d\n", int32);
    fprintf(stderr, "\tuint32 = 0x%x\n", uint32);
    fprintf(stderr, "\tstring = \"%s\"\n", string);

    free(string);

    return 0;
    }
\end{Verbatim}

\paragraph{Lines 1--25.}
Include the required header files, define the \textsf{error\_exit()} helper
function, and create the required variables on the stack.

\paragraph{Lines 27--39.}
These instructions read an arbitrary number of bytes from the standard
input stream, as long as the input is smaller than the \textsf{buffer}
variable; input that fills the buffer completely is rejected. In order to
provide the program with the
appropriate input, redirect the standard input stream to the file
\textsf{output} created in the previous section or connect the encoding and
decoding programs directly by a pipe.
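
The reading loop is independent of XDS. Factored into a reusable function,
it might look like the sketch below; \textsf{read\_all()} is a hypothetical
name, and unlike the example program it also retries when \textsf{read(2)}
is interrupted by a signal (\textsf{EINTR}), which is an addition of ours.

\begin{Verbatim}[fontsize=\small,frame=lines]
#include <errno.h>
#include <stddef.h>
#include <unistd.h>

/* Read from fd until end of file. Returns 0 on success with the number
 * of bytes in *len, -1 if read(2) fails, and 1 if the input does not
 * fit. As in the example program, a completely filled buffer counts as
 * too much input, because we cannot tell whether more data follows. */
static int read_all(int fd, char* buf, size_t cap, size_t* len)
    {
    *len = 0;
    while (*len < cap)
        {
        ssize_t rc = read(fd, buf + *len, cap - *len);
        if (rc < 0)
            {
            if (errno == EINTR)
                continue;           /* interrupted by a signal: retry */
            return -1;
            }
        if (rc == 0)
            return 0;               /* end of file */
        *len += (size_t)rc;
        }
    return 1;
    }
\end{Verbatim}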

\paragraph{Lines 41--43.}
Create a context for decoding the values. The semantics are identical to
those described in the encoding example.

\paragraph{Lines 45--48.}
Register the decoding engines in the context. Obviously, the decoding
engines must correspond to the encoding engines used to create the data
we're about to process. Using, say, an XML engine to decode XDR data will
at best return an error; in the worst case, it will return incorrect
results!

\paragraph{Lines 50--51.}
Here we do not \emph{get} a buffer from the library, we \emph{set} the
buffer we've read earlier in the context for decoding. Please note that we
use loan semantics in this case, not gift semantics. This is necessary
because \textsf{buffer} has not been allocated by \textsf{malloc(3)} ---
the variable is located on the stack. This means that we cannot give it to
XDS, because XDS expects to be able to \textsf{free(3)} the buffer when the
context is destroyed.

Loan semantics are fine, though: All we have to do is take care that we
don't erase or modify the contents of \textsf{buffer} while XDS operates on
it. The library itself will never modify the buffer's contents in decode
mode, no matter whether loan or gift semantics have been chosen.

\paragraph{Lines 53--54.}
Here comes the actual decoding of the buffer's content using
\textsf{xds\_decode()}. The syntax is identical to that of
\textsf{xds\_encode()}; the only difference is that the decoding engines
do not expect the values themselves, as the encoding engines did, but the
locations where the decoded values are to be stored. Thus, we pass the
addresses of the appropriate variables here. If the routine returns with
\textsf{XDS\_OK}, the decoded values will have been stored in those
locations.

It should be noted that the decoded string cannot trivially be returned
this way. Instead, \textsf{xds\_decode()} will use \textsf{malloc(3)} to
allocate a buffer large enough to hold the string. The address of that
buffer is then stored in the pointer \textsf{string}. Of course, this means
that the application has to \textsf{free(3)} the string once it's not
required anymore.
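
Decoding works the other way round. The following standalone sketch is
again not the XDS engines (the \textsf{xdr\_get\_*()} names are made up
for this illustration); it decodes the three values from the 28-byte XDR
buffer and, mirroring the behaviour described above, returns the string in
a freshly \textsf{malloc(3)}ed buffer that the caller must \textsf{free(3)}.
Each function first checks that enough input remains, which corresponds to
the situation the engines report as \textsf{XDS\_ERR\_UNDERFLOW}.

\begin{Verbatim}[fontsize=\small,frame=lines]
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Each function returns the number of bytes consumed, or 0 if the
 * input is too short (or, for strings, malloc(3) failed). */
static size_t xdr_get_uint32(const unsigned char* in, size_t avail,
                             uint32_t* value)
    {
    if (avail < 4)
        return 0;
    *value = ((uint32_t)in[0] << 24) | ((uint32_t)in[1] << 16)
           | ((uint32_t)in[2] << 8)  |  (uint32_t)in[3];
    return 4;
    }

static size_t xdr_get_int32(const unsigned char* in, size_t avail,
                            int32_t* value)
    {
    uint32_t u;
    if (xdr_get_uint32(in, avail, &u) == 0)
        return 0;
    /* Undo two's complement without relying on the implementation-
     * defined conversion of large unsigned values to signed ones. */
    *value = (u <= 0x7fffffffu) ? (int32_t)u
                                : -(int32_t)(0xffffffffu - u) - 1;
    return 4;
    }

static size_t xdr_get_string(const unsigned char* in, size_t avail,
                             char** string)
    {
    uint32_t len;
    size_t padded;
    if (xdr_get_uint32(in, avail, &len) == 0 || len > avail - 4)
        return 0;
    padded = ((size_t)len + 3) / 4 * 4;
    if (padded > avail - 4)
        return 0;                 /* padding is missing */
    *string = malloc((size_t)len + 1);
    if (*string == NULL)
        return 0;
    memcpy(*string, in + 4, len);
    (*string)[len] = '\0';        /* terminate for use as a C string */
    return 4 + padded;
    }
\end{Verbatim}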

\paragraph{Line 56.}
We don't need the context anymore, so we destroy it and thereby free all
used resources. This does not affect \textsf{buffer} in any way because we
used loan semantics.

\paragraph{Lines 58--61.}
Print the decoded values to the standard error stream for the user to take
a look at them.

\paragraph{Line 63.}
Now that we don't need the contents of \textsf{string} anymore, we must return
the buffer allocated in \textsf{xds\_decode()} to the system.

\bigskip
Like the encoding program described earlier, the source code to this
program is included in the library distribution as \textsf{docs/decode.c}.
You can compile and execute the program like this:

\begin{quote}
\begin{verbatim}
$ gcc -I.. decode.c -o decode -L.. -lxds
$ ./decode <output
Decoded data:
        int32   = -42
        uint32 = 0x12345678
        string = "This is a test."
\end{verbatim}
\end{quote}

We assume that the \textsf{output} file has been created as described in
the previous section. Alternatively, you can connect both programs with a pipe:

\begin{quote}
\begin{Verbatim}[fontsize=\small]
$ ./encode | ./decode
Encoded data:
        int32   = -42
        uint32 = 0x12345678
        string = "This is a test."
Decoded data:
        int32   = -42
        uint32 = 0x12345678
        string = "This is a test."
\end{Verbatim}
\end{quote}

\noindent
This will encode and decode the values without the need for a temporary
file.

\section{Extending the XDS library}
\label{meta engines}

Now that we know how primitive data types can be encoded and decoded, let's
write a ``meta engine'' that will handle complex data structures. For the
example, we'll use the structure ``mystruct'', which is defined as follows:

\begin{quote}
\begin{verbatim}
struct mystruct
    {
    xds_int32_t   small;
    xds_int64_t   big;
    xds_uint32_t  positive;
    char          text[16];
    };
\end{verbatim}
\end{quote}

Some readers might wonder why the structure is defined using these unusual
data types rather than the familiar ones like \textsf{int}, \textsf{long},
etc. The reason is that the sizes of those types are not fixed by the C
language. A \textsf{long} variable, for example, has 32 bits when compiled
on the average 32-bit Unix machine, but when the same program is compiled
on a 64-bit machine like Tru64 UNIX, it has 64 bits. That is a problem when
those structures have to be exchanged between entirely different systems,
because the structures are binary incompatible --- something even XDS
cannot remedy.
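
The point can be made concrete with a few compile-time checks. The sketch
below is an illustration only: It uses the C99 \verb|<stdint.h>| types
instead of XDS's own typedefs, and \textsf{\_Static\_assert} requires a C11
compiler. It also shows the limit of the approach: The size of each member
is fixed, but the padding between members and the byte order of each
member are still up to the platform, so the structure is encoded member by
member below rather than copied as one block of memory.

\begin{Verbatim}[fontsize=\small,frame=lines]
#include <stdint.h>

/* Same layout as "mystruct", with C99 exact-width types standing in
 * for the xds_*_t typedefs. */
struct mystruct_sketch
    {
    int32_t   small;
    int64_t   big;
    uint32_t  positive;
    char      text[16];
    };

/* Member sizes are fixed on every platform with 8-bit bytes that
 * provides these types (C11 compile-time checks). */
_Static_assert(sizeof(((struct mystruct_sketch*)0)->small) == 4,
               "small must be 32 bits");
_Static_assert(sizeof(((struct mystruct_sketch*)0)->big) == 8,
               "big must be 64 bits");
_Static_assert(sizeof(((struct mystruct_sketch*)0)->positive) == 4,
               "positive must be 32 bits");

/* Nothing comparable can be asserted for sizeof(struct mystruct_sketch):
 * the padding between members is up to the compiler. */
\end{Verbatim}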

In order to encode an instance of this structure, we write an encoding
engine:

\begin{quote}
\begin{verbatim}
static int encode_mystruct(xds_t* xds, void* engine_context,
                           void* buffer, size_t buffer_size,
                           size_t* used_buffer_size,
                           va_list* args)
    {
    struct mystruct* ms;
    ms = va_arg(*args, struct mystruct*);
    return xds_encode(xds, "int32 int64 uint32 octetstream",
                      ms->small, ms->big, ms->positive,
                      ms->text, sizeof(ms->text));
    }
\end{verbatim}
\end{quote}

This engine takes the address of the ``mystruct'' structure from the
variable argument list and then uses \textsf{xds\_encode()} to handle all
elements of ``mystruct'' separately, which is fine, because these data
types are supported by XDS already. It is worth noting, though, that we
refer to the other engines by name, meaning that these engines must be
registered in ``xds'' under that name!

What is very nice, though, is the fact that this encoding engine does not
even need to know which engines are used to encode the actual values! If
the user registers the XDR engines under the appropriate names,
``mystruct'' will be encoded in XDR. If the user registers the XML engines
under the appropriate names, ``mystruct'' will be encoded in XML. Because
of that property, we call such an engine a ``meta engine''.

Of course, you need not necessarily implement an engine as a \emph{meta}
engine: Rather than going through \textsf{xds\_encode()}, it would be
possible to execute the appropriate encoding engines directly. This would
have the advantage of not depending on those engines being registered at
all, but it would tie the custom engine to one particular set of elementary
engines, and thus to one encoding format, which is an unnecessary
limitation.

One more word about the engine syntax and semantics: As has been mentioned
earlier, any function that adheres to the interface shown above is
potentially an engine. These parameters have the following meaning:

\begin{itemize}

\item xds --- This is the XDS context that was originally provided to the
\textsf{xds\_encode()} call, which in turn executed the engine. It may be
used, for example, for executing \textsf{xds\_en\-code()} again like we did
in our example engines.

\item engine\_context --- The engine context can be used by the engine to
store any type of internal information. The value the engine will receive
must have been provided when the engine was registered by
\textsf{xds\_register()}. Engines obviously may neglect this parameter if
they don't need a context of their own --- all engines included in the
distribution do so.

\item buffer --- This parameter points to the buffer the encoded data
should be written to. In decoding mode, ``buffer'' points to the encoded
data, which should be decoded; the locations where the results should be
stored are then taken from the variable argument list.

\item buffer\_size --- The number of bytes available in ``buffer''. In
encoding mode, this means ``free space'', in decoding mode,
``buffer\_size'' determines how many bytes of encoded data are available in
``buffer'' for consumption.

\item used\_buffer\_size --- This parameter points to a variable, which the
callback must set before returning in order to let the framework know how
many bytes it consumed from ``buffer''. A callback encoding, say, an int32
number into an 8-byte text representation would set used\_buffer\_size
to 8:
\begin{quote}
\begin{verbatim}
*used_buffer_size = 8;
\end{verbatim}
\end{quote}
In encoding mode, this variable determines how many bytes the engine has
written into ``buffer''; in decoding mode the variable determines how many
bytes the engine has read from ``buffer''.

\item args --- This parameter points to an initialized variable argument
list. Apply the standard C macro \textsf{va\_arg(3)} to \textsf{*args} to
fetch the actual data.

\end{itemize}
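
As an illustration of these parameters, here is a sketch of an engine that
encodes a 32-bit unsigned integer in big endian order. So that the example
compiles without \textsf{xds.h}, the engine type and the return codes are
hypothetical stand-ins (\textsf{sketch\_engine\_t}, \textsf{SKETCH\_OK},
\textsf{SKETCH\_ERR\_OVERFLOW}); the parameter list follows the interface
described above, with an opaque pointer in place of \textsf{xds\_t*}. The
overflow code corresponds to \textsf{XDS\_ERR\_OVERFLOW} in the list of
return codes that follows.

\begin{Verbatim}[fontsize=\small,frame=lines]
#include <stdarg.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for the definitions in xds.h. */
#define SKETCH_OK             0
#define SKETCH_ERR_OVERFLOW  (-1)

typedef int (*sketch_engine_t)(void* xds, void* engine_context,
                               void* buffer, size_t buffer_size,
                               size_t* used_buffer_size, va_list* args);

/* Encode one uint32_t argument as 4 bytes, most significant first.
 * A non-NULL engine_context is treated as a counter of encoded values,
 * merely to show what a per-engine context could hold. */
static int sketch_encode_uint32(void* xds, void* engine_context,
                                void* buffer, size_t buffer_size,
                                size_t* used_buffer_size, va_list* args)
    {
    unsigned char* out = buffer;
    uint32_t value = va_arg(*args, uint32_t);
    (void)xds;                          /* not needed by this engine */
    if (buffer_size < 4)
        {
        *used_buffer_size = 4;          /* hint: 4 bytes are required */
        return SKETCH_ERR_OVERFLOW;
        }
    out[0] = (unsigned char)(value >> 24);
    out[1] = (unsigned char)(value >> 16);
    out[2] = (unsigned char)(value >> 8);
    out[3] = (unsigned char)value;
    *used_buffer_size = 4;              /* bytes written to buffer */
    if (engine_context != NULL)
        ++*(unsigned long*)engine_context;
    return SKETCH_OK;
    }

/* How a caller hands its variable arguments to an engine. */
static int sketch_call(sketch_engine_t engine, void* engine_context,
                       void* buffer, size_t buffer_size,
                       size_t* used_buffer_size, ...)
    {
    va_list args;
    int rc;
    va_start(args, used_buffer_size);
    rc = engine(NULL, engine_context, buffer, buffer_size,
                used_buffer_size, &args);
    va_end(args);
    return rc;
    }
\end{Verbatim}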

A callback may return any of the following return codes, as defined in
\textsf{xds.h}:

\begin{itemize}
\item XDS\_OK --- No error.

\item XDS\_ERR\_NO\_MEM --- Failed to allocate required memory.

\item XDS\_ERR\_OVERFLOW --- The buffer is too small to hold all encoded
data. The callback may set ``*used\_buffer\_size'' to the number of bytes
it needs in ``buffer'', thereby telling the framework how much space to
provide before trying the engine again. Leaving ``*used\_buffer\_size''
alone works fine too; it may just be a bit less efficient in some cases.
Obviously, this return code does not make much sense in decoding mode.

\item XDS\_ERR\_INVALID\_ARG --- Unexpected or incorrect parameters.

\item XDS\_ERR\_TYPE\_MISMATCH --- This return code will be returned in
decoding mode in case the decoding engine realizes that the data it is
decoding does not fit what it is expecting. Not all encoding formats allow
such a mismatch to be detected at all; XDR, for example, does not.

\item XDS\_ERR\_UNDERFLOW --- In decode mode, this error is returned
when an engine needs, say, 4 bytes of data in order to decode a value but
``buffer''/``buffer\_size'' provides fewer.

\item XDS\_ERR\_UNKNOWN --- Any other reason to fail than those listed
before. Catch all~\dots{}

\end{itemize}
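
On the framework side, an overflow calls for growing the buffer and trying
the engine again. The loop below sketches one plausible way to do this; it
is not taken from the XDS sources, and all names, including the simplified
engine type without the ``xds'' and ``engine\_context'' parameters, are
hypothetical. It uses the engine's hint when one is given and otherwise
doubles the buffer. One subtlety is worth pointing out: The failed attempt
has already fetched its argument with \textsf{va\_arg(3)}, so each attempt
works on a fresh copy of the argument list made with C99's
\textsf{va\_copy()}.

\begin{Verbatim}[fontsize=\small,frame=lines]
#include <stdarg.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical return codes and a simplified engine type. */
#define DEMO_OK             0
#define DEMO_ERR_OVERFLOW  (-1)
#define DEMO_ERR_NO_MEM    (-2)

typedef int (*demo_engine_t)(void* buffer, size_t buffer_size,
                             size_t* used_buffer_size, va_list* args);

/* Demo engine: copies a string argument without its trailing NUL and
 * always reports the number of bytes it needs in *used_buffer_size. */
static int demo_copy_string(void* buffer, size_t buffer_size,
                            size_t* used_buffer_size, va_list* args)
    {
    const char* s = va_arg(*args, const char*);
    size_t len = strlen(s);
    *used_buffer_size = len;
    if (len > buffer_size)
        return DEMO_ERR_OVERFLOW;
    memcpy(buffer, s, len);
    return DEMO_OK;
    }

/* Run one engine, appending its output at *buf + *len. On overflow the
 * malloc(3)ed buffer is enlarged with realloc(3) and the engine is
 * called again with a fresh copy of the argument list. */
static int demo_encode_with_retry(demo_engine_t engine, char** buf,
                                  size_t* cap, size_t* len, ...)
    {
    va_list args;
    int rc;
    va_start(args, len);
    for (;;)
        {
        va_list attempt;
        size_t used = 0, new_cap;
        char* p;
        va_copy(attempt, args);   /* a failed try consumed its argument */
        rc = engine(*buf + *len, *cap - *len, &used, &attempt);
        va_end(attempt);
        if (rc == DEMO_OK)
            *len += used;
        if (rc != DEMO_ERR_OVERFLOW)
            break;
        /* Use the engine's hint if it exceeds the free space,
         * otherwise simply double the buffer. */
        new_cap = (used > *cap - *len) ? *len + used : 2 * *cap + 1;
        p = realloc(*buf, new_cap);
        if (p == NULL)
            {
            rc = DEMO_ERR_NO_MEM;
            break;
            }
        *buf = p;
        *cap = new_cap;
        }
    va_end(args);
    return rc;
    }
\end{Verbatim}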

Let's take a look at the corresponding decoding engine now:

\begin{quote}
\begin{verbatim}
static int decode_mystruct(xds_t* xds, void* engine_context,
                           void* buffer, size_t buffer_size,
                           size_t* used_buffer_size,
                           va_list* args)
    {
    struct mystruct* ms;
    size_t i;
    char*  tmp;
    int rc;
    ms = va_arg(*args, struct mystruct*);
    rc = xds_decode(xds, "int32 int64 uint32 octetstream",
                    &(ms->small), &(ms->big), &(ms->positive),
                    &tmp, &i);
    if (rc == XDS_OK)
        {
        if (i == sizeof(ms->text))
            memmove(ms->text, tmp, i);
        else
            rc = XDS_ERR_TYPE_MISMATCH;
        free(tmp);
        }
    return rc;
    }
\end{verbatim}
\end{quote}

The engine simply calls \textsf{xds\_decode()} to handle the separate data
types. The only complication is that the octet stream decoding engines
return a pointer to a \textsf{malloc(3)}ed buffer, which is not what we
need. Thus we have to manually copy the contents of that buffer into the
right place in the structure and free the (now unused) buffer again.

A complete example program encoding and decoding ``mystruct'' can be found
at \textsf{docs/\-extended.c} in the distribution.

\section{The XDS Framework}
\label{xds}

\subsection{xds\_t* xds\_init(xds\_mode\_t~\underline{mode});}

This routine creates and initializes a context for use with the XDS
library. The ``mode'' parameter may be either \textsf{XDS\_ENCODE} or
\textsf{XDS\_DECODE}, depending on whether you want to encode or to decode
data. If successful, \textsf{xds\_init()} returns a pointer to the XDS
context structure. In case of failure, \textsf{xds\_init()} returns
\textsf{NULL} and sets \textsf{errno} to ENOMEM (failed to allocate
internal memory buffers) or EINVAL (``mode'' parameter was invalid).

A context obtained from \textsf{xds\_init()} must be destroyed by
\textsf{xds\_destroy()} when it is not needed any more.

\subsection{void xds\_destroy(xds\_t*~\underline{xds});}

\textsf{xds\_destroy()} will destroy an XDS context created by
\textsf{xds\_init()}. Doing so will return all resources associated with
this context --- most notably the memory used to buffer the results of
encoding or decoding any values. A context may not be used after it has
been destroyed.

\subsection{int xds\_register(xds\_t*~\underline{xds}, const~char*~\underline{name}, xds\_engine\_t~\underline{engine}, void*~\underline{engine\_context});}

This routine will register an engine in the provided XDS context. An
``engine'' is potentially any function that fulfills the following
interface:

\begin{quote}
\begin{verbatim}
int engine(xds_t* xds, void* engine_context,
           void* buffer, size_t buffer_size, size_t* used_buffer_size,
           va_list* args);
\end{verbatim}
\end{quote}

By calling \textsf{xds\_register()}, the engine ``engine'' will be
registered under the name ``name'' in the XDS context ``xds''. The last
parameter ``engine\_context'' may be used as the user sees fit: It will be
passed when the engine is actually called and may be used to implement an
engine-specific context. Most engines will not need a context of their own,
in which case \textsf{NULL} should be specified here.

Please note that until the user calls \textsf{xds\_register()} for an XDS
context he obtained from \textsf{xds\_init()}, no engines are registered
for that context. Even the engines included in the library distribution are
not registered automatically.

For engine names, any combination of the characters ``a--z'', ``A--Z'',
``0--9'', ``-'', and ``\_'' may be used; anything else is not a legal
engine name component.

\textsf{xds\_register()} may return the following return codes:
\textsf{XDS\_OK} (everything went fine; the engine is registered now),
\textsf{XDS\_ERR\_INVALID\_ARG} (either ``xds'', ``name'', or ``engine''
are \textsf{NULL} or ``name'' contains illegal characters for an engine
name), or \textsf{XDS\_ERR\_NO\_MEM} (failed to allocate internally
required buffers).

\subsection{int xds\_unregister(xds\_t*~\underline{xds}, const~char*~\underline{name});}

\textsf{xds\_unregister()} will remove the engine ``name'' from XDS context
``xds''. The function will return \textsf{XDS\_OK} in case everything went
fine, \textsf{XDS\_ERR\_UNKNOWN\_ENGINE} in case the engine ``name'' is not
registered in ``xds'', or \textsf{XDS\_ERR\_INVALID\_ARG} if either ``xds''
or ``name'' are \textsf{NULL} or ``name'' contains illegal characters for
an engine name.

\subsection{int xds\_setbuffer(xds\_t*~\underline{xds}, xds\_scope\_t~\underline{flag}, void*~\underline{buffer}, size\_t~\underline{buffer\_len});}

\begin{figure}[tbh]
    \begin{center}
        \includegraphics[width=\textwidth]{setbuffer-logic.eps}
        \caption{xds\_setbuffer() modes of operation}
        \label{setbuffer logic}
    \end{center}
\end{figure}

This routine allows the user to control XDS' buffer handling: Calling it
will replace the buffer currently used in ``xds''. The address and size of
that buffer are passed to \textsf{xds\_setbuffer()} via the ``buffer'' and
``buffer\_len'' parameters. The ``xds'' parameter determines for which XDS
context the new buffer will be set. Furthermore, you can set ``flag'' to
either \textsf{XDS\_GIFT} or \textsf{XDS\_LOAN}.

\textsf{XDS\_GIFT} will tell XDS that the provided buffer is now owned by
the library and that it may be resized by calling \textsf{realloc(3)}.
Furthermore, the buffer is \textsf{free(3)}ed when ``xds'' is destroyed. If
``flag'' is \textsf{XDS\_GIFT} and ``buffer'' is \textsf{NULL},
\textsf{xds\_setbuffer()} will simply allocate a buffer of its own to be
set in ``xds''. Please note that a buffer given to XDS as a gift \emph{must}
have been allocated using \textsf{malloc(3)}: it must not live on the
stack or in static storage, because XDS will free or resize the buffer as
it sees fit.

Passing \textsf{XDS\_LOAN} via ``flag'' tells \textsf{xds\_setbuffer()}
that the buffer is owned by the application and that XDS must neither free
nor resize it. In this mode, passing \textsf{NULL} as ``buffer'' will
result in an \textsf{XDS\_ERR\_INVALID\_ARG} error.
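
The difference between the two modes matters as soon as the buffer fills
up. The following toy model illustrates the logic; the structure and
function names are hypothetical and do not reflect the real layout of
\textsf{xds\_t}. A gifted buffer is grown with \textsf{realloc(3)}, which is
why it must come from \textsf{malloc(3)}, while a loaned buffer is never
touched and the append fails with an overflow instead, analogous to
\textsf{XDS\_ERR\_OVERFLOW}.

\begin{quote}
\begin{verbatim}
#include <stdlib.h>
#include <string.h>

/* Toy model of a context's buffer; NOT the real layout of xds_t. */
struct sketch_buf {
    unsigned char* data;
    size_t used;      /* bytes of encoded data so far         */
    size_t capacity;  /* size of the memory block             */
    int owned;        /* 1 = gift: may realloc/free; 0 = loan */
};

enum { SKETCH_OK = 0, SKETCH_ERR_NO_MEM = -1, SKETCH_ERR_OVERFLOW = -2 };

/* Appends n bytes, growing the block only if the context owns it. */
int sketch_append(struct sketch_buf* b, const void* src, size_t n)
{
    if (n == 0)
        return SKETCH_OK;
    if (b->capacity - b->used < n) {
        unsigned char* p;
        size_t new_cap = (b->used + n) * 2;

        if (!b->owned)
            return SKETCH_ERR_OVERFLOW;  /* loaned: never resized  */
        p = realloc(b->data, new_cap);    /* gift: malloc'ed or NULL */
        if (p == NULL)
            return SKETCH_ERR_NO_MEM;    /* old block still valid  */
        b->data = p;
        b->capacity = new_cap;
    }
    memcpy(b->data + b->used, src, n);
    b->used += n;
    return SKETCH_OK;
}
\end{verbatim}
\end{quote}

Starting a gift-mode model with ``data'' set to \textsf{NULL} and a
capacity of 0 mirrors the case in which \textsf{xds\_setbuffer()} receives
\textsf{NULL}, because \textsf{realloc(NULL, n)} behaves like
\textsf{malloc(n)}.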

\subsection{int xds\_getbuffer(xds\_t*~\underline{xds}, xds\_scope\_t~\underline{flag}, void**~\underline{buffer}, size\_t*~\underline{buffer\_len});}

This routine is the counterpart to \textsf{xds\_setbuffer()}: It will get
the buffer currently used in the XDS context ``xds''. The address of that
buffer is stored in the location ``buffer'' points to; the length of the
buffer's content will be stored in the location ``buffer\_len'' points to.

The ``flag'' argument may be set to either \textsf{XDS\_GIFT} or
\textsf{XDS\_LOAN}. The first setting means that the buffer is now owned by
the application and that XDS must not use it after this
\textsf{xds\_getbuffer()} call anymore; the library will allocate a new
internal buffer instead. Of course, this also means that the buffer will
not be freed by \textsf{xds\_destroy()}; the application has to
\textsf{free(3)} the buffer itself when it is not needed anymore.

Setting ``flag'' to \textsf{XDS\_LOAN} tells XDS that the application just
wishes to peek into the buffer and will not modify it. The buffer is still
owned (and used) by XDS. Please note that the loaned address returned by
\textsf{xds\_getbuffer()} may change after any other \textsf{xds\_xxx()}
function call!

The routine will return \textsf{XDS\_OK} (everything went fine) or
\textsf{XDS\_ERR\_INVALID\_ARG} (``xds'', ``buffer'' or ``buffer\_len'' are
\textsf{NULL} or ``flag'' is invalid) signifying success or failure
respectively.

Please note: It is perfectly legal for \textsf{xds\_getbuffer()} to return
a buffer of \textsf{NULL} and a buffer length of 0! This happens when
\textsf{xds\_getbuffer()} is called for an XDS context before a buffer has
been allocated.

\subsection{int xds\_vencode(xds\_t*~\underline{xds}, const~char*~\underline{fmt}, va\_list~\underline{args});}

This routine will encode one or several values using the appropriate
encoding engines registered in XDS context ``xds''. The parameter ``fmt''
contains a \textsf{sprintf(3)}-like description of the values to be
encoded; the actual values are provided in the variable argument list
``args''.

The format for ``fmt'' is simple: Just provide the names of the engines to
be used for encoding the appropriate value(s) in ``args''. Any non-legal
engine-name character may be used as a delimiter. In order to encode two
32-bit integers followed by a 64-bit integer, the format string
\begin{quote}
\begin{verbatim}
int32 int32 int64
\end{verbatim}
\end{quote}
could be used. In case you don't like the blank, use the colon instead:
\begin{quote}
\begin{verbatim*}
int32:int32:int64
\end{verbatim*}
\end{quote}

Of course, the names used here must match the names under which the
engines were registered in ``xds'' earlier.
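
A format string can be split into engine names by treating every character
that cannot occur in an engine name as a delimiter. The sketch below, with
hypothetical function names, only demonstrates this splitting; the real
\textsf{xds\_vencode()} additionally looks up each name among the engines
registered in ``xds'' and calls the corresponding engine.

\begin{quote}
\begin{verbatim}
#include <stddef.h>

/* 1 if c may appear in an engine name (a-z, A-Z, 0-9, '-', '_');
   assumes an ASCII-compatible character set. */
static int sketch_is_name_char(char c)
{
    return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z')
        || (c >= '0' && c <= '9') || c == '-' || c == '_';
}

/* Hypothetical tokenizer: finds the next engine name in fmt, stores
   its start in *start and its length in *len, and returns a pointer
   just past it.  Returns NULL when fmt holds no further name. */
const char* sketch_next_engine(const char* fmt, const char** start,
                               size_t* len)
{
    while (*fmt != '\0' && !sketch_is_name_char(*fmt))
        ++fmt;                      /* skip delimiters            */
    if (*fmt == '\0')
        return NULL;
    *start = fmt;
    while (sketch_is_name_char(*fmt))
        ++fmt;                      /* '\0' also ends the name    */
    *len = (size_t)(fmt - *start);
    return fmt;
}

/* Counts the engine names contained in fmt. */
size_t sketch_count_engines(const char* fmt)
{
    const char* start;
    size_t len;
    size_t count = 0;

    while ((fmt = sketch_next_engine(fmt, &start, &len)) != NULL)
        ++count;
    return count;
}
\end{verbatim}
\end{quote}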

Every time \textsf{xds\_vencode()} is called, it appends the encoded
data to the end of the internal buffer stored in ``xds''. Thus, you can
call \textsf{xds\_vencode()} several times in order to encode several
values and still get all encoded values stored in one buffer.
Calling \textsf{xds\_setbuffer()} or \textsf{xds\_getbuffer()} with gift
semantics at any point during encoding resets the buffer to the
beginning. All values that have already been encoded into that buffer
will be overwritten when \textsf{xds\_encode()} is called again.
Hence: Don't call \textsf{xds\_setbuffer()} or \textsf{xds\_getbuffer()}
unless you actually want to access the data stored in the buffer.

Also note that the data you have to provide for ``args'' depends entirely
on what the deployed engines expect to find in the argument list; there is
no ``standard'' on what should be passed here. The XML and XDR engines
included in the distribution simply expect the value to be encoded as the
next argument, but other engines may act differently. See
section~\ref{meta engines} for an example of such an engine.

\textsf{xds\_vencode()} will return any of the following return codes:
\textsf{XDS\_OK} (everything worked fine), \textsf{XDS\_ERR\_NO\_MEM}
(failed to allocate or to resize the internal buffer),
\textsf{XDS\_ERR\_OVER\-FLOW} (the internal buffer is too small and cannot
be resized because it is loaned, not owned by XDS),
\textsf{XDS\_ERR\_INVALID\_ARG} (``xds'' or ``fmt'' are
\textsf{NULL}), \textsf{XDS\_ERR\_UNKNOWN\_ENGINE} (an engine name
specified in ``fmt'' is not registered in ``xds''),
\textsf{XDS\_ERR\_INVALID\_MODE} (``xds'' is initialized in decode mode),
or \textsf{XDS\_ERR\_UNKNOWN} (the engine returned an unspecified error).

\subsection{int xds\_encode(xds\_t*~\underline{xds}, const~char*~\underline{fmt}, \dots{});}

This routine is identical to \textsf{xds\_vencode()}, except that it takes
the values to be encoded as a variable argument list (``\dots{}'') instead
of a \textsf{va\_list}.
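
The relationship between \textsf{xds\_encode()} and \textsf{xds\_vencode()}
follows the same pattern as \textsf{printf(3)} and \textsf{vprintf(3)}: the
variadic function collects its arguments into a \textsf{va\_list} and
forwards them. The sketch below illustrates this pattern with hypothetical
stand-ins; \textsf{sketch\_vencode()} understands only the engine name
``uint32'' and writes each value as four big-endian bytes. It is not the
library's implementation.

\begin{quote}
\begin{verbatim}
#include <stdarg.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for xds_vencode(): knows only the engine
   name "uint32" (names separated by blanks) and appends each value
   to "out" as 4 big-endian bytes.  Returns the number of bytes
   written, or -1 for an unknown engine name. */
int sketch_vencode(unsigned char* out, const char* fmt, va_list args)
{
    int written = 0;

    while (*fmt != '\0') {
        uint32_t v;

        if (*fmt == ' ') {          /* skip delimiters */
            ++fmt;
            continue;
        }
        if (strncmp(fmt, "uint32", 6) != 0
            || (fmt[6] != ' ' && fmt[6] != '\0'))
            return -1;              /* unknown engine name */
        v = (uint32_t)va_arg(args, unsigned int);
        out[written++] = (unsigned char)(v >> 24);
        out[written++] = (unsigned char)(v >> 16);
        out[written++] = (unsigned char)(v >> 8);
        out[written++] = (unsigned char)v;
        fmt += 6;
    }
    return written;
}

/* Variadic front end: gather "..." into a va_list and forward it,
   the same pattern as printf(3) and vprintf(3). */
int sketch_encode(unsigned char* out, const char* fmt, ...)
{
    va_list args;
    int rc;

    va_start(args, fmt);
    rc = sketch_vencode(out, fmt, args);
    va_end(args);
    return rc;
}
\end{verbatim}
\end{quote}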

\subsection{int xds\_vdecode(xds\_t*~\underline{xds}, const~char*~\underline{fmt}, va\_list~\underline{args});}

This routine is almost identical to \textsf{xds\_vencode()}: It expects an
XDS context, a format string and a set of parameters for the engines, but
instead of encoding data, \textsf{xds\_vdecode()} decodes the data in the
buffer back into the native format. The format string determines which
engines are to be called by the framework in order to decode the values
contained in the buffer. The values will then be stored at the locations
found in the corresponding ``args'' entry. Please note, however, that the
exact behavior of the decoding engines is not specified! The XML and XDR
engines included in this distribution expect a pointer to the location
where the decoded value is to be stored, but other engines may vary.

\textsf{xds\_vdecode()} may return any of the following return codes:
\textsf{XDS\_OK} (everything went fine), \textsf{XDS\_ERR\_INVALID\_ARG}
(``xds'' or ``fmt'' are \textsf{NULL}), \textsf{XDS\_ERR\_TYPE\_MISMATCH}
(the format string says the next value is of type $A$, but that's not what
we found in the buffer), \textsf{XDS\_ERR\_UNKNOWN\_ENGINE} (an engine name
specified in ``fmt'' is not registered in ``xds''),
\textsf{XDS\_ERR\_INVALID\_MODE} (``xds'' has been initialized in encode
mode), \textsf{XDS\_ERR\_UNDER\-FLOW} (an engine tried to read $n$ bytes
from the buffer, but we don't have that much data left), or
\textsf{XDS\_ERR\_UNKNOWN} (an engine returned an unspecified error).
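
The following toy reader illustrates the decoding conventions described
above; its names are hypothetical and it does not reflect the real context
layout. The decoded value is stored through a pointer argument, and a
request for more bytes than the buffer still holds fails with an
underflow, analogous to \textsf{XDS\_ERR\_UNDER\-FLOW}, without moving the
current position.

\begin{quote}
\begin{verbatim}
#include <stddef.h>
#include <stdint.h>

enum { SKETCH_OK = 0, SKETCH_ERR_UNDERFLOW = -1 };

/* Toy decoding cursor; NOT the real layout of xds_t.  In decode
   mode "len" is the length of the content, not the block size. */
struct sketch_reader {
    const unsigned char* data;
    size_t len;
    size_t pos;   /* invariant: pos <= len */
};

/* Reads one big-endian 32-bit value and stores it through "out",
   the way the bundled decoding engines expect a pointer argument. */
int sketch_decode_uint32(struct sketch_reader* r, uint32_t* out)
{
    const unsigned char* p;

    if (r->len - r->pos < 4)
        return SKETCH_ERR_UNDERFLOW;  /* fewer than 4 bytes left */
    p = r->data + r->pos;
    *out = ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16)
         | ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
    r->pos += 4;
    return SKETCH_OK;
}
\end{verbatim}
\end{quote}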

\subsection{int xds\_decode(xds\_t*~\underline{xds}, const~char*~\underline{fmt}, \dots{});}

This routine is identical to \textsf{xds\_vdecode()}, except that it takes
the engine parameters as a variable argument list (``\dots{}'') instead of
a \textsf{va\_list}.

\section{The XDR Engines}
\label{xdr}

\begin{tabular}{|c|c|c|c|} \hline
\bf Function Name     & \bf Expected ``args'' Datatype & \bf Input & \bf Output \\ \hline
xdr\_encode\_uint32()      & xds\_uint32\_t   & 4 bytes  & 4 bytes  \\
xdr\_decode\_uint32()      & xds\_uint32\_t*  & 4 bytes  & 4 bytes  \\[1ex]
xdr\_encode\_int32()       & xds\_int32\_t    & 4 bytes  & 4 bytes  \\
xdr\_decode\_int32()       & xds\_int32\_t*   & 4 bytes  & 4 bytes  \\[1ex]
xdr\_encode\_uint64()      & xds\_uint64\_t   & 8 bytes  & 8 bytes  \\
xdr\_decode\_uint64()      & xds\_uint64\_t*  & 8 bytes  & 8 bytes  \\[1ex]
xdr\_encode\_int64()       & xds\_int64\_t    & 8 bytes  & 8 bytes  \\
xdr\_decode\_int64()       & xds\_int64\_t*   & 8 bytes  & 8 bytes  \\[1ex]
xdr\_encode\_float()       & xds\_float\_t    & 4 bytes  & 4 bytes  \\
xdr\_decode\_float()       & xds\_float\_t*   & 4 bytes  & 4 bytes  \\[1ex]
xdr\_encode\_double()      & xds\_double\_t   & 8 bytes  & 8 bytes  \\
xdr\_decode\_double()      & xds\_double\_t*  & 8 bytes  & 8 bytes  \\[1ex]
xdr\_encode\_octetstream() & void*, size\_t   & variable & variable \\
xdr\_decode\_octetstream() & void**, size\_t* & variable & variable \\[1ex]
xdr\_encode\_string()      & char*            & variable & variable \\
xdr\_decode\_string()      & char**           & variable & variable \\ \hline
\end{tabular}
\medskip
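
RFC~1832 encodes a 32-bit integer as four bytes and a 64-bit ``hyper''
integer as eight bytes, most significant byte first, with signed values in
two's complement. Assuming the XDR engines follow that standard, the byte
layout can be sketched as follows; the function names are hypothetical and
the code is not taken from the library.

\begin{quote}
\begin{verbatim}
#include <stdint.h>

/* Writes v as 4 bytes, most significant byte first
   (RFC 1832 "unsigned integer"). */
void sketch_xdr_put_u32(unsigned char out[4], uint32_t v)
{
    out[0] = (unsigned char)(v >> 24);
    out[1] = (unsigned char)(v >> 16);
    out[2] = (unsigned char)(v >> 8);
    out[3] = (unsigned char)v;
}

/* Signed values are sent in two's complement; converting to
   uint32_t yields exactly that bit pattern (modulo 2^32). */
void sketch_xdr_put_i32(unsigned char out[4], int32_t v)
{
    sketch_xdr_put_u32(out, (uint32_t)v);
}

/* Writes v as 8 bytes, most significant byte first
   (RFC 1832 "unsigned hyper integer"). */
void sketch_xdr_put_u64(unsigned char out[8], uint64_t v)
{
    sketch_xdr_put_u32(out, (uint32_t)(v >> 32));
    sketch_xdr_put_u32(out + 4, (uint32_t)v);
}
\end{verbatim}
\end{quote}

With this layout, the value \$1234 from the introduction is transmitted as
the four bytes \$00~\$00~\$12~\$34, regardless of the sender's byte order.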

Please note that the routines \textsf{xdr\_decode\_octetstream()} and
\textsf{xdr\_decode\_string()} return a pointer to a buffer holding the
decoded data. This buffer has been allocated with \textsf{malloc(3)} and
must be \textsf{free(3)}ed by the application when it is not required
anymore. All other callbacks write the decoded value into the location
found on the stack, but these behave differently because the length of the
decoded data is not known in advance and the application cannot provide a
buffer that's guaranteed to suffice.
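
The need for an allocation becomes clear from the wire format. RFC~1832
transmits a string as a four-byte length, followed by the bytes themselves
and zero padding up to the next multiple of four. The sketch below decodes
such a string into a freshly \textsf{malloc(3)}ed, NUL-terminated copy. The
function name and its return convention (bytes consumed, 0 on error) are
hypothetical, and we assume that the XDR engine uses the RFC~1832 layout.

\begin{quote}
\begin{verbatim}
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical decoder for an RFC 1832 string.  On success *out
   points to a malloc(3)ed, NUL-terminated copy that the caller must
   free(3); returns the number of bytes consumed, or 0 on error. */
size_t sketch_xdr_get_string(const unsigned char* in, size_t in_len,
                             char** out)
{
    uint32_t n;
    size_t padded;
    char* s;

    if (in_len < 4)
        return 0;                       /* no length field */
    n = ((uint32_t)in[0] << 24) | ((uint32_t)in[1] << 16)
      | ((uint32_t)in[2] << 8)  |  (uint32_t)in[3];
    if (n > in_len - 4)
        return 0;                       /* truncated data  */
    padded = ((size_t)n + 3) & ~(size_t)3;  /* round up to 4 */
    if (in_len - 4 < padded)
        return 0;                       /* truncated padding */
    s = malloc((size_t)n + 1);
    if (s == NULL)
        return 0;
    memcpy(s, in + 4, n);
    s[n] = '\0';
    *out = s;
    return 4 + padded;
}
\end{verbatim}
\end{quote}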

\section{The XML Engines}
\label{xml}

\begin{tabular}{|c|c|c|c|} \hline
\bf Function Name     & \bf Expected ``args'' Datatype & \bf Input & \bf Output \\ \hline
xml\_encode\_uint32()      & xds\_uint32\_t   & 4 bytes      & 18--27 bytes \\
xml\_decode\_uint32()      & xds\_uint32\_t*  & 18--27 bytes & 4 bytes      \\[1ex]
xml\_encode\_int32()       & xds\_int32\_t    & 4 bytes      & 16--26 bytes \\
xml\_decode\_int32()       & xds\_int32\_t*   & 16--26 bytes & 4 bytes      \\[1ex]
xml\_encode\_uint64()      & xds\_uint64\_t   & 8 bytes      & 18--37 bytes \\
xml\_decode\_uint64()      & xds\_uint64\_t*  & 18--37 bytes & 8 bytes      \\[1ex]
xml\_encode\_int64()       & xds\_int64\_t    & 8 bytes      & 16--36 bytes \\
xml\_decode\_int64()       & xds\_int64\_t*   & 16--36 bytes & 8 bytes      \\[1ex]
xml\_encode\_float()       & xds\_float\_t    & 4 bytes      & variable     \\
xml\_decode\_float()       & xds\_float\_t*   & variable     & 4 bytes      \\[1ex]
xml\_encode\_double()      & xds\_double\_t   & 8 bytes      & variable     \\
xml\_decode\_double()      & xds\_double\_t*  & variable     & 8 bytes      \\[1ex]
xml\_encode\_octetstream() & void*, size\_t   & variable     & variable     \\
xml\_decode\_octetstream() & void**, size\_t* & variable     & variable     \\[1ex]
xml\_encode\_string()      & char*            & variable     & variable     \\
xml\_decode\_string()      & char**           & variable     & variable     \\ \hline
\end{tabular}
\medskip
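
The byte counts in the table for the 32-bit types are consistent with
encodings of the form \verb|<uint32>|$n$\verb|</uint32>|, where $n$ is the
decimal value: \verb|<uint32>0</uint32>| is 18 bytes long and
\verb|<uint32>4294967295</uint32>| is 27 bytes long. Assuming that layout
(the element names are inferred from these counts, not taken from the
implementation), the following sketch with hypothetical function names
reproduces the ranges for the 32-bit types.

\begin{quote}
\begin{verbatim}
#include <stdint.h>
#include <stdio.h>

/* Hypothetical encoders producing <uint32>n</uint32> and
   <int32>n</int32>.  Like snprintf(3), they return the length of
   the full encoding, excluding the terminating NUL. */
int sketch_xml_uint32(char* out, size_t size, uint32_t v)
{
    return snprintf(out, size, "<uint32>%lu</uint32>",
                    (unsigned long)v);
}

int sketch_xml_int32(char* out, size_t size, int32_t v)
{
    return snprintf(out, size, "<int32>%ld</int32>", (long)v);
}
\end{verbatim}
\end{quote}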

Please note that the routines \textsf{xml\_decode\_octetstream()} and
\textsf{xml\_decode\_string()} return a pointer to a buffer holding the decoded
data. This buffer has been allocated with \textsf{malloc(3)} and must be
\textsf{free(3)}ed by the application when it is not required anymore. All
other callbacks write the decoded value into the location found on the
stack, but these behave differently because the length of the decoded data
is not known in advance and the application cannot provide a buffer that's
guaranteed to suffice.

\section{Frequently Asked Questions}

\subsection{Why do we have separate encoding and decoding modes?}

Some users complained about having to maintain separate XDS contexts for
encoding and decoding. They wondered why it is not possible to encode and
decode with a single XDS context. The reason is that this limitation makes
the XDS context structure and the programmer's API for XDS much simpler. If
we were able to use a single context for encoding and decoding, we would
have to maintain \emph{two} lists of registered engines per XDS context:
One set of encoding engines and one set of decoding engines. Consequently,
the \textsf{xds\_register()} function would need to take an additional
parameter, which determines whether you're registering an encoding or a
decoding engine. None of this is necessary in the current design, because
one list of registered engines suffices.

Another important topic is buffer management. The buffer handling in
encoding mode is subtly different from that in decoding mode: The XDS
context contains a buffer, the size of that buffer and a kind of ``current
position'' pointer. When an engine stores, say, 8 bytes of encoded data in
the buffer, \textsf{xds\_vencode()} will increase the current position
by 8 bytes --- the next encoding engine will append its encoded data at the
end of the buffer. If the current position reaches the end of the
buffer, the buffer is reallocated with an appropriately bigger size.

In decoding mode, the same variables in the XDS context have a different
meaning: Since the buffer is never going to be resized, the buffer size
does not correspond to the size of the memory chunk that constitutes the
buffer, it says how many bytes of information the buffer contains; it's the
length of the contents. The current position is initialized to the
beginning of the buffer and every time an engine claims to have decoded,
say, 8 bytes from the buffer, the current position is increased by 8 bytes
towards the end of the buffer. If an engine tries to read beyond the end of
the buffer's contents, an \textsf{XDS\_ERR\_UNDER\-FLOW} error is returned.

Buffer handling differs between encoding and decoding mode insofar as, in
encoding mode, the initial buffer is empty and the current position moves
with the end of the content, determining where new data should be
appended. In decoding mode, the initial buffer is filled and the current
position wanders from the beginning to the end of the content.

Consequently, if an XDS context were used for both encoding and decoding,
the library would have to manage two different buffers, because the
encoding and decoding buffers have different semantics. The
\textsf{xds\_setbuffer()} and \textsf{xds\_getbuffer()} routines would then
need an additional parameter in order to address the two buffers
independently.

Considering all that, we found that the current design greatly reduces the
complexity of the implementation and of the API while causing the user
only minimal inconvenience.

\subsection{What are those xds\_int-something types good for?}
\label{xds int stuff}

The XDS library uses the data types \textsf{xds\_int32\_t}, etc.\ rather
than \textsf{int}. This is necessary because we need a definite size for
each data type. In ISO-C, though, the actual size of an \textsf{int} is
implementation-defined. In theory, the system header \textsf{sys/types.h}
defines types with fixed sizes, but unfortunately the names of these data
types vary from vendor to vendor. To solve that, we defined our own data
types. The application programmer might want to take a look at the first
few lines of the \textsf{xds.h} include file to see how the actual data
types are mapped to the \textsf{xds\_xxx\_t} variants on a given system.
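
As an illustration of what such a mapping involves, the following sketch
maps hypothetical \textsf{sketch\_xxx\_t} names onto the native types of a
typical 32-bit or 64-bit Unix platform and adds a compile-time guard that
breaks the build wherever an assumed width does not hold. It is not the
content of \textsf{xds.h}, and \textsf{long long} requires a C99 compiler
or a compatible extension.

\begin{quote}
\begin{verbatim}
#include <limits.h>

/* Hypothetical mapping for a typical ILP32/LP64 platform; xds.h
   performs its own, platform-specific mapping. */
typedef int                sketch_int32_t;
typedef unsigned int       sketch_uint32_t;
typedef long long          sketch_int64_t;
typedef unsigned long long sketch_uint64_t;

/* An array with negative size is a compile-time error, so these
   lines refuse to compile where the assumed widths do not hold. */
typedef char sketch_int32_is_32_bits
    [sizeof(sketch_int32_t) * CHAR_BIT == 32 ? 1 : -1];
typedef char sketch_int64_is_64_bits
    [sizeof(sketch_int64_t) * CHAR_BIT == 64 ? 1 : -1];
\end{verbatim}
\end{quote}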

\subsection{Why do I have to register all the engines manually?}

One idea that came up during the design of the API was to provide a way to
register all elementary XML or XDR engines with a single function call,
something like this:
 %
\begin{quote}
\begin{verbatim}
xds = xds_init(XDS_ENCODE, XDS_XML);   /* Use the XML engines. */
xds = xds_init(XDS_ENCODE, XDS_XDR);   /* Use the XDR engines. */
\end{verbatim}
\end{quote}
 %
The advantage of this approach is that the application developer does not
need to bother with registering some obscure functions like
\textsf{xdr\_encode\_octetstream()}. We dismissed the idea nonetheless for
the following reasons:

\begin{itemize}

\item Since the library is meant to be extensible, \textsf{xds\_init()}
has no (good) way of knowing which engines actually exist for an encoding
scheme. Suppose someone writes a whole set of engines that implement the
CORBA format: he would not be able to register his engines without
re-writing \textsf{xds\_init()}.

\item Similarly, \textsf{xds\_init()} would not know about the meta
engines required by the application developer. The call outlined above
would only register the engines for the elementary data types; the meta
engines do not even ``exist'' when the XDS library is compiled.

\item This approach would make it hard to mix engines from different
formats. If all engines are registered manually, the application programmer
may choose to use the XDR format for encoding all kinds of integers, but to
use the XML format for encoding strings, octet streams, or floating point
numbers.

\item If one routine referenced all engines of an encoding format, all
engines of that format would be linked into the binary as soon as the
application used that routine. It would not be possible to, say, register
the engines dealing with integers without pulling the floating point
engines into the program too, even though nobody uses them.

The author of this document wishes to remark, though, that this property of
the library was later uh~\dots{} removed by the decision of the team leader
to merge all engines into one source module per format. Sorry.

\end{itemize}

\begin{thebibliography}{xxx}

\bibitem{xdr} RFC 1832: ``XDR: External Data Representation Standard'',
R.~Srinivasan, August~1995

\bibitem{xml} XML-RPC Home Page: \textsf{http://www.xmlrpc.org/}

\end{thebibliography}

\end{document}