Wishes -- use cases for layered IO
==================================

[Feel free to add your own]

Dirk's original list:
---------------------

This file is here so that I do not have to remind myself about the
reasons for layered IO, apart from the obvious one.

0. To get away from a 1-to-1 mapping

   i.e. a single URI can cause multiple backend requests, in arbitrary
   configurations, such as in parallel, tunnel/piped, or in some sort of
   funnel mode. With fully layered IO, each of those backend requests can
   be treated exactly like any URI request; and recursion is born :-)

1. To do on-the-fly charset conversion

   Be able, theoretically, to send out your content using latin1, latin2
   or any other charset, generated from static _and_ dynamic content in
   other charsets (typically Unicode encoded as UTF-7 or UTF-8). Such
   conversion is prompted by things like the user-agent string, a cookie,
   or other hints about the capabilities of the OS, language preferences,
   and other (in)capabilities of the final recipient.

2. To be able to do fancy templates

   Have your application/CGI send out an XML structure of field/value
   paired contents, which is substituted into a template by the web
   server, possibly based on information accessible/known to the web
   server which you do not want to be known to the backend script.
   Ideally that template would be just as easy to generate by a backend
   as well (see 0).

3. On-the-fly translation

   And other general text and output munging, such as translating an
   English page into Spanish as it goes through your proxy, or JPEG-ing
   a GIF generated by mod_perl+gd.

Dw.

Dean's canonical list of use cases
----------------------------------

Date: Mon, 27 Mar 2000 17:37:25 -0800 (PST)
From: Dean Gaudet
To: new-httpd@apache.org
Subject: canonical list of i/o layering use cases
Message-ID:

i really hope this helps this discussion move forward.

the following is the list of all applications i know of which have been
proposed to benefit from i/o layering:

- data sink abstractions:
    - memory destination (for ipc; for caching; or even for abstracting
      things such as strings, which can be treated as an i/o object)
    - pipe/socket destination
    - portability variations on the above

- data source abstractions, such as:
    - file source (includes proxy caching)
    - memory source (includes most dynamic content generation)
    - network source (TCP-to-TCP proxying)
    - database source (which is probably, under the covers, something
      like a memory source mapped from the db process on the same box,
      or from a network source on another box)
    - portability variations on the above sources

- filters:
    - encryption
    - translation (ebcdic, unicode)
    - compression
    - chunking
    - MUX
    - mod_include et al

and here are some of my thoughts on trying to further quantify filters:

a filter separates two layers and is both a sink and a source. a filter
takes an input stream of bytes OOOO... and generates an output stream of
bytes which can be broken into blocks such as:

    OOO NNN O NNNNN ...

where O = an old or original byte copied from the input and N = a new
byte generated by the filter.

for each filter we can calculate a quantity i'll call the copied-content
ratio, or CCR:

    CCR = nbytes_old / nbytes_new

where:

    nbytes_old = number of bytes in the output of the filter which are
                 copied from the input (in zero-copy this would mean
                 "copy by reference counting an input buffer")

    nbytes_new = number of bytes which are generated by the filter which
                 weren't present in the input
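[Not part of Dean's mail: a minimal sketch of how a CCR might be tallied
for a chunking-style filter. The chunked framing is standard HTTP/1.1;
the 8KB payload size and the ad-hoc counters are assumptions chosen
purely for illustration.]

    #include <stdio.h>
    #include <string.h>

    int main(void)
    {
        char input[8192];            /* pretend this arrived from the layer above */
        char frame[32];
        size_t nbytes_old = 0;       /* bytes copied straight from the input      */
        size_t nbytes_new = 0;       /* bytes invented by the filter (framing)    */

        memset(input, 'x', sizeof(input));

        /* one chunk: "<hex length>\r\n" + payload + "\r\n", then "0\r\n\r\n" */
        nbytes_new += (size_t)snprintf(frame, sizeof(frame), "%zx\r\n",
                                       sizeof(input));
        nbytes_old += sizeof(input); /* payload is copied (or passed by reference) */
        nbytes_new += 2;             /* CRLF after the chunk data                  */
        nbytes_new += 5;             /* "0\r\n\r\n" last-chunk marker              */

        printf("CCR = %zu / %zu = %.0f\n",
               nbytes_old, nbytes_new,
               (double)nbytes_old / (double)nbytes_new);
        return 0;
    }

For an 8KB chunk the framing amounts to about a dozen new bytes, so the
CCR lands in the hundreds and chunking sits firmly in the CCR > 0
category of the examples that follow.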
examples:

    CCR = infinity: who cares -- straight through with no transformation.
                    the filter shouldn't even be there.

    CCR = 0:        encryption, translation (ebcdic, unicode), compression.
                    these get zero benefit from zero-copy.

    CCR > 0:        chunking, MUX, mod_include

from the point of view of evaluating the benefit of zero-copy we only
care about filters with CCR > 0 -- because CCR = 0 cases degenerate into
a single-copy scheme anyhow.

it is worth noting that the large_write heuristic in BUFF fairly clearly
handles zero-copy at very little overhead for CCRs larger than
DEFAULT_BUFSIZE. what needs further quantification is what the CCR of
mod_include would be.

for a particular zero-copy implementation we can find some threshold k
where filters with CCRs >= k are faster with the zero-copy implementation
and CCRs < k are slower... faster/slower as compared to a baseline
implementation such as the existing BUFF.

it's my opinion that when you consider the data sources and the filters
listed above, *in general* the existing BUFF heuristics are faster than a
complete zero-copy implementation. you might ask how this jibes with
published research such as the IO-Lite stuff? well, when it comes right
down to it, the research in the IO-Lite papers deals with very large CCRs
and contrasts them against a naive buffering implementation such as stdio
-- they don't consider what a few heuristics such as apache's BUFF can do.

Dean

Jim's summary of a discussion
-----------------------------

OK, so the main points we wish to address are (in no particular order):

1. zero-copy
2. prevent modules/filters from having to glob the entire data stream in
   order to start processing/filtering
3. the ability to layer and "multiplex" data and meta-data in the stream
4. the ability to perform all HTTP processing at the filter level
   (including proxy), even if not implemented in this phase
5. room for optimization and recursion

Jim Jagielski

Roy's ramblings
---------------

Data flow networks are a very well-defined and understood software
architecture. They have a single, very important constraint: no filter is
allowed to know anything about the nature of its upstream or downstream
neighbors beyond what is defined by the filter's own interface. That
constraint is what makes data flow networks highly configurable and
reusable. Those are properties that we want from our filters.

...

One of the goals of the filter concept was to fix the bird's nest of
interconnected side-effect conditions that allow buff to perform well,
without losing that performance. That's why there is so much trepidation
about anyone messing with 1.3.x buff.

...

Content filtering is my least important goal. Completely replacing HTTP
parsing with a filter is my primary goal, followed by a better proxy,
then internal memory caches, and finally zero-copy sendfile (in order of
importance, but in reverse order of likely implementation). Content
filtering is something we get for free using the bucket brigade
interface, but we don't get anything for free if we start with an
interface that only supports content filtering.

...

I don't think it is safe to implement filters in Apache without either a
smart allocation system or a strict limiting mechanism that prevents
filters from buffering more than 8KB [or a user-definable amount] of
memory at a time (for the entire non-flushed stream). It isn't possible
to create a robust server implementation using filters that allocate
memory from a pool (or the heap, or a stack, or whatever) without somehow
reclaiming and reusing the memory that gets written out to the network.
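[A minimal sketch, not Apache code, of the kind of strict limit described
above: data a filter sets aside stays in a fixed 8KB arena, and anything
past the cap is diverted to a temporary file -- the "slower-but-larger
buffer" that comes up again below. The struct and function names are
invented for illustration; callers are assumed to zero-initialize the
struct.]

    #include <stdio.h>
    #include <string.h>

    #define SETASIDE_LIMIT 8192

    struct setaside {
        char   mem[SETASIDE_LIMIT];  /* fast in-memory portion           */
        size_t used;                 /* bytes currently held in mem[]    */
        FILE  *spill;                /* overflow file, created on demand */
    };

    /* Buffer len bytes; overflow to disk once the in-memory cap is hit. */
    static int setaside_write(struct setaside *s, const char *data, size_t len)
    {
        size_t room = SETASIDE_LIMIT - s->used;
        size_t take = len < room ? len : room;

        memcpy(s->mem + s->used, data, take);
        s->used += take;

        if (take < len) {                      /* cap reached: page to disk */
            if (!s->spill && !(s->spill = tmpfile()))
                return -1;
            if (fwrite(data + take, 1, len - take, s->spill) != len - take)
                return -1;
        }
        return 0;
    }

A real allocator would also need to hand the spilled bytes back out and
reclaim the arena as data is flushed to the network, which is the
reclamation problem described in the paragraph above.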
There is a certain level of "optimization" that must be present before
any filtering mechanism can be in Apache, and that means meeting the
requirement that the server not keel over and die the first time a user
requests a large filtered file. XML tree manipulation is an example where
that can happen.

...

Disabling content-length just because there are filters in the stream is
a blatant cop-out. If you have to do that then the design is wrong. At
the very least the HTTP filter/buff should be capable of discovering
whether it knows the content length by examining whether it has the whole
response in a buffer (or fd) before it sends out the headers.

...

No layered-IO solution will work with the existing memory allocation
mechanisms of Apache. The reason is simply that some filters can
incrementally process data and some filters cannot, and they often won't
know the answer until they have processed the data they are given. This
means the buffering mechanism needs some form of overflow mechanism that
diverts parts of the stream into a slower-but-larger buffer (file), and
the only clean way to do that is to have the memory allocator for the
stream also do paging to disk. You can't do this within the request pool,
because each layer may need to allocate more total memory than is
available on the machine, and you can't depend on some parts of the
response being written before later parts are generated, because some
filtering decisions require knowledge of the end of the stream before
they can process the beginning.

...

The purpose of the filtering mechanism is to provide a useful and
easy-to-understand means for extending the functionality of independent
modules (filters) by rearranging them in stacks via a uniform interface.

Paul J. Reder's use cases for filters
-------------------------------------

 1) Containing only text.
 2) Containing 10 .gif or .jpg references (perhaps filtering from one
    format to the other).
 3) Containing an exec of a cgi that generates a text-only file.
 4) Containing an exec of a cgi that generates an SSI of a text-only
    file.
 5) Containing an exec of a cgi that generates an SSI that execs a cgi
    that generates a text-only file (that swallows a fly, I don't know
    why).
 6) Containing an SSI that execs a cgi that generates an SSI that
    includes a text-only file.

    NOTE: Solutions must be able to handle *both* 5 and 6. Order
    shouldn't matter.

 7) Containing text that must be altered via a regular expression filter
    to change all occurrences of "rederpj" to "misguided" (a sketch of
    such a replacement filter follows this list).
 8) Containing text that must be altered via a regular expression filter
    to change all occurrences of "rederpj" to "lost".
 9) Containing perl or php that must be handed off for processing.
10) A page in ascii that needs to be converted to ebcdic, or from one
    code page to another.
11) Use the babelfish translation filter to translate text on a page from
    Spanish to Martian-Swahili.
12) Translate to Esperanto, compress, and encrypt the output from a php
    program generated by a perl script called from a cgi exec embedded in
    a file included by an SSI :)
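[A minimal sketch of the replacement filter from use cases 7 and 8,
assuming text input. For brevity it matches the literal string rather
than a regular expression; the emit() stand-in, the struct name, and the
chunk handling are invented for illustration. The point is that the
filter never needs the whole stream: it holds back at most
strlen(pattern)-1 bytes, so a match split across two chunks is still
rewritten.]

    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>

    #define PATTERN  "rederpj"
    #define REPLACE  "misguided"

    struct replace_state {
        char   hold[sizeof(PATTERN) - 1]; /* possible partial match from last chunk */
        size_t held;
    };

    static void emit(const char *data, size_t len)   /* downstream stand-in */
    {
        fwrite(data, 1, len, stdout);
    }

    /* Feed one chunk through the filter; pass NULL/0 at end of stream. */
    static void replace_filter(struct replace_state *st,
                               const char *chunk, size_t len)
    {
        const size_t patlen = strlen(PATTERN);

        if (chunk == NULL) {              /* end of stream: flush holdback */
            emit(st->hold, st->held);
            st->held = 0;
            return;
        }

        /* work buffer = holdback + new chunk (error handling kept minimal) */
        size_t worklen = st->held + len;
        char *work = malloc(worklen + 1);
        if (!work)
            return;
        memcpy(work, st->hold, st->held);
        memcpy(work + st->held, chunk, len);
        work[worklen] = '\0';

        char *p = work;
        char *hit;
        while ((hit = strstr(p, PATTERN)) != NULL) {
            emit(p, (size_t)(hit - p));      /* copied bytes (nbytes_old) */
            emit(REPLACE, strlen(REPLACE));  /* new bytes (nbytes_new)    */
            p = hit + patlen;
        }

        /* emit the tail, except the last patlen-1 bytes which might
           start a match that completes in the next chunk */
        size_t tail = worklen - (size_t)(p - work);
        size_t keep = tail < patlen - 1 ? tail : patlen - 1;
        emit(p, tail - keep);
        memcpy(st->hold, p + (tail - keep), keep);
        st->held = keep;

        free(work);
    }

With a zero-initialized state (struct replace_state st = {0};), feeding
buffers through replace_filter(&st, buf, n) and finishing with
replace_filter(&st, NULL, 0) would turn "rede" at the end of one chunk
plus "rpj" at the start of the next into a single "misguided".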