From manoj@io.com Wed Aug 16 21:38:14 2000
Path: engelschall.com!mail2news!apache.org!new-httpd-return-7115-rse+apache=en.muc.de
From: manoj@io.com (Manoj Kasichainula)
Newsgroups: en.lists.apache-new-httpd
Subject: My proposal for buckets/filtering in 3.0.
Date: 14 Aug 2000 18:59:14 +0200
Organization: Mail2News at engelschall.com
Lines: 137
Approved: postmaster@m2ndom
Message-ID: <20000814050814.A16555@manojk.users.mindspring.com>
Reply-To: new-httpd@apache.org
NNTP-Posting-Host: en1.engelschall.com
X-Trace: en1.engelschall.com 966272354 36734 141.1.129.1 (14 Aug 2000 16:59:14 GMT)
X-Complaints-To: postmaster@engelschall.com
NNTP-Posting-Date: 14 Aug 2000 16:59:14 GMT
X-Mail2News-Gateway: mail2news.engelschall.com
Xref: engelschall.com en.lists.apache-new-httpd:41295

This is quite different than the other proposals. From the feedback I
got during the meeting, this isn't suitable for Apache 2.0 because
it's still not developed enough (it popped into my head *during* the
meeting after all) and it requires more rewriting of existing modules
than the current 2.0 filtering scheme. I agree with the sentiment that
says to get 2.0 out soon rather than adding more features, so I agree
that this should wait.

Also, I have a bad feeling someone else already put forward this idea.
But, I'll send it out anyway.

The main difference between this design and the others is that there
are more buckets and fewer filters.

Filters are things that talk to the network, and they are two-way. SSL
and chunking are filters. They don't change very much in this case; as
is the case now, they are just iols that can handle buckets.

Most everything else is a bucket in this case. Actually, they are
probably better described as matryoshka dolls or those little nested
plastic barrels. In fact, I'll call them barrels to increase
confusion.

Brigades don't exist in this scheme. They are just compound barrels.

I think that each barrel will have a URI-identifier. This is useful
for caching.
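[Editorial illustration, not part of the original post.] The nesting
described above can be sketched roughly as follows. This is a minimal
Python sketch, not Apache code: every class and function name here
(Barrel, MemoryBarrel, UpperCaseBarrel, CompoundBarrel, read_cached) and
the "mem:" and "filter:" URIs are assumptions made up for the example.
It only shows barrels wrapping other barrels, a compound barrel standing
in for a brigade, and the URI identifier being used as a cache key.

```python
# Hypothetical sketch: none of these names come from Apache.
from typing import Dict, Iterator, List


class Barrel:
    """A readable content source identified by a URI."""

    def __init__(self, uri: str) -> None:
        self.uri = uri

    def read(self) -> Iterator[bytes]:
        raise NotImplementedError


class MemoryBarrel(Barrel):
    """Innermost barrel: hands out one fixed block of bytes."""

    def __init__(self, uri: str, data: bytes) -> None:
        super().__init__(uri)
        self.data = data

    def read(self) -> Iterator[bytes]:
        yield self.data


class UpperCaseBarrel(Barrel):
    """Wraps another barrel and transforms whatever it reads from it."""

    def __init__(self, uri: str, inner: Barrel) -> None:
        super().__init__(uri)
        self.inner = inner

    def read(self) -> Iterator[bytes]:
        for chunk in self.inner.read():
            yield chunk.upper()


class CompoundBarrel(Barrel):
    """Stands in for a brigade: several barrels read in order."""

    def __init__(self, uri: str, parts: List[Barrel]) -> None:
        super().__init__(uri)
        self.parts = parts

    def read(self) -> Iterator[bytes]:
        for part in self.parts:
            yield from part.read()


def read_cached(barrel: Barrel, cache: Dict[str, bytes]) -> bytes:
    """Read a barrel once and remember the result under its URI."""
    if barrel.uri not in cache:
        cache[barrel.uri] = b"".join(barrel.read())
    return cache[barrel.uri]


header = MemoryBarrel("mem:/header", b"hello ")
body = UpperCaseBarrel("filter:/upper/body", MemoryBarrel("mem:/body", b"world"))
page = CompoundBarrel("http://www.deedee.com/dancing/", [header, body])

cache: Dict[str, bytes] = {}
print(read_cached(page, cache))
```

The consumer only ever calls read() on the outermost barrel; each layer
pulls from the one inside it, so nothing is pushed downstream.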

Content-manipulating filters in the other designs are barrels in this
one. Also, while they were "writers" in the other designs, they are
"readers" in this one. So instead of a push-style design where filters
are written to by other filters, this scheme has barrels reading from
other barrels.

So here's how a request would proceed:

- HTTP request comes in, munched on by various filters, and passed to
  the request handler.

- request handler picks out the URI that was requested, creates a URI
  barrel initialized to that URI, and reads the content and metadata
  out of it.
  ("ap_create_uri_barrel("http://www.deedee.com/dancing/",
                         barrel_types)->read();")

- The URI barrel, when created, figures out how the content should be
  delivered, and creates subbarrels to deal with it. I'll give three
  cases:

  Case #1: A file on the disk
  Case #2: A CGI script that outputs PostScript that gets interpreted
           into a PNG image file
  Case #3: a proxy request

  In case #1, the URI barrel figures out that it's accessing a file,
  creates a filehandle barrel, and then binds its own content-handling
  calls to the filehandle barrel.

    file -> uri -> HTTP handler

  In case #2, the URI barrel creates a CGI barrel that is initialized
  with a file barrel pointing to the CGI script. Then uri_barrel
  creates a postscript barrel initialized with the CGI barrel and the
  parameter "PNG". The URI barrel then binds its content-handling
  calls to the postscript barrel.

    CGI script file -> mod_cgi -> mod_postscript -> uri -> HTTP handler

  In case #3, the URI barrel figures out that this is a remote request
  and creates an HTTP client barrel. The URI barrel then binds its
  content request handler to the HTTP barrel. Maybe there needs to be
  an intervening proxy barrel, or maybe the URI barrel knows proxy
  semantics, or maybe the request handler needs smarts about HTTP
  proxies. I'm not in touch with HTTP/1.1 proxying enough to know.

    HTTP client -> uri -> HTTP server handler

- The request handler gets barrels back, and using the appropriate
  barrel functions, writes their headers and data through the filters
  back to the client.

When anything creates a barrel, it passes in:

- a pool to allocate memory from. The stuff read from the barrel must
  be in the scope of that pool.
- a list of capabilities that are required, and a list of preferred
  capabilities. capabilities include: "send-from-file-descriptor",
  "send from memory", "write-to-content", and so on.

  This feels like content negotiation, which scares me. But, very
  little of it is actually necessary in a first pass. A single
  "memory-block" capability is all that's mandatory, really.
  Everything else is extra features and optimization.

The barrel being created will then attempt to meet those wishes if
possible, or return an error.

This is really not mapped out well enough, and needs code, which will
wait until Apache 3.0 development starts up. But, here are the cool
features I can imagine:

- non-blocking I/O can be a capability of a barrel. One API of the
  barrel would be to return the file descriptor (or "event") it is
  waiting for, so that it can be selected on, and a full event-based
  server should be possible. When some barrel three levels deep
  doesn't support non-blocking I/O, the request handler can decide to
  punt to a separate thread. This way, different programming models
  for modules can be supported.

- Writability would be another capability. If a barrel is writable,
  that URI is available to DAV.

- Set-asides and lifetime of data aren't that much of an issue anymore
  (or at least I haven't thought of a case where they are). Barrels
  are naturally kept around only as long as they are needed, since
  their scope is determined by the consumer of the barrel. Caching is
  done with a cacher barrel that uses a large-scoped pool, for example.

- This scheme allows not just chains of barrels, but trees.
  So half of a document can come from PHP and half can come from SSI.
  You just need a container barrel that knows what parts are
  interpreted by what modules. There could be an MS-Word barrel that
  munches on multiple HTTP requests for HTML + images. This scares me.

- Because of the content-negotiation features, a barrel can always
  figure out what the optimal format for sending content back should
  be.

- subrequests are really easy (thanks for the idea of URI barrels,
  Ryan)

- Should allow "magic cache" like was discussed back at the June/July
  '98 meeting. In fact, it should be really easy.

There are plenty of unanswered questions here, such as how metadata
will work, how exactly proxies fit into this, and how things like DAV
collections fit in. But I'm sleepy.

From fielding@kiwi.ICS.UCI.EDU Wed Aug 16 21:39:15 2000
Path: engelschall.com!mail2news!apache.org!new-httpd-return-7124-rse+apache=en.muc.de
From: fielding@kiwi.ICS.UCI.EDU ("Roy T. Fielding")
Newsgroups: en.lists.apache-new-httpd
Subject: Re: My proposal for buckets/filtering in 3.0.
Date: 15 Aug 2000 07:00:31 +0200
Organization: Mail2News at engelschall.com
Lines: 155
Approved: postmaster@m2ndom
Message-ID: <200008141352.aa13983@gremlin-relay.ics.uci.edu>
Reply-To: new-httpd@apache.org
NNTP-Posting-Host: en1.engelschall.com
X-Trace: en1.engelschall.com 966315631 57145 141.1.129.1 (15 Aug 2000 05:00:31 GMT)
X-Complaints-To: postmaster@engelschall.com
NNTP-Posting-Date: 15 Aug 2000 05:00:31 GMT
X-Mail2News-Gateway: mail2news.engelschall.com
Xref: engelschall.com en.lists.apache-new-httpd:41304

>Most everything else is a bucket in this case. Actually, they are
>probably better described as matryoshka dolls or those little nested
>plastic barrels. In fact, I'll call them barrels to increase
>confusion.

Hah, cute, but how is it different than our current subrequests?

>I think that each barrel will have a URI-identifier. This is useful
>for caching.

Yes.
We would also need to name the handlers that define how a barrel
is constructed for a given request-URI.

>Content-manipulating filters in the other designs are barrels in this
>one. Also, while they were "writers" in the other designs, they are
>"readers" in this one. So instead of a push-style design where filters
>are written to by other filters, this scheme has barrels reading from
>other barrels.
>
>So here's how a request would proceed:
>
>- HTTP request comes in, munched on by various filters, and passed to
>  the request handler.
>
>- request handler picks out the URI that was requested, creates a URI
>  barrel initialized to that URI, and reads the content and metadata
>  out of it.
>  ("ap_create_uri_barrel("http://www.deedee.com/dancing/",
>                         barrel_types)->read();")
>
>- The URI barrel, when created, figures out how the content should be
>  delivered, and creates subbarrels to deal with them. I'll give three
>  cases:
>
>  Case #1: A file on the disk
>  Case #2: A cgi script that outputs postscript that gets interpreted
>           into a PNG image file
>  Case #3: a proxy request
>
>  In case #1, the URI barrel figures out that it's accessing a file,
>  creates a filehandle barrel, and then binds its own content-handling
>  calls to the filehandle barrel.
>
>    file -> uri -> HTTP handler
>
>  In case #2, the URI barrel creates a CGI barrel that is initialized
>  with a file barrel pointing to the CGI script. Then uri_barrel
>  creates a postscript barrel initialized with the CGI barrel and the
>  parameter "PNG". The URI barrel then binds its content-handling
>  calls to the postscript barrel.
>
>    CGI script file -> mod_cgi -> mod_postscript -> uri -> HTTP handler
>
>  In case #3, the URI barrel figures out that this is a remote request
>  and creates an HTTP client barrel. The URI barrel then binds its
>  content request handler to the HTTP barrel.
>  Maybe there needs to be an
>  intervening proxy barrel, or maybe the URI barrel knows proxy
>  semantics, or maybe the request handler needs smarts about HTTP
>  proxies. I'm not in touch with HTTP/1.1 proxying enough to know.
>
>    HTTP client -> uri -> HTTP server handler
>
>- The request handler gets barrels back, and using the appropriate
>  barrel functions, writes their headers and data through the filters
>  back to the client.

Yep, subrequests.

>When anything creates a barrel, it passes in:
>
>- a pool to allocate memory from. The stuff read from the barrel must
>  be in the scope of that pool.
>- a list of capabilities that are required, and a list of preferred
>  capabilities. capabilities include: "send-from-file-descriptor",
>  "send from memory", "write-to-content", and so on.
>
>  This feels like content negotiation, which scares me. But, very
>  little of it is actually necessary in a first pass. A single
>  "memory-block" capability is all that's mandatory, really.
>  Everything else is extra features and optimization.

Hmmm, sorry, I think I've heard that phrase one too many times this
past week. Figure out what the architectural context is -- all of the
forces that will impact this design in terms of the application needs.
When you have covered all of that, everything else is extra features
and optimization. Things like single-copy IO and sendfile support are
not optimizations -- they are the requirements that motivate our next
generation architecture.

>The barrel being created will then attempt to meet those wishes if
>possible, or return an error.

Hmmm, that sounds like magic to me. The whole point of bucket brigades
was to specify that magic in a way that can be standard for all
modules.

>This is really not mapped out well enough, and needs code, which will
>wait until Apache 3.0 development starts up. But, here are the cool
>features I can imagine:
>
>- non-blocking I/O can be a capability of a barrel.
>  One API of the
>  barrel would be to return the file descriptor (or "event") it is
>  waiting for, so that it can be selected on, and a full event-based
>  server should be possible. When some barrel three levels deep
>  doesn't support non-blocking I/O, the request handler can decide to
>  punt to a separate thread. This way, different programming models
>  for modules can be supported.

It does mean that something has to read from the barrel and write to
the network, right? Or is this a model where we give the network to
the barrel and it writes? Kind of hard to manage the latter.

>- Writability would be another capability. If a barrel is writable,
>  that URI is available to DAV.

Yes, all source resources should be available to DAV. The way to do
that is to assign them URIs and pass their identifiers as metadata.
Let the protocol filter decide what to do with that information.

>- Set-asides and lifetime of data aren't that much of an issue anymore
>  (or at least I haven't thought of a case where they are). Barrels
>  are naturally kept around only as long as they are needed, since
>  their scope is determined by the consumer of the barrel. Caching is
>  done with a cacher barrel that uses a large-scoped pool, for example.

Right, just like the 1.3 subrequest architecture.

>- This scheme allows not just chains of barrels, but trees. So half of
>  a document can come from PHP and half can come from SSI. You just
>  need a container barrel that knows what parts are interpreted by
>  what modules. There could be an MS-Word barrel that munches on
>  multiple HTTP requests for HTML + images. This scares me.

It should. Keep in mind that all of the sources would have to pass
through the access control steps.

>- Because of the content-negotiation features, a barrel can always
>  figure out what the optimal format for sending content back should
>  be.

You mean every barrel will have to know everything about the request,
including things like HTTP negotiation? Yikes.

>- subrequests are really easy (thanks for the idea of URI barrels,
>  Ryan)

I don't see much difference from 1.3 subrequests.

>- Should allow "magic cache" like was discussed back at the June/July
>  '98 meeting. In fact, it should be really easy.

Easy to identify cacheable items, yes, but how easy is it for the
cache manager to manage overall allocations and reap old entries?

....Roy
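
[Editorial illustration, not part of the original thread.] The
required/preferred capability lists discussed in both messages can be
sketched roughly as below. This is a minimal Python sketch under stated
assumptions: the capability strings are the ones quoted in the thread,
but create_barrel, CapabilityError, Barrel and FileBarrel are
hypothetical names, and the selection rule (grant every required
capability, plus whichever preferred ones the barrel happens to
support) is only one possible reading of "attempt to meet those wishes
if possible, or return an error".

```python
# Hypothetical sketch: capability names come from the thread,
# all classes and functions are invented for illustration.
from typing import FrozenSet, Iterable, Type


class CapabilityError(Exception):
    """Raised when a barrel cannot provide a required capability."""


class Barrel:
    # Capabilities this kind of barrel is able to offer.
    supported: FrozenSet[str] = frozenset({"memory-block"})

    def __init__(self, granted: FrozenSet[str]) -> None:
        # Capabilities the creator actually ended up with.
        self.granted = granted


class FileBarrel(Barrel):
    supported = frozenset({"memory-block", "send-from-file-descriptor"})


def create_barrel(
    kind: Type[Barrel],
    required: Iterable[str],
    preferred: Iterable[str] = (),
) -> Barrel:
    """Create a barrel, or fail if a required capability is missing."""
    required_set = frozenset(required)
    missing = required_set - kind.supported
    if missing:
        raise CapabilityError(f"{kind.__name__} lacks: {sorted(missing)}")
    # Preferred capabilities are granted only when supported.
    granted = required_set | (frozenset(preferred) & kind.supported)
    return kind(granted)


barrel = create_barrel(
    FileBarrel,
    required=["memory-block"],
    preferred=["send-from-file-descriptor", "write-to-content"],
)
print(sorted(barrel.granted))

try:
    create_barrel(Barrel, required=["write-to-content"])
except CapabilityError as exc:
    print("error:", exc)
```

The sketch leaves open the question Roy raises: the rule for choosing
among capabilities lives inside create_barrel here, whereas bucket
brigades aim to make that choice standard across all modules.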