OSSP: CVS Repository: ossp-pkg/sio/BRAINSTORM/asf-ideas.txt

From manoj@io.com Wed Aug 16 21:38:14 2000
Path: engelschall.com!mail2news!apache.org!new-httpd-return-7115-rse+apache=en.muc.de
From: manoj@io.com (Manoj Kasichainula)
Newsgroups: en.lists.apache-new-httpd
Subject: My proposal for buckets/filtering in 3.0.
Date: 14 Aug 2000 18:59:14 +0200
Organization: Mail2News at engelschall.com
Lines: 137
Approved: postmaster@m2ndom
Message-ID: <20000814050814.A16555@manojk.users.mindspring.com>
Reply-To: new-httpd@apache.org
NNTP-Posting-Host: en1.engelschall.com
X-Trace: en1.engelschall.com 966272354 36734 141.1.129.1 (14 Aug 2000 16:59:14 GMT)
X-Complaints-To: postmaster@engelschall.com
NNTP-Posting-Date: 14 Aug 2000 16:59:14 GMT
X-Mail2News-Gateway: mail2news.engelschall.com
Xref: engelschall.com en.lists.apache-new-httpd:41295

This is quite different than the other proposals. From the feedback I
got during the meeting, this isn't suitable for Apache 2.0 because
it's still not developed enough (it popped into my head *during* the
meeting after all) and it requires more rewriting of existing modules
than the current 2.0 filtering scheme. I share the sentiment that 2.0
should ship soon rather than gain more features, so I agree with that
feedback.

Also, I have a bad feeling someone else already put forward this idea.
But, I'll send it out anyway.

The main difference between this design and the others is that there
are more buckets and fewer filters.

Filters are things that talk to the network, and they are two-way. SSL
and chunking are filters. They don't change very much in this case; as
is the case now, they are just iols that can handle buckets.

Most everything else is a bucket in this case. Actually, they are
probably better described as matryoshka dolls or those little nested
plastic barrels. In fact, I'll call them barrels to increase
confusion.

Brigades don't exist in this scheme. They are just compound barrels.

I think that each barrel will have a URI-identifier. This is useful
for caching.

Content-manipulating filters in the other designs are barrels in this
one. Also, while they were "writers" in the other designs, they are
"readers" in this one. So instead of a push-style design where filters
are written to by other filters, this scheme has barrels reading from
other barrels.
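
A minimal sketch in C of the pull model described above, assuming a
barrel is just a struct with a read callback and a pointer to the barrel
it wraps. Every name here (barrel, memory_barrel, upcase_barrel) is
hypothetical and exists only for illustration; none of it is an Apache
API. The upcase barrel stands in for any content-manipulating barrel.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical barrel interface: the consumer pulls bytes via read(). */
typedef struct barrel barrel;
struct barrel {
    /* Copy up to len bytes into buf; return the count, 0 at end of data. */
    size_t (*read)(barrel *self, char *buf, size_t len);
    barrel *inner;     /* barrel this one reads from; NULL for a source */
    const char *data;  /* source barrels only: bytes not yet handed out */
    size_t left;       /* source barrels only: number of bytes left */
};

/* Source barrel: hands out a block of memory piece by piece. */
static size_t memory_read(barrel *self, char *buf, size_t len)
{
    size_t n = len < self->left ? len : self->left;
    memcpy(buf, self->data, n);
    self->data += n;
    self->left -= n;
    return n;
}

/* Content barrel: reads from its inner barrel, then transforms the bytes. */
static size_t upcase_read(barrel *self, char *buf, size_t len)
{
    assert(self->inner != NULL);
    size_t n = self->inner->read(self->inner, buf, len);
    for (size_t i = 0; i < n; i++)
        buf[i] = (char)toupper((unsigned char)buf[i]);
    return n;
}

static barrel memory_barrel(const char *s)
{
    barrel b = { memory_read, NULL, s, strlen(s) };
    return b;
}

static barrel upcase_barrel(barrel *inner)
{
    barrel b = { upcase_read, inner, NULL, 0 };
    return b;
}
```

The consumer only ever calls read() on the outermost barrel; each
barrel in turn reads from the one nested inside it, which is the
reverse of a filter being written to.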

So here's how a request would proceed:

- HTTP request comes in, munched on by various filters, and passed to
  the request handler.

- request handler picks out the URI that was requested, creates a URI
  barrel initialized to that URI, and reads the content and metadata
  out of it.
  ("ap_create_uri_barrel("http://www.deedee.com/dancing/",
  barrel_types)->read();")

- The URI barrel, when created, figures out how the content should be
  delivered, and creates subbarrels to deal with them. I'll give three
  cases:

  Case #1: A file on the disk
  Case #2: A cgi script that outputs postscript that gets interpreted
  into a PNG image file
  Case #3: a proxy request

  In case #1, the URI barrel figures out that it's accessing a file,
  creates a filehandle barrel, and then binds its own content-handling
  calls to the filehandle barrel.

  file -> uri -> HTTP handler

  In case #2, the URI barrel creates a CGI barrel that is initialized
  with a file barrel pointing to the CGI script. Then uri_barrel
  creates a postscript barrel initialized with the CGI barrel and the
  parameter "PNG". The URI barrel then binds its content-handling
  calls to the postscript barrel.

  CGI script file -> mod_cgi -> mod_postscript -> uri -> HTTP handler

  In case #3, the URI barrel figures out that this is a remote request
  and creates an HTTP client barrel. The URI barrel then binds its
  content request handler to the HTTP barrel. Maybe there needs to be an
  intervening proxy barrel, or maybe the URI barrel knows proxy
  semantics, or maybe the request handler needs smarts about HTTP
  proxies. I'm not in touch with HTTP/1.1 proxying enough to know.

  HTTP client -> uri -> HTTP server handler
  
- The request handler gets barrels back, and using the appropriate
  barrel functions, writes their headers and data through the filters
  back to the client.
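
The three cases above could be sketched roughly as follows. This is
only an illustration under loud assumptions: the classification rule (a
"proxy:" prefix, a ".cgi" suffix) is invented for the example, and
node, classify, build_uri_barrel and describe are hypothetical names.
The nodes carry no content, just a label and a link to the barrel they
read from, so the chains drawn above can be reproduced.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* One barrel in a chain: a label plus the barrel it reads from. */
typedef struct node {
    const char *name;
    struct node *source;
} node;

typedef enum { SRC_FILE, SRC_CGI_POSTSCRIPT, SRC_PROXY } source_kind;

/* Invented rule for the example; a real server would consult its config. */
static source_kind classify(const char *uri)
{
    size_t n = strlen(uri);
    if (strncmp(uri, "proxy:", 6) == 0)
        return SRC_PROXY;
    if (n >= 4 && strcmp(uri + n - 4, ".cgi") == 0)
        return SRC_CGI_POSTSCRIPT;
    return SRC_FILE;
}

/* Build the chain in caller-provided storage and return the URI barrel. */
static node *build_uri_barrel(const char *uri, node store[4])
{
    node *top;
    switch (classify(uri)) {
    case SRC_CGI_POSTSCRIPT:
        store[0] = (node){ "CGI script file", NULL };
        store[1] = (node){ "mod_cgi", &store[0] };
        store[2] = (node){ "mod_postscript", &store[1] };
        top = &store[2];
        break;
    case SRC_PROXY:
        store[0] = (node){ "HTTP client", NULL };
        top = &store[0];
        break;
    default:
        store[0] = (node){ "file", NULL };
        top = &store[0];
        break;
    }
    store[3] = (node){ "uri", top };  /* the URI barrel binds to the chain */
    return &store[3];
}

/* Write the chain innermost first, for example "file -> uri". */
static void describe(const node *b, char *out, size_t cap)
{
    assert(cap > 0);
    if (b->source != NULL) {
        describe(b->source, out, cap);
        strncat(out, " -> ", cap - strlen(out) - 1);
    } else {
        out[0] = '\0';
    }
    strncat(out, b->name, cap - strlen(out) - 1);
}
```

The request handler only sees the node returned by build_uri_barrel;
which subbarrels sit underneath is the URI barrel's business.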

When anything creates a barrel, it passes in:

- a pool to allocate memory from. The stuff read from the barrel must
  be in the scope of that pool.
- a list of capabilities that are required, and a list of preferred
  capabilities. Capabilities include: "send-from-file-descriptor",
  "send-from-memory", "write-to-content", and so on.

  This feels like content negotiation, which scares me. But, very
  little of it is actually necessary in a first pass. A single
  "memory-block" capability is all that's mandatory, really.
  Everything else is extra features and optimization.

The barrel being created will then attempt to meet those wishes if
possible, or return an error.
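
One way to picture the required/preferred split is as bitmasks. The
sketch below is an assumption about how such a check might look, not a
worked-out design: the CAP_* flags and negotiate() are hypothetical
names, and the flag set simply mirrors the capabilities listed above.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical capability flags, mirroring the examples in the text. */
#define CAP_MEMORY_BLOCK  0x1u  /* the one mandatory capability */
#define CAP_SEND_FROM_FD  0x2u
#define CAP_WRITE_CONTENT 0x4u
#define CAP_NONBLOCKING   0x8u

/*
 * Decide what a barrel supporting `supported` can grant.
 * Returns 0 and stores the granted set, or -1 if any required
 * capability is missing. Preferred capabilities are granted only
 * when the barrel happens to support them.
 */
static int negotiate(unsigned supported, unsigned required,
                     unsigned preferred, unsigned *granted)
{
    assert(granted != NULL);
    if ((supported & required) != required)
        return -1;
    *granted = required | (supported & preferred);
    return 0;
}
```

A missing preferred capability is silently dropped, while a missing
required one is the "return an error" case.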

This is really not mapped out well enough, and needs code, which will
wait until Apache 3.0 development starts up. But, here are the cool
features I can imagine:

- non-blocking I/O can be a capability of a barrel. One API of the
  barrel would be to return the file descriptor (or "event") it is
  waiting for, so that it can be selected on, and a full event-based
  server should be possible. When some barrel three levels deep
  doesn't support non-blocking I/O, the request handler can decide to
  punt to a separate thread. This way, different programming models
  for modules can be supported. 

- Writability would be another capability. If a barrel is writable,
  that URI is available to DAV.

- Set-asides and lifetime of data aren't that much of an issue anymore
  (or at least I haven't thought of a case where they are). Barrels
  are naturally kept around only as long as they are needed, since
  their scope is determined by the consumer of the barrel. Caching is
  done with a cacher barrel that uses a large-scoped pool, for example.

- This scheme allows not just chains of barrels, but trees. So half of
  a document can come from PHP and half can come from SSI. You just
  need a container barrel that knows what parts are interpreted by
  what modules. There could be an MS-Word barrel that munches on
  multiple HTTP requests for HTML + images. This scares me.

- Because of the content-negotiation features, a barrel can always
  figure out what the optimal format for sending content back should
  be.

- subrequests are really easy (thanks for the idea of URI barrels,
  Ryan) 

- Should allow "magic cache" like was discussed back at the June/July
  '98 meeting. In fact, it should be really easy.
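
For the non-blocking item in the list above, the decision the request
handler faces might look something like this. It is a sketch under
assumptions: nb_barrel, choose_mode and pending_fd are hypothetical
names, and a plain integer descriptor stands in for whatever "event"
a real barrel would expose.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical view of a barrel, reduced to what scheduling needs. */
typedef struct nb_barrel {
    struct nb_barrel *inner;  /* nested barrel, or NULL at the bottom */
    int nonblocking;          /* nonzero if reads never block */
    int wait_fd;              /* descriptor being waited on, or -1 */
} nb_barrel;

typedef enum { RUN_IN_EVENT_LOOP, RUN_IN_THREAD } run_mode;

/* If any barrel in the nest may block, punt the request to a thread. */
static run_mode choose_mode(const nb_barrel *outer)
{
    assert(outer != NULL);
    for (const nb_barrel *b = outer; b != NULL; b = b->inner)
        if (!b->nonblocking)
            return RUN_IN_THREAD;
    return RUN_IN_EVENT_LOOP;
}

/* First descriptor the nest is waiting on, outermost first; -1 if none. */
static int pending_fd(const nb_barrel *outer)
{
    for (const nb_barrel *b = outer; b != NULL; b = b->inner)
        if (b->wait_fd >= 0)
            return b->wait_fd;
    return -1;
}
```

An event-based server would hand the result of pending_fd() to its
select loop; everything else falls back to a thread per request.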

There are plenty of unanswered questions here, such as how metadata
will work, how exactly proxies fit into this, and how things like DAV
collections fit in. But I'm sleepy.

From fielding@kiwi.ICS.UCI.EDU Wed Aug 16 21:39:15 2000
Path: engelschall.com!mail2news!apache.org!new-httpd-return-7124-rse+apache=en.muc.de
From: fielding@kiwi.ICS.UCI.EDU ("Roy T. Fielding")
Newsgroups: en.lists.apache-new-httpd
Subject: Re: My proposal for buckets/filtering in 3.0.
Date: 15 Aug 2000 07:00:31 +0200
Organization: Mail2News at engelschall.com
Lines: 155
Approved: postmaster@m2ndom
Message-ID: <200008141352.aa13983@gremlin-relay.ics.uci.edu>
Reply-To: new-httpd@apache.org
NNTP-Posting-Host: en1.engelschall.com
X-Trace: en1.engelschall.com 966315631 57145 141.1.129.1 (15 Aug 2000 05:00:31 GMT)
X-Complaints-To: postmaster@engelschall.com
NNTP-Posting-Date: 15 Aug 2000 05:00:31 GMT
X-Mail2News-Gateway: mail2news.engelschall.com
Xref: engelschall.com en.lists.apache-new-httpd:41304

>Most everything else is a bucket in this case. Actually, they are
>probably better described as matryoshka dolls or those little nested
>plastic barrels. In fact, I'll call them barrels to increase
>confusion.

Hah, cute, but how is it different than our current subrequests?

>I think that each barrel will have a URI-identifier. This is useful
>for caching.

Yes.  We would also need to name the handlers that define how a
barrel is constructed for a given request-URI.

>Content-manipulating filters in the other designs are barrels in this
>one. Also, while they were "writers" in the other designs, they are
>"readers" in this one. So instead of a push-style design where filters
>are written to by other filters, this scheme has barrels reading from
>other barrels.
>
>So here's how a request would proceed:
>
>- HTTP request comes in, munched on by various filters, and passed to
>  the request handler.
>
>- request handler picks out the URI that was requested, creates a URI
>  barrel initialized to that URI, and reads the content and metadata
>  out of it.
>  ("ap_create_uri_barrel("http://www.deedee.com/dancing/",
>  barrel_types)->read();")
>
>- The URI barrel, when created, figures out how the content should be
>  delivered, and creates subbarrels to deal with them. I'll give three
>  cases:
>
>  Case #1: A file on the disk
>  Case #2: A cgi script that outputs postscript that gets interpreted
>  into a PNG image file
>  Case #3: a proxy request
>
>  In case #1, the URI barrel figures out that it's accessing a file,
>  creates a filehandle barrel, and then binds its own content-handling
>  calls to the filehandle barrel.
>
>  file -> uri -> HTTP handler
>
>  In case #2, the URI barrel creates a CGI barrel that is initialized
>  with a file barrel pointing to the CGI script. Then uri_barrel
>  creates a postscript barrel initialized with the CGI barrel and the
>  parameter "PNG". The URI barrel then binds its content-handling
>  calls to the postscript barrel.
>
>  CGI script file -> mod_cgi -> mod_postscript -> uri -> HTTP handler
>
>  In case #3, the URI barrel figures out that this is a remote request
>  and creates an HTTP client barrel. The URI barrel then binds its
>  content request handler to the HTTP barrel. Maybe there needs to be an
>  intervening proxy barrel, or maybe the URI barrel knows proxy
>  semantics, or maybe the request handler needs smarts about HTTP
>  proxies. I'm not in touch with HTTP/1.1 proxying enough to know.
>
>  HTTP client -> uri -> HTTP server handler
>  
>- The request handler gets barrels back, and using the appropriate
>  barrel functions, writes their headers and data through the filters
>  back to the client.

Yep, subrequests.

>When anything creates a barrel, it passes in:
>
>- a pool to allocate memory from. The stuff read from the barrel must
>  be in the scope of that pool.
>- a list of capabilities that are required, and a list of preferred
>  capabilities. Capabilities include: "send-from-file-descriptor",
>  "send-from-memory", "write-to-content", and so on.
>
>  This feels like content negotiation, which scares me. But, very
>  little of it is actually necessary in a first pass. A single
>  "memory-block" capability is all that's mandatory, really.
>  Everything else is extra features and optimization.

Hmmm, sorry, I think I've heard that phrase one too many times this
past week.  Figure out what the architectural context is -- all of the
forces that will impact this design in terms of the application needs.
When you have covered all of that, everything else is extra features
and optimization.  Things like single-copy IO and sendfile support are
not optimizations -- they are the requirements that motivate our next
generation architecture.

>The barrel being created will then attempt to meet those wishes if
>possible, or return an error.

Hmmm, that sounds like magic to me.  The whole point of bucket brigades
was to specify that magic in a way that can be standard for all modules.

>This is really not mapped out well enough, and needs code, which will
>wait until Apache 3.0 development starts up. But, here are the cool
>features I can imagine:
>
>- non-blocking I/O can be a capability of a barrel. One API of the
>  barrel would be to return the file descriptor (or "event") it is
>  waiting for, so that it can be selected on, and a full event-based
>  server should be possible. When some barrel three levels deep
>  doesn't support non-blocking I/O, the request handler can decide to
>  punt to a separate thread. This way, different programming models
>  for modules can be supported. 

It does mean that something has to read from the barrel and write to
the network, right?  Or is this a model where we give the network to
the barrel and it writes?  Kind of hard to manage the latter.

>- Writability would be another capability. If a barrel is writable,
>  that URI is available to DAV.

Yes, all source resources should be available to DAV.  The way to do that
is to assign them URIs and pass their identifiers as metadata.  Let the
protocol filter decide what to do with that information.

>- Set-asides and lifetime of data aren't that much of an issue anymore
>  (or at least I haven't thought of a case where they are). Barrels
>  are naturally kept around only as long as they are needed, since
>  their scope is determined by the consumer of the barrel. Caching is
>  done with a cacher barrel that uses a large-scoped pool, for example.

Right, just like the 1.3 subrequest architecture.

>- This scheme allows not just chains of barrels, but trees. So half of
>  a document can come from PHP and half can come from SSI. You just
>  need a container barrel that knows what parts are interpreted by
>  what modules. There could be an MS-Word barrel that munches on
>  multiple HTTP requests for HTML + images. This scares me.

It should.  Keep in mind that all of the sources would have to pass
through the access control steps.

>- Because of the content-negotiation features, a barrel can always
>  figure out what the optimal format for sending content back should
>  be.

You mean every barrel will have to know everything about the request,
including things like HTTP negotiation?  Yikes.

>- subrequests are really easy (thanks for the idea of URI barrels,
>  Ryan) 

I don't see much difference from 1.3 subrequests.

>- Should allow "magic cache" like was discussed back at the June/July
>  '98 meeting. In fact, it should be really easy.

Easy to identify cacheable items, yes, but how easy is it for the
cache manager to manage overall allocations and reap old entries?

....Roy